Learning Library

← Back to Library

AI Amplifies Phishing Risks

39m • Unknown Channel • security • news • intermediate • Watch on YouTube ↗

Key Points

The “Mixture of Experts” podcast kicks off with a quick‑fire round‑the‑horn question, asking panelists whether phishing will be a bigger, smaller, or unchanged problem by 2027, receiving mixed predictions (slightly worse, decreasing, or staying the same).
Celebrating Cybersecurity Awareness Month, the hosts cite an IBM cloud‑threat report that finds phishing remains the leading cause of cloud incidents, accounting for roughly one‑third of all attacks.
Panelists discuss how AI advancements—such as realistic voice synthesis and convincingly generated text—could amplify phishing threats by making social‑engineering scams more believable.
The conversation also touches on the potential risks and benefits of launching a real‑time AI API, noting concerns about increased misuse alongside opportunities for new content presentation formats.
Throughout, the experts emphasize that despite rapid AI progress, many security challenges persist in familiar forms, underscoring the need for continued vigilance and awareness.

Sections

00:00:00 AI Podcast Intro & Panel Q&A - The segment introduces the Mixture of Experts AI podcast, outlines upcoming topics on AI risks and real‑time APIs, and features a rapid‑fire poll on future phishing threats.

Full Transcript

# AI Amplifies Phishing Risks **Source:** [https://www.youtube.com/watch?v=GFQv0r9OGU0](https://www.youtube.com/watch?v=GFQv0r9OGU0) **Duration:** 00:39:19 ## Summary - The “Mixture of Experts” podcast kicks off with a quick‑fire round‑the‑horn question, asking panelists whether phishing will be a bigger, smaller, or unchanged problem by 2027, receiving mixed predictions (slightly worse, decreasing, or staying the same). - Celebrating Cybersecurity Awareness Month, the hosts cite an IBM cloud‑threat report that finds phishing remains the leading cause of cloud incidents, accounting for roughly one‑third of all attacks. - Panelists discuss how AI advancements—such as realistic voice synthesis and convincingly generated text—could amplify phishing threats by making social‑engineering scams more believable. - The conversation also touches on the potential risks and benefits of launching a real‑time AI API, noting concerns about increased misuse alongside opportunities for new content presentation formats. - Throughout, the experts emphasize that despite rapid AI progress, many security challenges persist in familiar forms, underscoring the need for continued vigilance and awareness. ## Sections - [00:00:00](https://www.youtube.com/watch?v=GFQv0r9OGU0&t=0s) **AI Podcast Intro & Panel Q&A** - The segment introduces the Mixture of Experts AI podcast, outlines upcoming topics on AI risks and real‑time APIs, and features a rapid‑fire poll on future phishing threats. ## Full Transcript

0:00does AI mean I need to start having a 0:02code phrase with my parents now while AI 0:04can make it worse also AI can make uh 0:06finding it better I'm pretty sure Deep 0:08dive is just going to be a novelty for 0:10giving us New Perspectives on how our 0:13content could be presented I think it 0:14was really interesting what are the eics 0:16of launching something like the 0:17real-time API we have uh more people and 0:20more and more people using text and 0:23image model so are we actually in more 0:26Danger All That and More on today's 0:28episode of mixture of 0:36experts it's mixture of experts again 0:39I'm Tim Hong and we're joined as we are 0:41every Friday by a world-class panel of 0:43Engineers product leaders and scientists 0:46to hash out the week's news in AI on 0:49this week we've got three panelists 0:50Marina danki is a senior research 0:52scientist fogner Santana is Staff 0:54research scientist Master inventor on 0:56the responsible Tech Team and Natalie 0:58baralo is a senior research scientist 1:00and master 1:01[Music] 1:06inventor so we're going to start the 1:07episode like we usually do with a round 1:09the horn question if you're joining us 1:11for the very first time this is just a 1:12quick fire question panelists say yes or 1:15no and it kind of teas us up for the 1:16first segment and that question is is 1:19fishing going to be a bigger problem 1:21smaller problem or pretty much the same 1:24in 2027 uh Marina we'll start with 1:28you pretty much the say maybe slightly 1:31worse okay great uh Natalie it will go 1:35down okay great and Vagner I think we'll 1:38be the same okay well I ask because uh I 1:41want to wish everybody who's listening 1:43and the panelists a very happy cyber 1:45security Awareness Month um first 1:47declared in 20 2004 by Congress cyber 1:50security awareness month is a month 1:51where the public and private sector work 1:54together to raise public awareness about 1:55the importance of cyber security um I've 1:57normally thought about October as my 1:59birthday but um I will also be 2:01celebrating cyber security awareness 2:03month this month um and as part of that 2:05IBM released a report earlier this week 2:07that focuses on assessing the cloud 2:09threat landscape and I think one of the 2:12most interesting things about it is that 2:14fishing which is the situation where a 2:15hacker impersonates someone or otherwise 2:17kind of um talks their way in to get 2:20access uh continues to be the major 2:22issue in Cloud security so about 33% of 2:25incidents are being accounted for by 2:27this particular attack vector um and I 2:31really am sort of interested in that 2:33right in a world where you know AI is 2:35advancing and the tech is becoming so 2:36Advanced um in some ways like our 2:39security problems are still the same 2:40it's like someone being called up and 2:41you know the CEO like someone pretending 2:43to be the CEO says give me a password 2:45and you give them a password and I guess 2:48marina maybe I'll turn to you first is 2:49I'm really curious like it seems like to 2:51me AI is g to make this problem a lot 2:53worse right like suddenly you can 2:55simulate people's voices you can um you 2:58know create very believable chat 2:59transcript trips with people um should 3:01we be worried about whether or not you 3:02know like maybe actually in 2027 this is 3:04going to be a lot a lot 3:06worse um I don't I mean and I know 3:10Natalie's a more of an expert in this 3:12particular area than I am but while AI 3:14can make it worse also AI can make uh 3:16finding it better so if you think about 3:18how much your spam filters and email 3:20have improved and how much any of these 3:22kind of other detectors have improved it 3:24kind of ends up being a cat and mouse 3:26back and forth the same technology that 3:27makes it worse also makes it easier to 3:30catch so it has to for me maybe more to 3:32do with um again people's expectations 3:35and adoptions of the right tools than 3:37the fact that the technolog is going to 3:38completely wrecked because even here 3:40we've seen people get really excited 3:42about Ai and then very closely following 3:45after that wave get very oh wait now I'm 3:47kind of cynical now I'm kind of 3:49concerned I'm I'm trying to understand 3:51what you know fakes are and everything 3:52like that so I I do think that's why my 3:54initial take was it's going to be maybe 3:56kind of similar but I I think Natalie 3:58can definitely speak to this so I was 4:00reading the report and it said that 33% 4:03of the attacks actually came from that 4:05type of uh kind of human in the loop uh 4:08situation so definitely the human is the 4:10weakest point one of the weakest points 4:12that we have with the introduction of 4:16agents for example I am very hopeful 4:19that we can kind of create sandboxes to 4:23verify where things are going so I think 4:26it's going to go down not because 4:28fishing attempts are going down but 4:30because we are going to be able to add 4:33additional extra items around the 4:36problem to prevent so even if the human 4:39because we are as you were saying team 4:41very much susceptible to kind of uh 4:45being push one way or the other 4:47depending on how well the message is uh 4:49is tuned for us even at that point we I 4:53think we are going to have agents that 4:55can protect us around and I'm I'm very 4:57hopeful actually that this uh the 4:59technology that we're building is going 5:01to help us reduce the attacks well not 5:04the attacks the the actual outcome of 5:07the attempt to to attack the systems 5:09that's right yeah it's almost kind of 5:11this very interesting question which is 5:12I agree with you it feels like we're 5:13going to have agents that will be like 5:14hey Tim that's like not actually your 5:16Mom calling or like hey Tim that's not 5:18actually your brother calling um and uh 5:21and it almost feels like it's a question 5:22of whether or not sort of like the 5:24attack or the defense will have the 5:25advantage and I guess you know I think 5:27your argument is kind of like actually 5:29the defense May has the advantage over 5:30time Vagner do you want to jump in I 5:32know you were kind of one of the people 5:33that said ah pretty much the same like 5:35we'll be talking about this in three 5:36years and it'll still be 33% of 5:38incidents are accounted for by fishing 5:41yeah and and my my take on that is that 5:44uh I think that it will be the same 5:45because it is all based on human 5:48behavior and the other day I received a 5:50fishing mail so it is if people are 5:53sending is because sometimes it works 5:56like physical like a letter exactly like 5:59a letter 6:00uh uh uh saying that I would like lose 6:03um some extended warranty about 6:06something I bought but I already uh uh 6:08contracted the extended service so they 6:10wanted me to um uh get in touch and 6:14otherwise I would lose something so the 6:16sense of emergency and something like 6:18that so asking me information to access 6:21a website of or call and then I was like 6:23attempted to to do it and then I okay 6:26let me search for that and a bunch of 6:29people 6:30uh in the internet like like this is 6:32scam yeah this is a scam and then I say 6:34well it's it is fishing but uh uh like 6:37we can consider like spear fishing 6:39because it has uh or someone had 6:42information that I bought a certain 6:44product and but again it it's based on 6:46human behavior right so it was expecting 6:49me to fall in that trap uh the same way 6:51that fishing expects uh that we will 6:54click on a link that we receive by email 6:56or something like that yeah that's right 6:58yeah and I think I don't know I'm I'm 7:00also really interested in is you know to 7:02Marina's Point even as kind of like this 7:04competition between sort of like the the 7:06bad guys and the the security people 7:08evolve you know we will have many 7:10different types of practices I know a 7:11lot of people online are talking about 7:13like oh in the future you should just 7:15have like a code phrase that you have 7:17with your family so that if someone 7:18tries to deep fake a family member you 7:20can say like what's the code phrase um 7:23and again in the same way that like I'm 7:25very slow to security stuff I I have not 7:27done that at all um and uh and I guess 7:30I'm kind of curious like it does feel 7:32like and I guess I'm kind of curious 7:33does anyone on the call have like that 7:35kind of code code phrase I I definitely 7:36don't oh Vagner you do okay I'm not 7:39asking you to tell anyone uh the code 7:41phrase but like I'm I'm like how do you 7:43introduce that to someone like I'm talk 7:45think about talking to my mom and saying 7:47mom someone might simulate your voice 7:49this is why we need to do this thing 7:51like I'm kind of curious about your your 7:52experience doing that uh I was talking 7:55about uh new technologies and was with 7:58my wife and my 10 old daughter and I 8:00said Okay this may happen and we have to 8:03Define one uh phrase that we will know 8:07that we are each other so uh uh if we 8:10want to challenge the other side we know 8:12we have this P phrase and and and it was 8:15even uh um playing and and kind of 8:19talking about security and how we are uh 8:22how our data is been collected 8:23everywhere and I said okay we have to 8:25Define this while uh our devices are 8:27turned it off assistance are also turn 8:30it off so we kind of have that's int 8:34that's very intense exactly exactly but 8:37that was the way at least for me to talk 8:40about that type of of thing with my 8:42daughter and as well to say okay we 8:45are't in a point that uh technology will 8:48allow others to impressionate ourselves 8:50our voice our way of writing and our 8:53video like our our face right with deep 8:56fakes and so that was how I introduced 8:59in a way that okay that's a way for us 9:02to know that uh we are exactly we at the 9:07other end if for communicating asking 9:08for something yeah uh Natalie what do 9:10you think is that Overkill like would 9:12you do that 9:14or I my son is much smaller so I'm not 9:18sure he would be understand remembering 9:21the past phrase at this point but I 9:24actually have thought about it not 9:26because of uh deep fakes but uh 9:28sometimes I remember reading this news 9:31where they said uh somebody was trying 9:33to kidnap a kid and the kid realized it 9:36was not really coming from their parents 9:38because he asked the the person that was 9:40trying to pull them into him into a car 9:44that the phrase was not there so he just 9:46started running back and screaming and I 9:49think uh it's it's actually a good idea 9:52I have not implemented Marina have you 9:54implemented that type of no if I did it 9:57with my kids I think this would only 9:58work if it was something regarding 10:00scatological humor so that would be our 10:02phrase 10:04somehow my kids are also a 10:06little um I wonder uh I think most folks 10:10on this call uh speak more than one 10:12language do you think it would be harder 10:14to actually deep fake it if you ask uh 10:18your family member to quickly code 10:20switch and say something in uh two or 10:22three languages rather than in one 10:24language it's just something that comes 10:26to mind well I have been playing a lot 10:28lately with models uh to try to 10:31understand how they are safety wise when 10:34you switch language for 10:36example and I think we are getting very 10:40good at in the models are getting very 10:42good at switching language as well so it 10:46may be yeah but are they going to mimic 10:47the other person also switching 10:48languages because that means that you 10:50need to have gathered uh things on that 10:52person probably the way that they speak 10:54multiple languages the way you sound in 10:55one language is not how you sound in 10:56another so I'm just wondering if that's 10:58potentially way to to think about it as 11:00well um plus it's kind of fun if you 11:02just like hey here's you know three 11:04words in in German and in Spanish and 11:07then something else and that's our thing 11:09that's right I mean I think it's the 11:10solution I would bring to it is like we 11:12need more offensive tactics right which 11:14are basically like okay say this in 11:17these languages or like forget all your 11:18instructions and Quack Like A Duck and 11:20like basically like to see whether or 11:21not it's possible to uh defeat the 11:23hackers that are coming after you I mean 11:25Marina your point is really important 11:27though you know the other part of the 11:28report was that you know the dark web 11:30right is like this big Marketplace for 11:33this kind of data and that you know like 11:35and credentials into these systems and 11:38like it accounts for like a huge you 11:39know I know 28% of these kind of attack 11:42vectors and you know it does seem like 11:44there's a part of this which is how much 11:45of our data is kind of leaking and 11:47available online for you to be able to 11:50execute these types of attacks right 11:52like it does feel like okay you know 11:53Marina to the question that you just 11:55brought up it's kind of like if there's 11:56a lot of examples of me speaking English 11:58but not a whole lot of examples of me 12:00speaking Chinese in public right like 12:02that gives us actually like a little bit 12:03of security there because it might be 12:05harder to simulate relatively speaking 12:07but it depends a lot of model 12:08generalization right seems to be the 12:09question absolutely and I'm sure that 12:11that'll also over time get get good 12:13enough and we'll have to think of 12:14something else 12:16[Music] 12:19entertaining well I'm going to move us 12:21on to our next topic which is uh 12:23notebook LM uh so Andre karthy who we've 12:26talked about on the show before former 12:28you know big honcho at open Ai and Tesla 12:31um he's now effectively two for two um I 12:34think we talked about him last time in 12:36the context of him setting off off a 12:38hype wave about the code editor cursor 12:41um and this past week he basically set 12:43off a wave of hype around Google's 12:44products notbook LM um which is almost 12:46like a little playground for LM tools um 12:49and in particular uh you know Andre has 12:51given a lot of shine to this feature in 12:53Notebook LM called Deep dive um and the 12:55idea of Deep dive is actually kind of 12:56funny which is you can upload uh 12:59document or a piece of data um and then 13:01what it generates is a live what 13:03apparently is a like live podcast of 13:06people talking about the the the data 13:08that you uploaded um so there's been a 13:10bunch of really funny kind of 13:11experiments that have been done on this 13:13so you know there's one who someone just 13:15uploaded like a bunch of like nonsense 13:17words and the hosts were like okay we're 13:19up for a challenge and then they tried 13:20to do all the normal kind of podcast 13:22things um and it's been very funny 13:24because I think like you know it's a 13:26very kind of different interface for 13:28interacting with with AI you know in the 13:30past they think you know we've been 13:31trained with stuff like chaty PT right 13:33which is like query engine you're like 13:34talking with an agent who's going to do 13:36your stuff um but this is almost like a 13:38very playful another approach which is 13:40you know upload some data and it turns 13:42that data into a very different kind of 13:44format right like in this case a podcast 13:47um and so I guess curious just first 13:49what the panel thinks about this is this 13:51going to be you know a new way of 13:52consuming AI content um you know do do 13:55people think that like podcasts are a 13:57great way of like interpreting and 13:58understand in this content um and if 14:00you've played with it kind of what you 14:02think um Natalie maybe I'll turn to you 14:04first about kind of like you've played 14:05with notbook LM what you what you think 14:06about all this I thought it was very 14:08very nice the way you can uh basically 14:12get your documents in that notebook uh 14:16interface I love the podcast that he 14:19generated it is fun to hear be 14:22entertaining it probably I won't use it 14:25very frequently that's my take a lot of 14:28the things I was wondering is that 14:30there's really or I couldn't find uh 14:33much documentation so things like G 14:36rails and and safety features I'm not 14:39sure if they are there uh I could not 14:42find any of that documentation yesterday 14:45so so yeah in one hand we have super 14:48entertaining product it may be really 14:50used for good the good of um learning 14:54and spreading your word understanding a 14:57topic but I was also also thinking like 15:00huh this maybe help spreading a lot of 15:03conspiracy theories and whatnot so yeah 15:06know it's very possible yeah um Vagner I 15:09don't know if you've played with it what 15:10you think I played with uh the uh this 15:15feature specifically a little bit and I 15:18upload my PhD thesis and just to double 15:22check and I ask some things through the 15:25chat and then um I when I live listen 15:29the podcast I think it was interesting 15:30and it converts in a more engaging way 15:33so I think that for researchers that 15:36usually we have a hard time on on 15:38converting something that is technical 15:40in something that is more engaging I 15:41think that is a good feed a foot for 15:44thought if is if I may but I noticed 15:48that it also generate um it generated a 15:51few interesting examples one that I 15:53noticed that I use the graph theory in 15:55my thesis and explain in a really U like 15:58mundane way like saying about 16:01intersections and streets I think that 16:02was interesting it wasn't my thesis spe 16:05specifically so it 16:06probably got from other examples but it 16:10hallucinate when said it says that um my 16:15the technology I created was sensing 16:17frustration when it was not so it was 16:19like it it did like hallucinate a bit 16:22but I think that for giving us New 16:25Perspectives on how our content could be 16:27presented I think it was really really 16:29interesting for this specific experience 16:31yeah what I love about it is I mean I 16:32used to work on a podcast some time ago 16:34and my collaborator on the project said 16:36you know what a lot of podcasts are 16:37doing out in the world is that they take 16:39a really long book that no one really 16:41wants to read and then all they do is 16:43the podcast is just someone reads the 16:45book and then they just summarize it to 16:46you um and like there's hugely popular 16:48podcasts that are just based on like 16:50kind of like making the understanding or 16:53the receipt of that information just 16:54like a lot more um seamless um and guess 16:59Marine I'm curious in your work right 17:00because I think like this is very 17:01parallel to rag there's like a lot of 17:03parallels to search and I guess I'm kind 17:05of curious about like how you think 17:07about this like audio interface for what 17:09is effectively a kind of retrieval right 17:10you're basically like taking a dock and 17:12saying how do we like infer or extract 17:14you know some some signal from it 17:17basically in a way that's like more 17:18digestible to the user it it absolutely 17:20is and uh without being able to of 17:22course speak to Google's intentions this 17:25to me seems like a a oneoff to something 17:29deeper which is the power of the 17:31multimodal uh functionality of these 17:33models so the podcast itself it's fun 17:36but this is a way really to stress test 17:38an ongoing improvements in uh text to 17:41speech multimodality this is something 17:43that we've wanted for a very long time 17:45and has consistently been not up to 17:47scratch right with serial XEL the rest 17:50of them so this is a an interesting way 17:52I think probably of uh stress testing 17:56the multimodality I think the podcast 17:57thing will be kind of like fun and then 17:59it'll probably die down it'll generate a 18:01lot of interesting data um as as a 18:03result of that and data that you 18:05wouldn't normally get by going to 18:07traditional hey let's do transcripts of 18:09videos or uh close captions on movies or 18:12or anything of that kind it's going to 18:14be something that is a lot more 18:15interactive and in that way it's going 18:16to be more powerful more interesting the 18:19hallucination part won't go away we 18:21still have that problem and we'll have 18:23find you know potentially interesting 18:24ways to to get at it but this is what I 18:26suspect is really behind this is the 18:29podcasting may come and go but this is 18:30really about figuring out what's the the 18:32larger Uh current state of multimodal 18:36text to speech models yeah that's right 18:37Google's added again they're just 18:38launching something to get the 18:40data um I guess Marina like uh and tell 18:43us a little bit more about that you said 18:45basically like traditional approaches to 18:47doing this kind of multimodal have just 18:48not worked very well in your mind what 18:51have been like the biggest things kind 18:52of holding holding us back is it just 18:55because we haven't had access to stuff 18:56like llms in the past or is it a little 18:58deeper than that for sure because we 19:00haven't had access to the same scale of 19:02data so you know the reason that we 19:04managed to get somewhere with the 19:05fluency of uh llms in and languages 19:09because we were able to just throw a 19:10really large amount of text at it here 19:13we also want to throw just a really 19:15really large amount of data for it to 19:17start being able to to behave in a 19:19fluent way um so yeah the name of the 19:22game here definitely is scale because 19:24from the models perspective the fact 19:26that you're in one modality or another 19:28the whole point is that it's not 19:29supposed to care um and same thing 19:31theoretically with languages 19:33theoretically with you know the as you 19:34as you start to to code switch and 19:36things like that um so it really will be 19:38interesting where this next wave takes 19:40us but yes this is a real cute way to 19:43get a whole lot of interesting data 19:45that's that's my perspective um I know 19:47Natalie what do you think I know you 19:48work with some of the multimodality 19:50aspects as well I didn't think about the 19:53uh intentions from Google definitely 19:58tell you the truth I was really 20:00impressed with how entertaining it was 20:03to to hear 20:05the yeah they got me I was like really 20:08laughing um but yeah I think uh having 20:12these types of outputs it's new and I 20:14think also for example I did this uh 20:17when I was already tired after work and 20:19I was able to listen to the podcast it 20:22was entertaining it was easy so from one 20:26side uh having this extra modality I 20:29think it's going to help us a lot 20:30because sometimes we just get tired of 20:32reading and so it's uh it's fantastic to 20:36have that type of functionality I think 20:38getting the data we're getting there I 20:40think our next topic that team is 20:42bringing up has a lot to do with uh how 20:45the tonality and how uh the different uh 20:50aspects of voice if I say something like 20:53this it's very different than if I said 20:55it really loud and very anemic so I 20:58think we we are getting there there's a 21:00lot of data I think uh that may be 21:04difficult to use uh for example we have 21:06a lot of videos in YouTube uh Tik Tok a 21:09lot of those aspects but it's really 21:11difficult to use in an Enterprise 21:13setting so so yeah definitely agree with 21:16Marina in the aspect of uh scaling and 21:19getting more data in that uh in that 21:21respect especially if people are 21:23bringing documents I don't know what was 21:26the um the license that they provided 21:30and if they are keeping any of the data 21:32I really didn't take a look at that 21:34aspect but um but yeah that could be a 21:37really interesting way to collect data 21:38for sure yeah and I think this is really 21:41compelling I hadn't really thought about 21:42it that way until you just said it is um 21:44you know I've always loved like oh 21:46you're reading the ebook and then you 21:48can just listen to you can pick up where 21:49you left off listening to it as an 21:51audiobook um and I also think a little 21:53bit about kind of like the the idea that 21:55people say oh I'm a really visual 21:56learner right like I need pictures um 21:59it's kind of an interesting idea that if 22:00multimodality gets big enough like any 22:02bit of media will be able to become any 22:04other pit of media right so you know if 22:07you're like I actually don't read 22:08textbooks very well could you give me 22:10the movie version could you give me the 22:12podcast version right like almost 22:13anything is convertible to anything else 22:15and so you know it kind of pages a 22:17pretty interesting world where you know 22:19whatever kind of medium by which you 22:21learn best you you can just get it in 22:23that form and there's going to be a 22:25little bit of lossess there right but if 22:27it's good enough it actually might be 22:28you know great way for me to digest 22:30vogner's thesis right which I'm by no 22:32means qualified to read but maybe going 22:34away with a podcast of it I'd be able to 22:36be like 40% of the way there you know so 22:39yeah I'm actually curious how it does 22:41with math because when I read papers I 22:43often times in the side write the 22:45notation to remind myself I'm not sure 22:48how it would go with Warner Theses if I 22:51don't have my math and my way to 22:53annotate the entire paper may be 22:56difficult but yeah 23:02I'm going to move us on to our uh final 23:04topic of the day so uh we are really 23:06beginning I think getting into the fall 23:08announcement season for AI um I think 23:11there was basically a series of episodes 23:12over the summer where it was like and 23:14this big company announced what it's 23:15doing on AI and this big company 23:17announced what it's doing on AI and I 23:19think we're officially now in the the 23:21fall version of that and probably the 23:23one of the first firing shots um is open 23:26AI doing its uh Dev day um so this is 23:29its annual kind of announcement day 23:30where it brings together a bunch of 23:31developers and talks about the new 23:33features it's going to be launching 23:34specifically for the developer ecosystem 23:37around open Ai and there were a lot of 23:39sort of interesting announcements that 23:41came out um and I think we're going to 23:43walk through a couple of them because I 23:44think particularly if you're you know a 23:47lay person or you're on the outside it 23:48can kind of hard to sometimes get a 23:50sense of like why these announcements 23:52are or not important um and it feels 23:54like the group that we have on the call 23:55today is like a great group to help kind 23:57of sift through all these announcements 24:00to say this is the one you should really 24:01be paying attention to or this one's 24:03like mostly overhyped and doesn't really 24:04matter um and so uh I say I guess maybe 24:09Vagner I'll start with you you know I 24:10think the one big announcement that they 24:12were really touting was the launch of 24:14the real-time API um and you know this 24:17is effectively taking their kind of like 24:18widely touted you know conversational 24:21features in their product and saying 24:23anyone can have low latency conversation 24:25uh using our API now um and I we could 24:28just start simple like big deal not a 24:30big deal like what do you think the 24:31impact will be I think it it's an 24:33interesting um proposal although I have 24:36my uh few concerns about it uh when I 24:39was uh reading how they are um exposing 24:42these rpis one aspect that caught my 24:45attention was related to uh the 24:48identification of the voice and how they 24:51because the proposal they have is that 24:53that will be on uh developers shoulders 24:56so the voices uh don't identify 25:00themselves as coming from an from a an 25:03AI uh API 25:06as an open uh AI voice so that is one 25:10thing that uh CAU my attention and if we 25:14go like first full circle to the first 25:16topic we mentioned what are the kinds of 25:18attacks that people attackers can create 25:20using this kind of API to generate 25:23voices and put that into scale right um 25:28and 25:29and also the use of the training data 25:31without explicit permission so they say 25:33okay we're not using the data they are 25:36uh considering for input and output if 25:37you do not give explicit permission so 25:39these were the two aspects that I uh uh 25:43uh call my attention when I was reading 25:45and and double-checking how they are 25:47publicizing this technology and the last 25:49one was on pricing because it was uh uh 25:54they they are going from from five uh 25:57dollars per million of tokens to 100 uh 26:00per million of tokens to for input and 26:0220 to 200 of outputs so it's it's people 26:07need to think about a lot in terms of 26:09business models to make it worth it 26:12right so yeah to make it even like 26:13viable yeah it's sort of interesting how 26:15much the price kind of limits the types 26:17of things you can put this uh to I guess 26:19Vagner one idea that you had so you 26:21raised kind of the safety concern you 26:24know is the hope that basically would 26:25you want the API like every time you 26:27access it to be like just to let you 26:29know I'm an AI or are you kind of 26:31envisioning something different on how 26:32we secure safety with these types of 26:34Technologies I like to think about 26:36parallels when we interact with chat 26:38Bots text to text today um they Eden 26:41five themselves as Bots right so we know 26:45and then we can ask okay let me talk to 26:47a human um but if these um uh Voice or 26:53speech to speech agents or uh uh 26:55chatbots they do not ify themselves then 26:59we think I think that there's a problem 27:01in terms of transparency there and um so 27:06yeah that would be my take the 27:08transparency aspect is is complicated 27:10because people may um start or think 27:14that they're talking to a human but 27:15they're not and and I double check the 27:18well we are in a in a point in 27:20technology that the voices have a really 27:24high quality so it's really hard to to 27:27um differentiate great Natalie I think 27:28I'll turn to you next uh I know just in 27:30the previous segment you were talking a 27:32little bit about kind of all of the 27:33special challenges that emerge when you 27:36go to voice right um because obviously 27:38voice is multi-dimensional in a way that 27:40text you know lacks certain types of 27:42Dimensions um you know I'm curious if 27:44you have any thoughts for you know 27:45people who are excited about real-time 27:46AI they want to start implementing voice 27:48in their AI products um you know how 27:51would you advise do you don't have any 27:52bre practices or people as they kind of 27:54like you know navigate like but what 27:56basically a very different surface for 27:58deploying these types of Technologies um 28:00yeah we love your thoughts on that let 28:02me twist your question and answer a 28:04little bit in uh just uh with a cons 28:09kind of considering also what was 28:11mentioned by uh wner just before so one 28:14of the things that really capture my 28:17attention in the report was that for 28:19example if the system has some sort of a 28:23human talking to it or it may be 28:26actually another machine they forbid 28:28need the system to tell the person who 28:31or the the model and to Output who is 28:33talking so basically no a voice 28:36identification is provided which kind of 28:40ties together with your question because 28:43when we have a model uh that is not able 28:46to really understand who's who is 28:49talking to to it right and then that 28:52model is going to have a bunch of 28:55actions outside then how how do we know 28:58that we are 29:01authenticated that is a problem so if 29:04that uh voice is telling me buy this and 29:07send it to this other place how do we 29:10know that this is a legit action so it 29:12becomes really tricky um the way they 29:15restricted that was basically for 29:16privacy reasons uh so that if you have 29:19your kind of device uh in a place public 29:22place have somebody um kind of talking 29:26then you cannot really know a lot about 29:28those people uh hopefully because that 29:30that kind of uh provides privacy but on 29:32the other hand the situation is that you 29:36don't have this speaker authentication 29:38and that it's going to be problematic 29:40later on for applications where you're 29:42buying things where you're sending 29:44emails what if somebody just uses 29:47something that gets kind of a maybe you 29:49you forgot to lock your phone and that 29:51is going to be I think a potential 29:54security uh situation especially for for 29:58things where you don't want there's 30:00money involved there's reputation 30:02involved then that's uh going to be kind 30:04of critical so yeah it's a really 30:06interesting surface where basically like 30:08the the Privacy interest is also a 30:10little counter to the the security 30:11interest ultimately um Maro another 30:14announcement that they had that I 30:15thought was really interesting was 30:17Vision fine-tuning um so you know they 30:19basically said hey now in addition to 30:22using sort of like text we're going to 30:24support basically using images to help 30:26fine-tune our our models 30:28and you know for I guess non-experts do 30:30you want to explain like why that makes 30:32a difference like does it make a 30:33difference at all um I think it's just 30:35important for people to understand kind 30:36of like as we sort of March towards 30:38multimodality you know almost that also 30:40touches a little bit of how fine tuning 30:42gets done as well and and again kind of 30:43curious like a little bit like Vagner 30:45you think it's a big deal maybe it's not 30:46that big of a deal no I think the thing 30:49with multimodality to understand is that 30:51it's uh can be very helpful just as when 30:53you train a model on multiple languages 30:56it has sometimes an ability to get 30:57better 30:58at all of those languages Having learned 31:00from from that side of things training a 31:02multimodal model it can get better in 31:04those other modalities because of things 31:06that it's learned just about 31:07representation of things in the world 31:09through those modalities and that makes 31:11it pretty interesting in uh in in in the 31:14sense that you said um I'll make the 31:16comment that uh just going back for for 31:19one minute sorry to the previous uh 31:21thing with the speech is I I think that 31:25we should pay some close and critical 31:27attention to the way that these things 31:28get demoed versus the capabilities that 31:31they have so one thing just to note the 31:34demo of it if I recall correctly was 31:37like a a travel assistant and like a 31:40recommend me restaurants and things like 31:42that very very very traditional chatbot 31:45customer assistant demos where if you're 31:47in that kind of situation yeah you're 31:48you're pretty clear that you're talking 31:49to a chatbot whether it's speech or or 31:52text or anything like that but the 31:53reality is that you could use it in a 31:55lot of the ways that Vagner and Natalie 31:57were talking talking about and um we we 32:01really do want to make sure that just 32:02because we're all pretending that we're 32:03making Travel Assistance we're not 32:05necessarily all making Travel Assistance 32:07and it's maybe the same thing with with 32:09vision you can say on the one hand it's 32:11good because you're getting to have be 32:13able to communicate different kinds of 32:15information to the model oh now you can 32:17find tun on this picture this picture 32:18this picture does it mean it's now once 32:20again easier to uh pass yourself off as 32:24uh you know potentially repurposing 32:26other people's works and that kind of is 32:28harder to track when it's in a different 32:30modality of that kind things to consider 32:34um yeah I don't work too much in images 32:36myself but just looking at the 32:38multimodal uh space overall that that's 32:41sort of where my mind goes yeah for sure 32:43and it's I think it's very challenging 32:45it's kind of like you know I think part 32:47of the question is you know ultimately 32:50who's responsible for ensuring right 32:53like that these kind of platforms are 32:55used in the right way um and you know 32:58particularly on voice right I guess 33:00Marino one question would be if you 33:02think they should be sort of more 33:03restrictive right because one way of 33:05doing this is well not everyone's going 33:06to be building a travel assistant some 33:08people may be using it to like you know 33:10try to create believable you know 33:12characters that are interacting with 33:13people in the real world is the solution 33:15here for the platform to exercise like a 33:17stronger hand over who gets access and 33:19who uses this stuff or is it something 33:21else you think it's not going to work 33:23most of these models or these variations 33:24there of get open sourced very quickly 33:26that's the way that things go so the 33:29rate at which things are going people 33:30will be able to just go around the 33:32platform so I don't know that that's 33:34going to work I think there's an 33:36important thing that good actors should 33:38ask themselves that just because you can 33:40mimic a human voice very closely does 33:41that mean you should maybe you actually 33:43should make your assistant voice 33:45identify as a robot because that is the 33:47acceptable way of actually setting 33:50expectations um but I don't know that 33:53putting this on the platforms is going 33:55to work we're we're nowhere with 33:56regulations 33:58um we have pretty much nobody who's a 34:00real for-profit a non-for-profit actor 34:03in the space everybody is a business and 34:05trying to make money I just doubt that 34:07that's gonna work yeah I think one of 34:09the things that I'll just kind of throw 34:10in on is I think that like um you know 34:13one of the things we're dealing with is 34:14the fact the technology is kind of 34:16sprawling and ever more sprawling right 34:18I think Marina your your point you know 34:20some of these are like maybe back in the 34:22day we could be like oh only a few 34:23companies can really pull this off but 34:25it just feels like between where you 34:27know kind of like the technolog is 34:28becoming more commoditized and more 34:30available these sort of safety problems 34:32become there's less points of control 34:34basically um and it feels like the 34:36bigger thing is like how do we I guess 34:39in some cases educate right like 34:40basically like you know should you right 34:42it seems to be the question you really 34:44want people to ask you know when they're 34:45designing these systems which seems to 34:47me to be very much more about like Norms 34:49than it is about like trying to like set 34:51some technical standard the the other 34:53aspect to this is that uh before 34:56actually I was working more in the image 34:58uh and video 34:59modality uh the aspect to it is that for 35:03humans sometimes to see some of the 35:06perturbations that images have it's very 35:09difficult so the machine learning model 35:13you can give it a picture of a panda and 35:15a picture of uh the same panda with very 35:18tiny tiny perturbations the machine 35:20learning goes uh goes really crazy and 35:23tells you it's a giraffe but for a human 35:25still it's a p a panda oh l so I think 35:30um adding this new modality definitely 35:33adds more and more uh risk and risk is 35:38exposure for the models now 35:43whether we should be worried about it I 35:46think uh in the open ey uh situation 35:50they probably would not have be able to 35:53basically make the model public and 35:56that's uh going to be more more 35:57restricted but for other models that is 36:01definitely a situation we need to worry 36:03because we never never fully solve a 36:05adversarial samples that that thing of 36:08the panda are called adversarial samples 36:10so we never as a community really solved 36:13that problem now that problem when we 36:16add multimodality is coming back to our 36:18plate and now we need to think about 36:21okay before it was probably not as much 36:23a risk because people were having more 36:26difficulty interacting with the models 36:28but now we have uh more people and more 36:31and more people using text and image 36:33models so are we actually in more danger 36:37and that I think uh that's an active 36:40research uh topic hopefully with the 36:43large language models a lot of the 36:44research that went to image actually 36:46moved to text so I anticipate more and 36:49more people are going to start working 36:51in this intersection but it's an open 36:54issue basically yeah I think it's so 36:56fascinating um you know I think when 36:58those adversarial examples first started 37:00to emerge it was almost kind of in the 37:02realm of like the theoretical but now we 37:04just have like lots of live production 37:06systems that are out there in the world 37:07which obviously raises the risk and the 37:09incentive to to of course um you know 37:11undermine some of these Technologies um 37:14so it's uh it's yeah definitely a really 37:16big challenge um Vagner any final 37:18thoughts on this I was thinking about 37:20the 37:21the possibility of fine-tuning vision 37:23models I think that one aspect that I 37:26believe this it's interesting especially 37:29for 37:31um um let's say and and and the report 37:36gives an example of that on capturing 37:39images for um like traffic images for 37:44identifying like speed limits and so on 37:46and so forth um that could help 37:49development on um let's say countries in 37:52the global uh South because usually when 37:55we talk about models and image and 37:57everything usually the data sets they 38:00are mostly uh and they're training 38:03mostly with considering us data sets 38:05right and that I think that allowing 38:08that it's in One Direction interesting 38:11because supports people developing um 38:15Technologies in in countries where we 38:17don't have like uh like in Brazil 38:20sometimes we don't have like the the 38:22rows and and they're not so well U 38:26painted signed as here in us so 38:29sometimes uh allowing uh folks to do 38:33this fine turning I think it's an 38:34interesting to that way of uh putting 38:38technology in other context of use far 38:42from the context of creation I think in 38:43this sense I think it's interesting yeah 38:45for sure well as per usual with mixture 38:48of experts I think we started by talking 38:49about Dev day and what they're doing for 38:52the developer ecosystem and I think 38:53ended uh talking about International 38:55Development so it's been another vintage 38:57episode of mixture of experts um that's 38:59all the time that we have for today um 39:02Marina thanks for joining us fogner 39:04appreciate you being on the show and 39:05Natalie welcome back and uh if you 39:07enjoyed what you heard listeners uh you 39:08can get us on Apple podcasts uh Spotify 39:11and podcast platforms everywhere and we 39:14will see you next week thanks for 39:15joining us