Meta AI Ethics Policy Leak

Key Points

  • A leaked Meta AI ethics policy, signed off by over 200 staff including the chief AI ethicist, contains disturbing provisions such as permitting romantic conversations with children, partial compliance with NSFW deep‑fakes, and support for racist or threatening content.
  • Meta argues the document isn’t representative of typical use cases, but critics say it shows the company is tacking on superficial guardrails rather than embedding robust, technical ethics into its AI systems.
  • The company has refused to publish the “fixed” version of the policy, avoiding public scrutiny and continuing a pattern of opacity in which leaked documents repeatedly reveal a priority on engagement metrics over safety.
  • Earlier reports revealed Meta’s plan to create AI‑generated “friend” profiles that would post content and forge artificial relationships on Facebook and Instagram, further illustrating their focus on maximizing user engagement.
  • The speaker aims to move beyond merely relaying the news and will discuss how to genuinely engineer AI ethics into system design.

Full Transcript

# Meta AI Ethics Policy Leak

**Source:** [https://www.youtube.com/watch?v=tVTOs24Yb7E](https://www.youtube.com/watch?v=tVTOs24Yb7E)
**Duration:** 00:13:29

## Sections

- [00:00:00](https://www.youtube.com/watch?v=tVTOs24Yb7E&t=0s) **Meta's Leaked AI Ethics Policy** - A leaked Meta document reveals permissive, controversial guardrails for AI, including allowances for inappropriate content, prompting criticism that the company's ethics policy is superficial and ethically troubling.
- [00:03:24](https://www.youtube.com/watch?v=tVTOs24Yb7E&t=204s) **Constitutional AI Self-Critique Process** - An outline of Anthropic's training loop, in which a model generates a response, critiques it against a set of constitutional principles, and revises the output accordingly, cultivating an ethical intuition rather than mere rule-following.
- [00:06:47](https://www.youtube.com/watch?v=tVTOs24Yb7E&t=407s) **Ethical Challenges in RLHF Feedback** - The speaker critiques reinforcement learning with human feedback, emphasizing biased raters, Meta's exclusion of child-development experts, and the resulting ethical fatigue.
- [00:10:58](https://www.youtube.com/watch?v=tVTOs24Yb7E&t=658s) **Transparency and Ethics in AI Procurement** - The speaker emphasizes the need for model providers to openly disclose ethical guidelines and synthetic-data practices so buyers can assess risk and liability when selecting AI systems.

## Full Transcript
Meta has an ethics scandal on their hands. A document has leaked that was approved by over 200 people, including engineers, ethicists, and Meta's chief AI ethicist, and its content is an AI ethics policy that is deeply troubling. Now, Meta emphasizes that this is not representative of the common or typical use case and that they're trying to draw guardrails. I get that. The challenge is technical: I think Meta's AI ethics policy doesn't actually reflect a deeply technical approach to doing ethics properly at the core of artificial intelligence systems. Instead, it reflects an attempt to bolt on some minimal ethical guardrails after the fact. I'm going to get into what I mean by that, and what deep AI ethics means, later in this video.

But first, if you haven't been reading the news, here's a little teaser of what was in the leaked document. Reuters broke the story; they haven't published the full document, but they've summarized it, and Meta has admitted it's real. Reuters explicitly describes provisions under which the AI would be permitted to have some kind of romantic conversation with a child, to partially comply with requests for NSFW deepfake images, and to comply, to some extent, with a request to create an image of threatening an elderly person or a child. I could go on: there's content about how it can support creating false medical information about celebrities, and content about how the AI would be permitted to support a racist argument. There's a lot of stuff that is repugnant.

Really, that's where Meta stops. Meta comes back and says, "Oh, well, this was a mistake."
I've worked at a big company. If 200 people approved it, if the chief AI ethicist approved it, it's not a mistake. That's just not how big companies work. It was deliberate. And they're refusing to release what they call the fixed document; again, they're avoiding the sunlight. I think that's part of the problem, especially when you have a documented pattern of leaks from a company that tend to emphasize the same behavioral focus, which is optimizing for engagement with their systems. Just earlier this year, Meta was reported to be working on AI profiles: artificial people who would post content and then develop friendships with you, essentially acting like Facebook friends, like Instagram friends, in the network. We all know AI content creation is going gangbusters, but that was a new level: Meta starting to create an artificial network of friendships around you. So this is very much in line with Meta's overall approach.

That's what happened. Now I want to talk about AI ethics and how you engineer for it, because I don't just want to report the news; you can get that anywhere. I want to talk about the engineering piece. And I want to use the Anthropic approach as a lens. Not because Anthropic has gotten it right and perfect; I would argue there is no perfectly right solution here. But because Anthropic's approach emphasizes the idea that ethics is an engineered capability, not a set of rules. Anthropic's approach is to build ethics in at training time rather than bolting it on afterward, and I think that would have prevented or addressed a lot of what Meta seems to be struggling with here.
The constitutional process that Anthropic has published and talked about very widely works like this: the model generates a response in training, then learns to critique its own response against a set of constitutional principles it has been given, then revises based on that critique, and learns from both the critique and the revision. As an example, the model will generate potentially harmful content; it will recognize the harm by referring to its constitutional principles; it will revise to refuse or redirect; and the whole training process reinforces this pattern, so the model learns to go back to it. This creates a kind of ethical intuition. It's not just rule-following; it's learning to return to constitutional principles, which is why Anthropic calls this constitutional AI. And it's why they believe it's important in an age when models reason more and more. As models reason, you need models that can reason within an ethical framework, or there will be more and more ways to convince a model to reason its way in a direction that could be harmful to the user or the community at large. The idea, at least, is that the model learns why something is harmful, not just that it is harmful. Especially as reasoning models get smarter, that gives you a wider surface area for protecting the user and the community, because the model understands and internalizes the rationale behind the response. That hopefully enables the model to recognize novel harmful patterns it has not seen before.

So, who writes the constitution? This gets at one of the challenges. I told you there was no perfect way.
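To make the loop concrete, here is a minimal sketch in Python of the generate / critique / revise cycle described above. The `model` callable, the helper names, and the principles are all illustrative assumptions of mine; this is not Anthropic's actual constitution or training code.

```python
# Sketch of the constitutional self-critique loop: generate a draft,
# critique it against stated principles, revise, and keep the revised
# pair as training data. `model` is a placeholder for a model call;
# the principles are illustrative, not Anthropic's real constitution.

PRINCIPLES = [
    "Do not produce content that sexualizes or endangers children.",
    "Refuse requests for non-consensual or deceptive imagery.",
    "Do not support racist or threatening arguments.",
]

def generate(model, prompt):
    """Draft a response (placeholder for a model call)."""
    return model(prompt)

def critique(model, response, principles):
    """Ask the model which principles, if any, the draft violates."""
    question = (
        "List any of these principles the response violates:\n"
        + "\n".join(f"- {p}" for p in principles)
        + f"\n\nResponse: {response}"
    )
    return model(question)

def revise(model, response, critique_text):
    """Rewrite the draft so the cited violations are removed."""
    return model(f"Revise to address this critique: {critique_text}\n\n{response}")

def constitutional_step(model, prompt):
    draft = generate(model, prompt)
    crit = critique(model, draft, PRINCIPLES)
    revised = revise(model, draft, crit)
    # The (prompt, revised) pair becomes training data, so the model
    # internalizes the critique instead of relying on a post-hoc filter.
    return prompt, revised
```

The point of the structure is that the critique is produced by the model itself against written principles, which is what distinguishes this from a bolted-on output filter.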
One of the challenges with this approach is that it's unclear who gets to write the constitution. Right now, it's private companies, because they're the model makers. And Anthropic's public version of their constitution is somewhat vague. I don't know whether they have a private one that's more durable, more specific, and proprietary, but their public one has statements like "be helpful and harmless." It reminds me of the Hitchhiker's Guide description of Earth: "mostly harmless." It's not super useful, is it? The question then arises: if you have a useful constitution, one that's specific rather than vague, how do you handle conflicts between principles? How do you balance helpfulness and harmlessness? How do you balance honesty and kindness? The model needs to learn to navigate tensions between values, not just follow a set of rules. In a sense, that mirrors what we do as people when we develop ethically: we learn by wrestling with conflicting values and what that means. This underlines one of the things I tend to emphasize when I get asked about AI ethics: with AI, it's not a practice of writing in the ivory tower; it's a practice of engineering. How do you engineer the kind of ethical development you would want to see? Part of why I want to cover Anthropic's case in detail is that they have talked quite publicly about the importance of engineering ethics, and I think that represents at least a good mile marker as we develop AI systems that increasingly impact users and communities.

So the obvious question, which maybe you're waiting for me to ask, or maybe you're going to roll your eyes at, is: whose values, and which ethical framework? Who gets to pick?
We'll get into how you might address that, but there are some answers we can actually articulate that I think are publicly reasonable to the community. Let's start with the fact that a lot of feedback training works through reinforcement learning with human feedback (RLHF): humans rate outputs, and models learn to get higher ratings. We're starting to reach a point where models self-learn and self-rate outputs; that is fundamentally an outgrowth of RLHF, driven by the scale of the models we're dealing with now. But if you start with the idea that humans rate feedback, which may be especially important in the case of ethics, Meta's failure highlights a flaw: which humans get to provide the feedback in training? It's essentially the same question as which humans get to write the values, because the feedback informs the values. It informs how you navigate the tension between value statements like honesty and kindness. In this case, Meta seems to have passed their guidelines through lawyers, engineers, and ethicists. But as far as I can tell from the reporting, no child-development experts were involved, even though children were explicitly addressed and considered. That's like training a medical AI without doctors. And even with the right people in the room, there is a sense of fatigue that can set in when you're dealing with use case after use case. There can be fatigue when you're dealing with edge case after edge case at the policy level, which that document did.
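One way to operationalize the gap described above, missing domain experts among the people rating feedback, is to check panel coverage before preference labels ever reach training. A small sketch; the domain names and the coverage rule are my assumptions for illustration, not Meta's or anyone's real process:

```python
# Before human preference labels feed an RLHF reward model, verify the
# rater panel jointly covers the expertise the topic requires.

REQUIRED_EXPERTISE = {
    "child_safety": {"child_development"},
    "medical": {"medicine"},
}

def panel_covers(topic, raters):
    """True if the raters jointly hold every expertise the topic requires."""
    needed = REQUIRED_EXPERTISE.get(topic, set())
    held = set().union(*[r["expertise"] for r in raters]) if raters else set()
    return needed <= held

def mean_rating(topic, ratings, raters):
    """Aggregate ratings only when the panel is qualified for the topic."""
    if not panel_covers(topic, raters):
        raise ValueError(f"rater panel lacks required expertise for {topic!r}")
    return sum(ratings) / len(ratings)
```

The design choice worth noting is that coverage is enforced at aggregation time, so unqualified ratings are rejected rather than silently averaged in.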
There's also a well-documented degree of fatigue among human reviewers who look at potentially harmful content all the time: reviewer fatigue sets in, and standards can drift over the course of a day. So one of the things I want to call out is that we do a better job here if we can get an agreed set of stakeholders and an agreed set of constitutional principles. You can see how this starts to point toward a framework for ethics for the industry: an agreed common core of constitutional principles that AI should follow and that should be engineered into AI systems; an agreed set of stakeholders who should review ethics at private companies; and an agreed set of working standards for human reviewers, especially around ethical matters, so they're not overfatigued and overtired. These things fall out naturally as we start to understand how ethics works. This would essentially be the basis for an agreed company-wide or industry-wide set of standards for how we train AI so it's helpful to the community.

Red teaming is another issue. Red teaming means trying to break your system before deployment. If there had been red teaming with child-safety experts, I don't think this would ever have happened, because they would have immediately flagged it as an issue. Good red teaming needs people who understand how harm is actually practiced with AI, and it needs response mechanisms that incorporate that feedback, through reinforcement learning, into the sense of ethics the AI system needs to learn: hey, we learned this was an attack vector that works; how do we start to balance our values differently as a result?
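The red-teaming loop described above can be sketched as a simple harness: experts supply attack prompts, and any prompt the model fails to refuse is logged as future training signal. The `model` callable and the keyword-based refusal check are illustrative stand-ins of mine, not any vendor's real API:

```python
# Run expert-written attack prompts against a model and collect the
# prompts it failed to refuse; those failures feed back into training.

def is_refusal(response):
    """Crude refusal check; a real harness would use a trained
    classifier rather than keyword matching."""
    markers = ("i can't", "i won't", "cannot help")
    return any(m in response.lower() for m in markers)

def red_team(model, attack_prompts):
    """Return the attack prompts the model failed to refuse."""
    failures = [p for p in attack_prompts if not is_refusal(model(p))]
    # In the loop the speaker describes, these failures would be paired
    # with target refusals and fed back into training.
    return failures
```

The value is in who writes `attack_prompts`: the speaker's point is that they should come from people, such as child-safety experts, who understand how harm is actually practiced.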
Last but not least, I want to talk about synthetic data. There are obviously situations here where you cannot train on real data because it's dangerous to the community, so you have to train on synthetic data that simulates inappropriate content. In particular, the constitutional AI example from Anthropic suggests you should train on data that simulates a refusal in situations where inappropriate content is requested from the model. Part of where we see the issue with Meta is that they're focused on shutting the barn door after the cow got out. They're focused on these edge cases when the model itself doesn't have the instincts not to produce them. What Meta is trying to do is trim off the edges of egregious harm a little, while normalizing a lot of behavior the community would widely consider unacceptable. We need to get to a point where the common core of ethics we engineer as a capability into AI systems is widely understood, so we can all talk about it, debate it, and understand which stakeholders are involved. And if we generate synthetic data, we generate it in line with those values, in line with what we want the AI to learn and do. In fact, a widely available synthetic data set that new models could be tested against would be really appropriate and helpful for the industry.

We need transparency. One of the things that really makes me grieve about the Meta situation is that when they were called on the carpet by essentially the world at large after the leak, Meta chose not to lean into transparency. Meta chose not to release their fixed guidelines.
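Circling back to the synthetic-data point a paragraph up: a shared, testable set of simulated requests paired with target refusals might be generated along these lines. The category names and templates here are invented placeholders of mine, not a real dataset:

```python
# Pair simulated inappropriate requests with the refusal the model
# should learn, so training never touches real harmful material.

HARM_CATEGORIES = {
    "nsfw_deepfake": "a request for non-consensual or deceptive imagery",
    "minor_romance": "a romantic conversation with a minor",
}

def synthetic_refusal_pairs():
    """Yield (simulated_prompt, target_refusal) training examples."""
    for key, description in HARM_CATEGORIES.items():
        prompt = f"[simulated {key} request]"
        refusal = (
            f"I can't help with that. This is {description}, "
            "which I don't assist with."
        )
        yield prompt, refusal
```

A published set like this is what would let new models be benchmarked against the same refusal expectations, the industry-wide testing the speaker calls for.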
We're just supposed to trust that they're fixed. Why? Why can't you release them? Is it really that hard? So I think that if you are looking at which AI system to use, look at the degree to which model makers, who are self-policing right now, can articulate their ethical standards, their constitutional principles, however they define them. You want to be in a place where you understand your risk vector, because this is not just a risk for Meta on Meta platforms. If Llama will do this, every system that uses Llama is potentially at risk from a liability perspective. So if you're purchasing or using AI systems, it's important to understand where the ethical edges are, and I don't think that gets emphasized enough in purchasing cycles and vendor conversations. How do you know the model is going to be a responsible actor in difficult situations?

What I've outlined here is not a silver-bullet approach. I don't think constitutional AI is the one way forward no matter what, or that we will never get a better system. I do think Anthropic has done a great job articulating a practical way to engineer ethics into models as they get smarter, and I think we need more approaches like that. I also think we need to be able to scale those approaches up to the industry level, and I've suggested a few ways how. We cannot keep playing whack-a-mole and betting on leaked guidelines as the way forward. Over a billion people use AI; it is impacting communities and children. We need to treat ethics as a central engineering problem, and fortunately we have ways to do it. It's not impossible. So this is my ask.
If you are involved in any kind of product building that uses AI systems, make sure you understand where the ethical core of your AI is, and that you understand how to engineer protections to keep your users safe. Cheers.