AI‑Generated Synthetic Data Predicts Election

Key Points

  • Researchers at Wuhan University used ChatGPT‑4o to generate demographically tuned synthetic voter responses and correctly forecast Trump’s Electoral College victory, landing within about 5–10 electoral votes (roughly one state) of the actual result.
  • Their method involved prompting the model with detailed voter profiles (e.g., “35‑year‑old white woman in Vermont”) and weighting responses by each state’s voting history.
  • The study highlights synthetic data as a promising supplement to traditional polling, which is facing declining response rates and reliability issues.
  • Replicating the approach requires extensive cross‑tabulated demographic data from multiple election cycles and careful prompt engineering, and could be improved by adding economic variables like inflation.
  • The speaker expects follow‑up papers to refine the technique through more sophisticated use of the context window and richer contextual inputs, which could make synthetic data even more predictive.
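
The persona-prompting and weighting steps in the key points could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the demographic cells and their shares are made-up values, `ask_model` is a hypothetical stand-in for a real chat-model call, and the weights here are simply each cell's share of the electorate, since the video does not say exactly how the state's voting history enters the weighting.

```python
from collections import Counter
from typing import Callable

# Hypothetical demographic cells for one state:
# (age, race, gender, share of electorate). Shares are invented for illustration.
CELLS = [
    (35, "white", "woman", 0.40),
    (60, "white", "man", 0.35),
    (28, "Black", "woman", 0.25),
]


def build_prompt(age: int, race: str, gender: str, state: str) -> str:
    """Build a persona prompt in the spirit of the example quoted in the video."""
    return (
        f"You are a {age}-year-old {race} {gender} in {state}. "
        "Who are you voting for in this presidential election?"
    )


def weighted_shares(
    state: str,
    cells: list,
    ask_model: Callable[[str], str],
    samples_per_cell: int = 10,
) -> dict:
    """Ask the model several times per cell, then weight answers by cell share."""
    totals = Counter()
    for age, race, gender, share in cells:
        prompt = build_prompt(age, race, gender, state)
        answers = [ask_model(prompt) for _ in range(samples_per_cell)]
        for candidate, count in Counter(answers).items():
            # Fraction of this cell's answers for the candidate, scaled by the cell's weight.
            totals[candidate] += share * count / samples_per_cell
    return dict(totals)


def fake_model(prompt: str) -> str:
    """Deterministic placeholder for a real model call (assumption, not a real API)."""
    return "Candidate A" if "woman" in prompt else "Candidate B"
```

Calling `weighted_shares("Vermont", CELLS, fake_model)` returns one weighted share per candidate. Swapping `fake_model` for a function that queries an actual model is the only change needed to generate real synthetic responses.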

Full Transcript

# AI‑Generated Synthetic Data Predicts Election

**Source:** [https://www.youtube.com/watch?v=CgmCsSbPAlk](https://www.youtube.com/watch?v=CgmCsSbPAlk)

**Duration:** 00:03:24

## Sections

- [00:00:00](https://www.youtube.com/watch?v=CgmCsSbPAlk&t=0s) **AI‑Generated Synthetic Data Predicts Election** - Researchers at Wuhan University created demographically‑tuned synthetic responses using ChatGPT, enabling them to forecast the 2024 U.S. Electoral College result within 5‑10 electoral votes of the actual outcome.

## Full Transcript
0:00 You know, one of the biggest debates right now is the value of synthetic data, or data that's generated by artificial intelligence. I saw a really interesting application for that in election result prediction. This is from a paper that was published out of Wuhan University in November, specifically November 3rd, so before the election.

0:21 What the researchers did was generate a bunch of synthetic data that was demographically tuned to particular states' demographic makeup, and that was weighted by the state's voting history. Then they generated a bunch of synthetic data off of the tuned baseline knowledge base that they fed to ChatGPT-4o.

0:44 So they would do things like say, "Okay, you are a 35-year-old white woman in Vermont. Who are you voting for in this presidential election?" And then they would see what synthetic data was generated by ChatGPT-4o, based on what the model could read of the state's voting preferences broken out by demographics.

1:04 You might think, well, that's super reductive, how informative is it? They were able to correctly forecast Trump's victory in the Electoral College. Depending on their exact approach, they got within about 5 to 10 Electoral College votes, so about one state off. It was surprisingly accurate. Actually, I was a little bit shocked. I wasn't expecting it to be that useful.

1:30 I think that as we continue to run into a world where traditional polling has more and more questions around it, because people don't answer their phones all the time and internet polling has a lot of other issues, we may see in future electoral cycles more and more focus on how synthetic data that is tuned can help us get more reliable estimates of what's going on.

1:55 So I actually thought about how I would replicate this. You have to get a large dataset. You have to get the crosstabs broken out across multiple electoral cycles, across all 50 states in the US, so that you understand how different demographics have changed their votes over time on average. And then you're going to have to, in a very clear and structured way, prepare ChatGPT-4o with that data so you can ask it questions and start to generate synthetic data.

2:23 It's not a light undertaking, but I would think, after seeing their general approach, that we will see a lot of follow-up papers that seek to take this and fine-tune it. For example, this didn't include inflation or economic data as far as I can see, and yet for everyone who was leaving the polls or talking about the election, inflation was a major topic. So I would expect that we will start to see more sophisticated use of the context window to set up these chatbots to generate this synthetic data, and I would guess that that would make synthetic data even more predictive.

3:00 But to be honest, getting within five or 10 electoral votes on a very crude sort of demographic basis is pretty good. So we will see what happens, but I thought I would share that approach. I think it's an interesting example of how synthetic data can be valuable and even predictive in certain situations. I'll put the paper in the link underneath.
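
The replication steps described in the transcript (collect crosstabs across several cycles, see how each group's vote has shifted on average, then hand that to the model in a clear, structured form) might look something like the sketch below. Everything here is an assumption for illustration: the group labels, the share values, and the function names `average_shift` and `build_context` are hypothetical, and the numbers are placeholders rather than real exit-poll data.

```python
from statistics import mean

# Hypothetical crosstabs: group -> {election year: share voting for "party X"}.
# Values are invented placeholders, not real polling or exit-poll numbers.
CROSSTABS = {
    "white women, 30-44": {2012: 0.50, 2016: 0.48, 2020: 0.52},
    "white men, 45-64": {2012: 0.40, 2016: 0.36, 2020: 0.38},
}


def average_shift(by_year: dict) -> float:
    """Mean change in vote share between consecutive cycles.

    Needs at least two cycles; statistics.mean raises on empty input.
    """
    years = sorted(by_year)
    return mean(by_year[later] - by_year[earlier]
                for earlier, later in zip(years, years[1:]))


def build_context(state: str, crosstabs: dict) -> str:
    """Render the crosstabs as a structured text block to place before persona prompts."""
    lines = [f"Voting history for {state} by demographic group (share for party X):"]
    for group, by_year in crosstabs.items():
        history = ", ".join(f"{year}: {by_year[year]:.0%}" for year in sorted(by_year))
        shift = average_shift(by_year)
        lines.append(f"- {group}: {history}; average shift per cycle {shift:+.1%}")
    return "\n".join(lines)
```

The string returned by `build_context("Vermont", CROSSTABS)` would be placed in the context window ahead of the persona question. Economic variables such as inflation, which the speaker notes were missing, could be appended as extra lines in the same block.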