From newsroom experiment to research benchmark: the journey of iMEdD’s 2023 pre-election discourse analysis

In the spring of 2023, as Greece headed into a closely contested double election, iMEdD launched an innovative and experimental initiative to track how political leaders spoke, what they emphasized, and how their language shaped the public debate—combining journalism, political science and AI tools. 

The working team 

Idea & Project Coordination: Thanasis Troboukis, Kelly Kiki (iMEdD) 
Journalistic Research/Analysis: Nota Vafea, Katerina Voutsina, Stefania Ibrishimova, Athina Thanasi, Kelly Kiki, Chrysoula Marinou, Thanasis Troboukis, Giorgos Schinas (iMEdD) 
IT Support: Christos Nomikos, Nikos Sarantos (iMEdD) 
Scientific Advisor on Political Theory: Antonis Galanopoulos, PhD Candidate at the School of Political Sciences, Aristotle University of Thessaloniki 
Software Development/Data Analysis: Pavlos Sermpezis, Stelios Karamanidis, Dimitrios-Panteleimon Giakatos, Ilias Dimitriadis (Datalab, School of Informatics, Aristotle University) 
Datalab Director (School of Informatics, Aristotle University): Professor Athena Vakali 
Translation: Anatoli Stavroulopoulou 

Nearly three years later, that effort has moved beyond the newsroom. What began as an editorial experiment has now been developed into a peer-reviewed study, marking a transition from data-driven reporting to structured academic research. 

The paper, “AgoraSpeech: a multi-annotated comprehensive dataset of political discourse through the lens of humans and AI,” is an initiative by Datalab published in the Journal of Computational Social Science. It builds directly on that original project, turning its methodology and findings into a research resource that offers useful insights for similar initiatives in the future. 

The 2023 project: mapping political language 

Back in 2023, the starting point was straightforward: what defines political discourse in Greece during an election period? 

To answer that, the team of journalists collected speeches from six major political leaders in real time, covering the entire pre-election period across both rounds of voting. The aim was not simply to document what was said, but to identify patterns: recurring themes, shifts in tone, and the rhetorical strategies used throughout the campaign. 

The analysis focused on four main dimensions: the issues dominating the agenda, the emotional tone of speeches, the degree of polarization, and the presence of populist narratives. 

Artificial intelligence played a significant role, but not an autonomous one. Language models were used to process and categorize large volumes of text, while journalists, data scientists and a political scientist reviewed and interpreted the results. Rather than replacing editorial judgment, the technology functioned as an experimental analytical tool within a supervised workflow. 

At the time, the project produced real-time insights into the campaign. It also raised broader questions about the role of AI in analyzing political speech, particularly where automated interpretation meets human expertise. 

You can view the detailed methodology used here 

From project to publication: the AgoraSpeech dataset 

The newly published paper builds on that process and extends it. 

At its core is AgoraSpeech, a dataset comprising 171 speeches delivered during the 2023 elections, spanning both the May and June campaigns. Each speech has been segmented into paragraphs and annotated across multiple dimensions, including topic, sentiment, named entities, polarization and populism. 

In total, the dataset includes more than 31,000 annotations. These combine automated outputs with detailed human validation, preserving the hybrid approach developed during the original project. 
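
For readers who want to picture what one such record might look like, here is a minimal Python sketch of a paragraph carrying both an automated label and a human-reviewed one. It is an illustration only: the class names, field names and label values are assumptions made for this example and are not the published AgoraSpeech schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """One label on one paragraph (hypothetical structure, not the real schema)."""
    dimension: str                      # e.g. "topic", "sentiment", "polarization", "populism"
    ai_label: str                       # label proposed by the language model
    human_label: Optional[str] = None   # label after human review, if any

    @property
    def final_label(self) -> str:
        # The human-reviewed label takes precedence over the automated one.
        return self.human_label if self.human_label is not None else self.ai_label


@dataclass
class Paragraph:
    """A paragraph-level segment of a speech with its annotations."""
    speech_id: str
    index: int
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def label_for(self, dimension: str) -> Optional[str]:
        # Return the final label for a dimension, or None if it was not annotated.
        for annotation in self.annotations:
            if annotation.dimension == dimension:
                return annotation.final_label
        return None


# Illustrative values only; no real speech text or labels are reproduced here.
paragraph = Paragraph(speech_id="speech-001", index=0, text="Example paragraph text.")
paragraph.annotations.append(
    Annotation("sentiment", ai_label="neutral", human_label="positive")
)
paragraph.annotations.append(Annotation("topic", ai_label="economy"))
```

Keeping the automated and the human label side by side, rather than overwriting one with the other, is one way to preserve the hybrid approach: the two can later be compared as well as used.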

This combination—AI for scale and consistency, human input for context and accuracy—is a central feature of the study. It demonstrates how an experimental newsroom workflow can be translated into a structured, transparent and reusable dataset. 

If you work on political discourse, one of the biggest challenges is data. High-quality, annotated datasets are hard to build, especially when they try to capture nuance rather than just keywords. 

AgoraSpeech is a step in that direction. It offers: 

  • a reference point for researchers studying political communication 
  • a concrete example of how AI can fit into newsroom workflows 
  • and a more transparent way to look at how political language shapes public debate