Researching Candidates’ Use of Twitter During the European Parliamentary Elections
Edited By Alex Frame, Arnaud Mercier, Gilles Brachotte and Caja Thimm
Hailed by many as a game-changer in political communication, Twitter has made its way into election campaigns all around the world. The European Parliamentary elections, taking place simultaneously in 28 countries, give us a unique comparative vision of the way the tool is used by candidates in different national contexts. This volume is the fruit of a research project bringing together scholars from 6 countries, specialised in communication science, media studies, linguistics and computer science. It seeks to characterise the way Twitter was used during the 2014 European election campaign, providing insights into communication styles and strategies observed in different languages and outlining methodological solutions for collecting and analysing political tweets in an electoral context.
2. Families of practices. A bottom-up approach to differentiate how French candidates made use of Twitter during the 2014 European Campaign (Compagno, Dario)
The aim of this chapter is to differentiate how French candidates made use of Twitter during the 2014 European campaign. We apply unsupervised learning techniques to let clusters of candidates emerge from the data. In particular, the variables used for clustering are operators interpreted by Twitter (hashtags, retweets, hyperlinks, mentions).
2.1 Introduction and rationale
The aim of this study is to understand how French candidates made use of Twitter during the 2014 European elections. In particular, we are interested in identifying a few families of practices, or “Twitter styles”, based on the operators found in the tweets. To this end we adopted a bottom-up approach, based on unsupervised learning (clustering). As a result we found three main “poles” defining the field of Twitter practices: some candidates tweet as a means of getting in direct contact with citizens, others diffuse hyperlinks to websites, while others still make extensive use of the participative operators allowed by Twitter, such as hashtags, retweets and mentions. The results of this study make it possible to understand which practices are more common and to characterize them according to other variables such as the preferred tweeting interface of a candidate, his or her political party and the content of the words used in his or her tweets.
2.2 Corpus and methodology
The corpus used in this study was extracted from the TEE2014 database (note 14). The database collects all the tweets produced by Belgian, English, French, German, Italian and Spanish candidates in the 2014 European elections during the campaign. Our corpus includes all messages tweeted by French candidates. In total, around 72,000 tweets by 467 users were collected. We decided to exclude from our analysis 106 candidates who posted fewer than 15 tweets during the campaign, and 4 candidates who posted more than 1000 tweets, leaving a more homogeneous corpus of 357 candidates.
Data treatment, analysis and plots in this study were produced with the open source statistical software R (R Core Team 2015), including the dplyr (Wickham / Francois 2015), ggplot2 (Wickham 2009) and cluster (Maechler et al. 2015) packages (note 15).
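The filtering step described above can be sketched as follows. This is a minimal illustration in Python rather than the R script used for the study; the function name `filter_candidates` and the input format (one author name per tweet) are assumptions made for the example.

```python
from collections import Counter


def filter_candidates(tweet_authors, min_tweets=15, max_tweets=1000):
    """Return the authors whose tweet count lies within [min_tweets, max_tweets].

    tweet_authors: iterable with one author identifier per tweet (hypothetical format).
    Candidates with fewer than 15 or more than 1000 tweets are dropped,
    mirroring the thresholds stated in the text.
    """
    counts = Counter(tweet_authors)
    return {author for author, n in counts.items() if min_tweets <= n <= max_tweets}
```

Applied to the full list of tweet authors, a filter of this kind would reduce the 467 accounts to the 357 candidates retained for analysis.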
Our study focuses on candidates rather than on individual tweets. We therefore began by computing, for each candidate, the proportion of tweets including each operator interpreted by Twitter (Thimm 2012, Einspänner / Dang-Anh / Thimm 2013). To calculate these proportions we relied primarily on the meta-data present in the TEE2014 database, which in turn are based on the Twitter REST API (Twitter Inc. 2015) and on dedicated extraction algorithms. However, we recalculated these meta-data whenever our research required it. The operators used in this analysis are the following:
1. The retweet operator “RT”: identifies retweets;
2. The hashtag operator “#”: identifies tweets containing hashtags;
3. The hyperlink operator “http”: identifies tweets containing links to webpages;
4. The direct message or reply operator “reply”, referred to as DM/reply operator from now on: identifies tweets starting with an “@” character;
5. The source operator “via”: identifies tweets ending with “via @” followed by a Twitter account;
6. The mention operator “@”: identifies tweets including mentions in their text that are not part of a retweet, DM/reply or source operator.
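The six definitions above can be approximated directly on the tweet text. The sketch below is only an illustration: the study relied mainly on the TEE2014 meta-data, whereas the regular expressions, the operator labels and the function names used here are assumptions.

```python
import re

OPERATORS = ("retweet", "hashtag", "hyperlink", "reply", "source", "mention")

# Hypothetical text patterns approximating the operator definitions.
RETWEET = re.compile(r"^RT @\w+")
REPLY = re.compile(r"^@\w+")
SOURCE = re.compile(r"via @\w+\s*$")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")


def operators(text):
    """Return the set of operators detected in one tweet."""
    text = text.strip()
    found = set()
    if RETWEET.search(text):
        found.add("retweet")
    if REPLY.search(text):
        found.add("reply")
    if SOURCE.search(text):
        found.add("source")
    if HASHTAG.search(text):
        found.add("hashtag")
    if "http" in text:
        found.add("hyperlink")
    # A mention only counts if an "@account" remains once the accounts
    # belonging to the retweet, reply and source operators are removed.
    rest = SOURCE.sub("", REPLY.sub("", RETWEET.sub("", text)))
    if MENTION.search(rest):
        found.add("mention")
    return found


def operator_ratios(tweets):
    """Fraction of one candidate's tweets that use each operator."""
    if not tweets:
        raise ValueError("at least one tweet is required")
    counts = {op: 0 for op in OPERATORS}
    for tweet in tweets:
        for op in operators(tweet):
            counts[op] += 1
    return {op: counts[op] / len(tweets) for op in OPERATORS}
```

A value of 0.5 returned for an operator means that it appears in every second tweet of the candidate. Note that this simple pattern would also count e-mail addresses as mentions, a limitation that dedicated meta-data avoids.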
Other meta-data considered in the study are the candidate’s party name and the tweeting interface most often used by him or her. The choice to focus only on operators when clustering the candidates in our corpus was motivated by past works on the TEE2014 database (Mercier / Villa 2015, Thimm / Einspänner / Gautier 2015) and by previous exploratory research confirming the importance of operators as features to differentiate among the candidates’ tweeting practices (Compagno 2015).
The aim of figure 1 is to show how many times each media logic operator has been used by French candidates. The ratio of tweets by a candidate that made use of a media logic operator is plotted on the x-axis of the facets: therefore a value of 0 means that a candidate never used a certain operator, a value of 1 that he or she used it in every tweet, and a value of 0.5 that we find that operator in every second tweet produced by the candidate. On the y-axis we find the number of candidates distributed according to the ratio of their tweets employing each of the six operators.
Figure 1. Use of media logics operator by candidates
We can observe that the ratios of retweet and mention operators are centred and approximately normally distributed. The hashtag operator is slightly skewed to the left, suggesting that hashtags are often used in tweets. The hyperlink operator is instead more heavily skewed to the right, suggesting that there is a large number of tweets in our corpus not pointing to external webpages. This applies even more to the DM/reply and to the source operators: the vast majority of tweets are neither replies nor direct messages, and just a very small number of tweets in our corpus points to a source at the end of their text. However, we decided to isolate source operators, instead of considering them just as a kind of mention, because they correlate with the presence of hyperlinks in tweets, and would have acted as confounders in clusterisation.
Figure 2 makes it easier to compare these different distributions by plotting the median fraction of tweets per candidate that employ the operators. It is evident that retweet, mention and hashtag operators are found in approximately every second tweet. Hyperlinks, and especially DM/reply and source operators, are much rarer: they are found in one tweet in four, one in fifteen and one in a hundred respectively.
Figure 2. Global use of media logics operators
2.3 Exploration and clustering
In order to explore the relationships among the different operators, we performed a principal components analysis (PCA) using chi2 distances (note 16), the results of which are shown on the left half of figure 3. PCA creates a space of representation in which observations (in this case candidates) can be positioned. The position of each point is determined by its preference for one or another operator. PCA therefore helps to understand whether operators characterize mutually exclusive uses of Twitter. The first two factors of the PCA, used as coordinates for our plot, explain respectively 40 % and 20 % of the variability of the points.
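The conversion function mentioned in note 16 is not described in the chapter. As a hedged illustration only, the sketch below computes the standard chi-square distance between row profiles, as used in correspondence analysis. It assumes a table of counts with one row per candidate and one column per operator; the function name and input format are hypothetical.

```python
import math


def chi2_distances(table):
    """Chi-square distances between the rows of a table of counts.

    table: list of rows of non-negative counts (here: candidates x operators).
    Every row is assumed to have a positive total.
    Each row is turned into a profile (row / row total); squared differences
    between profiles are weighted by the inverse of the column mass.
    """
    total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    col_mass = [sum(row[k] for row in table) / total for k in range(n_cols)]
    profiles = [[x / sum(row) for x in row] for row in table]
    n = len(table)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum(
                (profiles[i][k] - profiles[j][k]) ** 2 / col_mass[k]
                for k in range(n_cols)
                if col_mass[k] > 0
            )
            dist[i][j] = dist[j][i] = math.sqrt(d2)
    return dist
```

With this weighting, a difference on a rarely used operator (such as the source operator) counts for more than the same difference on a frequently used one, which is the usual motivation for preferring chi-square to Euclidean distances on frequency data.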
Figure 3. Principal components analysis
PCA shows that hyperlink and mention operators tend to be mutually exclusive: if a candidate makes heavy use of the first, he or she will not often use the second, and vice versa. Source operators are positively correlated with hyperlinks: in fact, they are often automatically added to the end of the tweets produced by tweet buttons on news and other websites.
Hashtag and retweet operators tend to be used by the same family of candidates. We can see that this third family does not make use of direct messages and replies, nor publishes many links to webpages. Conversely, those who make heavy use of DM/reply operators or of hyperlink operators do not insert many hashtag or retweet operators in their tweets.
We begin to form an idea of the main trends in the use of Twitter by French candidates in the 2014 campaign: the majority of them, represented at the lower-left corner of figure 3, used hashtags and retweets but neither tweeted many hyperlinks nor engaged in direct conversations. We notice two other tendencies: some candidates, at the lower-right side of the diagram, used Twitter mainly to diffuse hyperlinks, while some other candidates, at the top, got more involved in direct exchanges.
To get a better understanding of this insight we performed an agglomerative hierarchical clustering on our data. This technique builds a hierarchy of nested groups of observations, which can then be cut into a chosen number of classes, according to the preferences that the candidates show in their use of operators. After exploring several different partitions, we settled on a five-class clustering. The clusters are plotted on the right half of figure 3, on PCA coordinates. We notice three “extreme” groups of users and two intermediary ones.
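The principle of agglomerative clustering can be sketched in a few lines. The chapter does not state which linkage criterion or distance was used, so the average linkage and Euclidean distance below are assumptions chosen for illustration, as is the function name `agglomerative`.

```python
import math


def agglomerative(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain.

    points: list of coordinate tuples (e.g. one vector of operator ratios per candidate).
    Returns a list of clusters, each a list of indices into points.
    Linkage: average pairwise Euclidean distance (an assumption, see lead-in).
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        pairs = [(i, j) for i in a for j in b]
        return sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return clusters
```

Calling the function with different values of `k` corresponds to exploring several partitions of the same hierarchy before retaining one of them.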
The clusters vary in size, as shown in figure 4. The largest, in the middle, which we called “Participative”, includes 164 candidates out of 357. The two small clusters at the extremes of the diagram, named “Interactional” and “Informative”, include respectively 6 and 33 candidates. The two remaining intermediary, medium-sized clusters, named “Participative-Interactional” and “Participative-Informative”, account for the rest, with 78 and 76 candidates respectively.
2.4 Interpretation of the clusters
Let us now enter into the details of the interpretation of these clusters. Figure 5 plots for each cluster the median fraction of tweets employing the media logic operators. This profiling helps us to understand what the candidates in each group have in common, and therefore to interpret these results. We chose not to plot the source operator at this stage because it is not useful to differentiate the five clusters.
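The profiling step amounts to taking, within each cluster, the median of the per-candidate operator ratios. A minimal sketch, assuming two hypothetical dictionaries as input (operator ratios per candidate, and cluster label per candidate):

```python
from collections import defaultdict
from statistics import median


def cluster_profiles(ratios, labels):
    """Median fraction of tweets using each operator, per cluster.

    ratios: {candidate: {operator: fraction of tweets}}
    labels: {candidate: cluster name}
    Returns {cluster name: {operator: median fraction}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for candidate, per_operator in ratios.items():
        for operator, value in per_operator.items():
            grouped[labels[candidate]][operator].append(value)
    return {
        cluster: {op: median(values) for op, values in per_operator.items()}
        for cluster, per_operator in grouped.items()
    }
```

Assigning the same label to every candidate yields the global medians, which serve as the baseline against which each cluster is compared.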
Figure 4. Size of the clusters
Figure 5. Profiling of the Twitter Styles
We named the first cluster on the left “Interactional” because the candidates in it used Twitter mainly to engage in direct conversations with other users. More than 70 % of the tweets produced by the candidates in this group are direct messages or replies. By contrast, hashtag, hyperlink and retweet operators were very rarely used. Mention operators are found in one tweet in four. This tweeting style is highly unconventional, exploiting to the largest extent one minor function of Twitter (that of sending messages addressed to individual users). And in fact only a very small number of candidates, six, used Twitter in this manner. With some irony, we could say that tweets are seen by the users of this group as a substitute for SMS.
Moving to the other extreme of the diagram, to the farthest right, we find the “Informative” group. These candidates used Twitter almost exclusively as a means to diffuse information produced elsewhere, in the form of hyperlinks. They made a very modest if any use of all other operators. With another metaphor, we could say that Twitter is seen by this group’s members as a kind of RSS publisher and reader: they post links to websites but do not make use of other operators specific to Twitter. This cluster is the second smallest, including 33 candidates, and represents a second “deviant” use of Twitter – opposite to the first interactional one. Both these tweeting styles constitute radical exceptions to the most common use of Twitter, and they play an important role in shaping the spectrum of alternative uses allowed by this social medium.
At the centre of the diagram we find the largest cluster. The candidates in it made heavy use of hashtag, retweet and mention operators. We named this cluster “Participative” because these operators are meant to promote participation in larger exchanges and discourses: hashtags are anchors to certain events or topics, retweets are instruments of Twitter’s polyphony, repeating what was said by others, and mentions invite specific users (and their followers) as interlocutors. Participative candidates rarely engage in direct conversations and do not publish many links to external resources. They play with the operators most specific to Twitter: retweets, hashtags and mentions were born on this social medium and appear to define its most common use. We could say that these candidates practice “pure tweeting”, with no immediate interactional or informative ends.
The “Participative-Interactional” cluster is an intermediary between pure interaction and pure participation. A fifth of the tweets produced by the users in this group were direct messages or replies, which is three times the global median. In addition to this, hashtag, retweet and mention operators are each found in 40 to 50 percent of their tweets. These candidates participate in the “meaning game” proper to Twitter, but also interact individually with other users.
The last remaining group, “Participative-Informative”, is also an intermediary one. One in two tweets produced by the members of this cluster contains a hyperlink. Hashtag and retweet operators are found with the same frequency. Mentions are used slightly less, in around a third of the tweets. Candidates in this cluster use hyperlinks twice as often as the global median, and also participate with hashtags, retweets and mentions.
2.5 Structuring the Tweeting practices of French candidates
These considerations can be thought of as a way to give a structural description of the norms regulating the use of Twitter by French candidates during the 2014 campaign. The statistical tools make some of the semiotic norms visible, allowing the researcher to access new aspects of this practice. Rastier (2011) refers to the discourse norms made visible by corpus analysis as “new observables”. In our case, the clusters built on the use of the operators help to define the norms active in this field of tweeting practices.
The five families of tweeting practices are organized around three poles: the participative, the informative and the interactional. The three “pure” clusters are each close to one pole and far from the other two, while the two intermediary clusters are each close to two poles and far from the remaining third. It should be noticed that no candidates adopted a style close to the informative and to the interactional poles while being far from the participative one. It means that, in our corpus, informative and interactional uses of Twitter are mutually exclusive (there is no “Interactional-Informative” style).
This configuration allows us to talk about two opposite tendencies: an interactional and an informative one. When referring to a tendency, we will talk about both pure and intermediary clusters (for example, both the informative and the participative-informative clusters constitute the informative tendency). Schema 1 summarizes the configuration of poles and tendencies.
It should be noted that even if Twitter could have been used by candidates at the same time as a means to send personalized messages (to interact) and messages with a broader reach (to inform), no candidate used it in this way. Also, the diffusion of hyperlinks is very rarely accompanied by personalized mentions, excluding from the observed field of practices the possibility of a “hybrid” between interaction and information. Twitter was not an instrument to send personalized links to websites.
Schema 1. Poles and tendencies
Candidates preferred either a more personalized approach or a more impersonal one, therefore showing that they conceptualized the medium in two very different ways, exploiting different aspects of its potential. It would be interesting, in future research, to further explore the structure of different fields of practices, and compare them. This could result in a better understanding of how technical potentials are reduced to a smaller set whenever their use is filtered by social and semiotic norms. The patterns of this reduction would point to some of the logics of culture and communication, regulating human exchanges.
Twitter per se, as a technology, allows a number of different uses: with Eco (1979) we could say that the Twitter model user is broadly defined, permitting many potential uses to co-exist. However, what matters most in this case is the transition between the model user and the empirical uses: the individual uses of the social medium. What characterizes this transition are the norms, made visible in our diagrams, that cause just a limited number of families of uses to actually exist. Even if every candidate behaves differently, and even if Twitter gives the same potential to all candidates, we still manage to identify a few main patterns. It is therefore precisely in the constitution of these patterns that we may see the effect of the social and semiotic norms responsible for reducing the large virtual potential of Twitter to a small number of actual families of uses.
French political communication showed difficulties merging two tendencies: the informative and the interactional. Maybe French candidates consciously perceive these two tendencies as mutually exclusive, and believe that a hybridization would be seen as contradictory by citizens. Another hypothesis could be that merging information and interaction, for example by sending to individual users many personalized references to websites, would simply be a costly activity.
2.6 Associations between clusters and tweeting interfaces
The five clusters have been calculated exclusively on the recurrence of operators in the candidates’ tweets. Therefore, we may now look for further associations between those families of practices and the other variables at our disposal. This may help us to better understand the tweeting styles and also to characterize the other variables’ levels on the base of what we have discovered.
We are going to start by observing how the clusters are associated with the preferred tweeting device of each candidate. This is important because different interfaces may produce different experiences of the medium, exposing users to subsets of the Twitter flux or making the use of some operators easier. Figure 6 plots the number of candidates using each tweeting interface most often. The colours of the bars indicate to which tweeting style the candidates belong.
We notice immediately that one interface (Twitter for iPhone) is more frequently used than all the others (by 154 candidates out of 357). The second most popular one (the Twitter website accessed with a web browser) is used by 71 candidates. Twitter for Android is the preferred interface of 50 candidates, making it the third most popular. We find three other interfaces each preferred by 8 to 17 candidates (Facebook, Twitter for iPad, tweet buttons embedded in websites) and a larger number of interfaces each used by fewer than 8 candidates (summing up to 41 candidates and represented as the column “Other” in the diagram).
Figure 6. Interfaces preferentially used for tweeting
The three major interfaces are used by candidates in all clusters. However, we see that the proportions vary. The candidates using Twitter with an app for smartphones or tablets (either on iOS or on Android) tend to be more participative, and use Twitter to inform as much as to interact with other users. Candidates using iPads in particular use Twitter only rarely to distribute hyperlinks. By contrast, the candidates tweeting from a web browser tend to be more informative than participative: as many of them are found in the informative cluster as the sum of those found in the participative and interactional ones.
This tendency becomes predominant for users tweeting from a Facebook application or from tweet buttons found directly on websites: almost all of these candidates use Twitter predominantly to distribute hyperlinks. In particular, a small subset of 7 candidates in the informative cluster makes noticeable use of the source operator (in one tweet in three). This subset mostly uses tweet buttons to tweet, rather than Facebook or other interfaces. Finally, candidates predominantly using other interfaces seem to make the most balanced use of Twitter, as the distribution of clusters in this group closely resembles the global one.
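The counts behind a chart such as figure 6 come from two simple steps: finding each candidate’s most frequent tweeting interface, then cross-tabulating it with the candidate’s cluster. The sketch below is illustrative; the function names and the interface labels used in the example are assumptions and do not reproduce the exact values stored in the TEE2014 meta-data.

```python
from collections import Counter


def preferred_interface(sources):
    """Most frequent interface among the tweets of one candidate.

    sources: list with one interface label per tweet.
    In case of a tie, the interface encountered first is returned.
    """
    return Counter(sources).most_common(1)[0][0]


def crosstab(preferred, labels):
    """Count candidates for each (interface, cluster) pair.

    preferred: {candidate: preferred interface}
    labels:    {candidate: cluster name}
    """
    return Counter((preferred[c], labels[c]) for c in preferred)
```

For example, `preferred_interface(["Twitter for iPhone", "Web", "Twitter for iPhone"])` returns `"Twitter for iPhone"`.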
Do the interfaces facilitate a certain approach to the medium, or is there another explanation for the association between the candidates’ tweeting styles and their preferences for a certain interface? Candidates using tweet buttons and the Facebook application to tweet may actually not fully master the operators specific to Twitter (hashtags, mentions) and therefore adopt an informative tweeting style just because they tweet as a “residual”, secondary activity, in which they are not that confident or competent. On the other hand, iPhones and other mobile devices do allow a continuous scan of the Twitter flux, and so they may invite the user to retweet more often, or help to internalize the use of hashtags. Our study points to some potential paths that should be verified, either with experimental procedures, or with the analysis of dedicated datasets, accompanied by a semiotic comparison of the tweeting interfaces (Zinna 2004).
2.7 Associations between clusters and political parties
The families of Twitter practices that we have identified are not homogeneously distributed among the political parties of the candidates. Figure 7 shows the number of candidates for each of the major parties (with at least 15 tweeting candidates in our corpus), and the colour of the bars identifies the family to which the candidates belong.
Figure 7. Distribution of clusters among parties
Two of the largest parties, PS-PRG (“Parti socialiste – Parti radical de gauche”) and UDI-MODEM (“Union des démocrates et indépendants – Mouvement démocrate”), show a similar distribution. Around 75 % of their candidates belong to the participative cluster, and the remaining ones are equally divided between the informative and the interactional tendencies. The candidates of the UMP (“Union pour un mouvement populaire”) party are more attracted by the interactional pole: around 62 % of them adopted a participative style, while 31 % were more interactional and 7 % more informative. The fourth largest party, EELV (“Europe écologie les verts”), was instead more attracted by the informative pole (51 % participative, 31 % informative tendency, 17 % interactional tendency).
Among the four smaller parties, the Parti Pirate shows the most pronounced bias towards the interactional pole, with around 75 % of its candidates tending towards it; just one in fifteen candidates has been classified as purely participative and the other three show an informative tendency. The FDG (“Front de gauche”) shows the opposite configuration, with 50 % of participative candidates and 41 % of informative ones. The candidates of the FN (“Front national”) were more informative (48 %) and almost equally participative and interactional (28 % and 24 % respectively). Last, Nous Citoyens was mainly participative (43 %), and more interactional (35 %) than informative (22 %). Figure 8 summarizes this information with a PCA plot.
Figure 8. Principal components analysis (parties and tendencies)
We can observe that the Parti Pirate is the most interactional, while the FN and the FDG are the most informative, and the PS-PRG and UDI-MODEM the most participative. The UMP is equally interactional and participative. It is not possible to associate a certain tweeting tendency with the left-wing or right-wing orientation of a candidate. We remark that the most extremist parties (FN, FDG) adopt more informative styles, while the moderate ones (PS-PRG, UDI-MODEM, UMP) tend to be more participative. It should be noted, however, that the moderate parties are also the largest ones.
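The party percentages discussed in this section group the five clusters into three tendencies, following the definition given earlier (a tendency covers both the pure and the intermediary cluster). A minimal sketch of that computation, in which the dictionary `TENDENCY`, the function name and the input format are assumptions made for illustration:

```python
from collections import Counter, defaultdict

# Mapping from the five clusters to the three tendencies.
TENDENCY = {
    "Participative": "participative",
    "Informative": "informative",
    "Participative-Informative": "informative",
    "Interactional": "interactional",
    "Participative-Interactional": "interactional",
}


def tendency_shares(party_of, cluster_of):
    """Percentage of each party's candidates falling in each tendency.

    party_of:   {candidate: party name}
    cluster_of: {candidate: cluster name (a key of TENDENCY)}
    """
    counts = defaultdict(Counter)
    for candidate, party in party_of.items():
        counts[party][TENDENCY[cluster_of[candidate]]] += 1
    return {
        party: {t: 100 * n / sum(c.values()) for t, n in c.items()}
        for party, c in counts.items()
    }
```

Because the shares are computed within each party, small parties can show large percentage swings from only a handful of candidates, which is worth keeping in mind when comparing them with the largest ones.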
2.8 Classification of tweets based on textual analysis
In the last step, we wanted to know whether our clusters could be further characterized on the basis of the discourses realized by the candidates in their tweets. To this end we used the software Iramuteq (Ratinaud 2009) to perform a classification of the tweets’ texts, based on the co-occurrence of words (Reinert 1983). As a result of this classification, each tweet in our corpus was preferentially related to one of two broad classes of co-occurring words. Similitude diagrams for the two classes, produced with Iramuteq, are shown in figure 9.
The left half of the figure describes the first of the two classes, which we have called “Political Issues”. The words in this class are mainly related to political topics, such as the relationship between France and the European Union, the conduct of the French president François Hollande, and the relationships between the three main French parties PS, UMP and FN. Economic issues were also mentioned during the campaign and are included in this class, such as the TAFTA, unemployment and growth. The right half of figure 9 instead describes the lexical class we called “Electoral discourse”. It includes hashtags and words used during the campaign to communicate about events, meetings, debates and the election dates. The two lexical classes are intentionally broad, but give some hints about the contents most addressed by the different parties on Twitter.
Based on the classification of the tweets produced with Iramuteq, we calculated the preference of each candidate’s tweets for political issues or for the campaign itself. Figure 10 plots the preferences of the candidates in each cluster.
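The chapter does not detail how the preference was calculated. One plausible reading, sketched here under that assumption, is to compute for each candidate the share of tweets assigned to each lexical class and to retain the largest one. The class labels and function names are hypothetical.

```python
from collections import Counter


def class_shares(classes):
    """Share of one candidate's tweets falling in each lexical class.

    classes: list with one label per tweet, e.g. "political", "electoral"
    or "uncategorized" (hypothetical labels).
    """
    counts = Counter(classes)
    return {label: n / len(classes) for label, n in counts.items()}


def preferred_class(classes):
    """Label with the largest share; ties go to the label seen first."""
    shares = class_shares(classes)
    return max(shares, key=shares.get)
```

Keeping an explicit label for uncategorized tweets makes it possible to identify candidates whose tweets mostly consist of a bare hyperlink.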
Figure 9. Lexical classes of the Tweets’ texts
Figure 10. Preference to lexical classes among clusters
We notice that in the participative cluster political issues and the electoral discourse appear to have the same importance. All the other clusters instead show a preference for the political issues, and this preference is more pronounced on the interactional side than on the informative side. Some of the tweets have not been categorized (they are coloured in green in figure 10), because in most cases they only included a hyperlink without any commentary. The majority of the candidates who most often posted uncategorized tweets belong to the purely informative cluster. Figure 11 plots the preferences of each of the major political parties for political issues or for the electoral discourse.
The Parti Pirate is the party that focuses most on political issues, followed closely by the EELV. Conversely, 65 % of the UMP candidates mostly focused on the electoral discourse, followed by the UDI-MODEM (55 % of the candidates). This information should be complemented by the associations between clusters and political parties seen above: the Parti Pirate and the UMP were the most interactional parties; however, it seems that the kind of interaction they perform differs between the two. Our research appears to show that the members of the Parti Pirate use Twitter predominantly to engage in one-to-one conversations about political issues (a personalized political debate), while those of the UMP use Twitter to realize a personalized electoral discourse, inviting individual users to meetings and other occasions. In these two cases, we can talk of sub-styles of tweeting, defined within the interactional tendency. Textual analysis could be exploited even further to identify more tweeting sub-styles.
Figure 11. Preference to lexical classes among parties
We have found tendencies in our corpus that show how semiotic and social norms reduce and channel the technical potentials of a social medium. Twitter gives space to many different individual uses, but just a few families of uses, or “tweeting styles”, are actually realized. These styles are not homogeneously distributed among the political parties of the candidates. We have also found further associations between the tweeting styles and some other variables, such as the kind of discourse realized in the tweets’ texts or the favoured tweeting interface.
The next step of our research will be to perform a similar analysis on other corpora extracted from the TEE2014 database. We want to know if Belgian, English, German, Italian and Spanish candidates use Twitter in a comparable way to the French. Some of the tweeting styles we found in France may not exist elsewhere, or their importance may vary from one country to another. After having extended our research to a larger corpus we may be able to produce a better structural description of the norms guiding the use of Twitter for electoral communication in Europe today.
Compagno, Dario: “Families of practices. Employing data mining techniques to differentiate how French candidates make use of Twitter”. In: Twitter at the European Elections 2014, Metz, February 2015.
Eco, Umberto: The role of the reader. Indiana UP: Bloomington 1979.
Einspänner, Jessica / Dang-Anh, Mark / Thimm, Caja: “Computer-assisted content analysis of Twitter data”. In: Bruns, Axel / Weller, Katrin / Burgess, Jean / Mahrt, Merja / Puschmann, Cornelius (eds.): Twitter and Society. Peter Lang: New York 2013, pp. 97–108.
Maechler, Martin / Rousseeuw, Peter / Struyf, Anja / Hubert, Mia / Hornik, Kurt: cluster: Cluster Analysis Basics and Extensions. R package, version 2.0.1. 2015, retrieved 1.6.2015, from https://cran.r-project.org/web/packages/cluster/.
Mercier, Arnaud / Villa, Marina: “The figure of the enemy during the 2014 European tweet-campaign in France and Italy”. In: Twitter at the European Elections 2014, Metz, February 2015.
Rastier, François: La mesure et le grain. Sémantique de corpus. Champion: Paris 2011.
Ratinaud, Pierre: IRaMuTeQ: Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires. 2009, retrieved 1.6.2015, from http://www.iramuteq.org/.
Reinert, Max: “Une méthode de classification descendante hiérarchique : application à l’analyse lexicale par contexte”. Les cahiers de l’analyse des données, VIII (2), 1983, pp. 187–198.
R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing: Vienna 2015, retrieved 1.6.2015, from http://www.R-project.org/.
Thimm, Caja: “Political deliberation and the internet: Forms and Functions of civic Participation on Twitter”. In: Communiquer dans un monde de normes. L’information et la communication dans les enjeux contemporains de la mondialisation, Lille, March 2012.
Thimm, Caja / Einspänner, Jessica / Gautier, Laurent: “German political communication on Twitter during the 2014 European Elections”. In: Twitter at the European Elections 2014, Metz, February 2015.
Twitter Inc.: Twitter documentation. 2015, retrieved 1.6.2015, from https://dev.twitter.com/overview/documentation.
Wickham, Hadley / Francois, Romain: dplyr: A Grammar of Data Manipulation. R package, version 0.4.1. 2015, retrieved 1.6.2015, from https://cran.r-project.org/web/packages/dplyr/.
14 TEE2014 Project: http://msh-dijon.u-bourgogne.fr/recherche-msh/les-programmes/547-twitter-aux-elections-europeennes-une-etude-contrastive-internationale-des-utilisations-de-twitter-par-les-candidats-aux-elections-au-parlement-europeen-en-mai-2014-tee2014.html
15 The script and the aggregated data are available for reproducibility by contacting the author.
16 The function used to convert Euclidean distances into chi2 ones was developed at the Laboratoire d’informatique théorique et appliquée (LITA) of the Université de Lorraine, which we would like to thank.