🚨INVESTIGATION🚨[Part 1] Catching Partisan Bot Activity on Twitter Prior to the Ousting of the Pakistani Prime Minister
Imran Khan, the ex-Prime Minister of Pakistan, was dramatically removed from office on 10th April 2022 after a vote of no confidence was held against him in the National Assembly. Throughout the expulsion process, Khan maintained that there was a foreign conspiracy to remove him from office. Meanwhile, the opposition coalition, the Pakistan Democratic Movement, asserted that Khan had violated the constitution by dissolving the National Assembly and should face legal penalties.
Amid the political crisis, both supporters and opponents of Imran Khan resorted to narrative propagation in an attempt to influence public opinion ahead of the vote of no confidence. It seemed that both camps utilized bot accounts to push their narrative further in the Twitter network. Given the growing significance of social media as a source of news and opinions, it is crucial to determine whether there was a coordinated effort to manipulate public opinion in the run-up to the vote of no confidence. In our investigation, we aim to explore the coordinated use of bots by both pro and anti-Imran Khan camps on Twitter to shape Khan's image.
In this article, we explore how the presence of partisan bots changed in the days leading up to the vote of no confidence. In the next article [Part 2], we will explore the narratives that the two camps were propagating.
Events of expulsion:
Efforts to hold a vote of no confidence had been going on for many months. However, the last 10 days before the vote saw significant developments in the expulsion process. During this time the deputy speaker of the assembly dismissed the vote of no confidence, and the President then dissolved the National Assembly. This caused immense frustration for the opposition coalition, which urged the judicial system to intervene. The Supreme Court declared the dissolution of the National Assembly unconstitutional and set a new date for the vote to take place. Between this verdict and the actual vote, we saw a substantial increase in Twitter activity related to topics about Imran Khan.
The timeline of the events can be seen below.
For our investigation, we collected data from 1 April 2022 to 9 April 2022 to obtain a comprehensive understanding of the popular narratives immediately prior to Khan's ousting on 10th April 2022. We searched for tweets that met any of the following criteria:
1) They contained the keywords "Imran" AND "Khan" anywhere within the tweet.
2) They contained the keywords "عمران" AND "خان" anywhere within the tweet (which are the Urdu translations of Imran and Khan).
3) They contained the keyword "ImranKhanpti" (the official Twitter handle of Imran Khan).
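The three criteria above can be sketched as a simple filter. This is only an illustration of the matching logic, not the collection query we actually ran: it assumes word-level, case-insensitive matching, and the names `KEYWORD_GROUPS` and `matches_criteria` are hypothetical.

```python
import re

# One tuple per criterion; a tweet qualifies if every term of at least
# one tuple appears as a whole word in its text (assumed matching rule).
KEYWORD_GROUPS = [
    ("imran", "khan"),
    ("عمران", "خان"),
    ("imrankhanpti",),
]


def matches_criteria(text: str) -> bool:
    """Return True if the tweet text satisfies any of the three criteria."""
    # \w+ is Unicode-aware in Python 3, so Urdu words are tokenized too.
    tokens = set(re.findall(r"\w+", text.casefold()))
    return any(all(term in tokens for term in group) for group in KEYWORD_GROUPS)
```

Under this assumed rule, a tweet that only mentions "Khan" or only "Imran" is not collected, while a mention of the handle alone is.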
Stance detection:
To determine the stance of each tweet and agent, we used the ORA software's Stance Detection algorithm to classify tweets and agents into pro and anti groups. This machine learning approach is grounded in social theory. It is a semi-supervised method that co-trains multiple stance classifiers (using different interaction modalities), which yields a better stance prediction model. Literature on this model can be found in the paper Social Media Analytics for Stance Mining: A Multi-Modal Approach with Weak Supervision.
To execute the stance detection process, we initially supplied a list of labeled hashtags denoting support or opposition to Imran Khan. ORA employed these labeled hashtags to identify partisan agents who made use of the specified hashtags the most. The stance detection algorithm then leveraged the concept of influence propagation to assign a stance to actors who did not employ any of the pre-labeled hashtags.
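To make the idea of seeding and propagation concrete, here is a deliberately simplified sketch. It is not ORA's co-training algorithm: it assumes a plain neighbour-majority rule over an interaction graph (for example, retweets), and the hashtags, function names, and data shapes are all hypothetical placeholders.

```python
from collections import Counter

# Placeholder hashtags for illustration only, not the labeled list used in the study.
LABELED_HASHTAGS = {"#pro_tag_example": "pro", "#anti_tag_example": "anti"}


def majority(votes: Counter):
    """Return 'pro' or 'anti' if one side has strictly more votes, else None."""
    if votes["pro"] == votes["anti"]:
        return None
    return "pro" if votes["pro"] > votes["anti"] else "anti"


def seed_stances(user_hashtags):
    """Label users by the majority stance of the labeled hashtags they used."""
    seeds = {}
    for user, tags in user_hashtags.items():
        stance = majority(Counter(LABELED_HASHTAGS[t] for t in tags if t in LABELED_HASHTAGS))
        if stance:
            seeds[user] = stance
    return seeds


def propagate(seeds, edges, max_rounds=10):
    """Assign unlabeled users the majority stance of their labeled neighbours."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    stances = dict(seeds)
    for _ in range(max_rounds):
        updates = {}
        for user, nbrs in neighbours.items():
            if user in stances:
                continue
            stance = majority(Counter(stances[n] for n in nbrs if n in stances))
            if stance:
                updates[user] = stance
        if not updates:  # nothing changed this round, so stop early
            break
        stances.update(updates)
    return stances
```

Users who never used a labeled hashtag and have no labeled neighbours stay unlabeled in this sketch, which roughly corresponds to the neutral group.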
The table below shows the list of hashtags that were manually labeled as pro and anti-Imran Khan:
After running the stance classification algorithm on a sample of 2,420,378 tweets (that mentioned Imran Khan), the distribution looked as follows:
The pro-Imran Khan actors and tweets significantly outnumber the anti-Imran Khan agents and tweets in the network. We see that 62% of the actors (organic + bot) were pro-Khan compared to 12% anti-Khan actors (organic + bot). However, anti-Imran Khan agents tend to tweet more about Imran Khan (per account) than pro-Imran Khan agents. Additionally, we find that neutral agents tweet substantially less than the partisan agents in our network. As a result, 77% of the tweets mentioning Khan were pro-Khan compared to 20% anti-Khan tweets.
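The per-account comparison follows directly from the percentages above. As a quick check, dividing each camp's share of tweets by its share of actors gives its tweeting rate relative to the dataset-wide average per account (the variable names are ours, for illustration):

```python
# Shares reported above, in percent of all actors and of all tweets.
actor_share = {"pro": 62, "anti": 12}
tweet_share = {"pro": 77, "anti": 20}

# A value of 1.0 would mean the camp tweets at the average per-account rate.
relative_rate = {camp: tweet_share[camp] / actor_share[camp] for camp in actor_share}

for camp, rate in relative_rate.items():
    print(f"{camp}: {rate:.2f}x the average tweets per account")
```

This gives roughly 1.24 for the pro-Khan camp and 1.67 for the anti-Khan camp, so anti-Khan accounts were more active per account even though the pro-Khan camp produced far more tweets overall.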
Although pro-Khan actors and tweets dominated our dataset, we have not yet established whether this narrative was organic or a coordinated effort through the use of bots. To figure that out, we divide each of the partisan networks into organic and non-organic groups by calculating bot probabilities for the agents in the network.
Bot identification:
In order to identify bots in our network we use Beskow and Carley’s Tier 1 BotHunter algorithm. The tool calculates the bot probability of each actor based on metadata available about the account and its accompanying text. The literature on the model is available at Bot-hunter: A Tiered Approach to Detecting & Characterizing Automated Activity on Twitter.
BotHunter is a “random forest regression model trained on labeled Twitter data sets … developed from forensic analyses of events with extensively reported bot activity, such as the attack against the Atlantic Council Digital Forensic Research Lab in 2017”. – Janice Blane
The Tier-1 BotHunter calculates the bot probability based on the following features:
i) Network-level features: number of followers, number of friends, and number of favorites
ii) User-level features: screen name length, account age, default image, screen name entropy, total tweets, and source (binned)
iii) Tweet-level features: timing, whether it is a retweet, and the hashtags used
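To give a feel for a few of the user-level features listed above, here is a minimal sketch of how they could be derived from account metadata. This is not BotHunter's code: it assumes screen name entropy means character-level Shannon entropy, and the function and field names are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return sum((c / total) * math.log2(total / c) for c in Counter(text).values())


def user_level_features(screen_name, created_at, as_of, has_default_image, total_tweets):
    """Collect a handful of user-level features into a dictionary."""
    return {
        "screen_name_length": len(screen_name),
        "screen_name_entropy": shannon_entropy(screen_name),
        "account_age_days": (as_of - created_at).days,
        "default_image": has_default_image,
        "total_tweets": total_tweets,
    }
```

A random-looking screen name made of many distinct characters scores a higher entropy than a name with repeated characters, which is why this feature can help separate auto-generated handles from human-chosen ones.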
Pro Imran Khan bot presence in the network:
The results from the BotHunter algorithm show that approximately 32% of actors with a positive stance toward Imran Khan have a bot probability higher than 0.7, indicating that they are probably bots. Additionally, around 47% of the pro-Imran Khan tweets that were published on the network were created by these bot accounts. This number seemed bloated when we first came across it.
For most individuals, it is surprising to find such significant bot participation. However, literature on other similar events shows that it is not unusual to see such high bot participation in topic-focused communities on Twitter. One example is a study conducted at Carnegie Mellon in 2020, which found that 82% of the top 50 most influential accounts in a network discussing Covid-19 were bots.
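The two pro-Khan figures above (the share of accounts over the 0.7 threshold and the share of tweets those accounts produced) can be computed along the following lines. This is a sketch under assumed inputs: each account is a hypothetical `(bot_probability, tweet_count)` pair, and `bot_shares` is our own illustrative name.

```python
BOT_THRESHOLD = 0.7  # accounts strictly above this probability are treated as bots


def bot_shares(accounts, threshold=BOT_THRESHOLD):
    """Return (share of accounts flagged as bots, share of tweets they produced)."""
    if not accounts:
        return 0.0, 0.0
    flagged = [(p, n) for p, n in accounts if p > threshold]
    total_tweets = sum(n for _, n in accounts)
    account_share = len(flagged) / len(accounts)
    tweet_share = sum(n for _, n in flagged) / total_tweets if total_tweets else 0.0
    return account_share, tweet_share
```

Because bots can tweet more than organic accounts, the tweet share can be noticeably higher than the account share, as it is in our pro-Khan results.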
We then analyzed how the number of pro-Imran Khan agents and bots changed over time in our dataset to get a better understanding of how the bot deployment strategy varied for each camp.
Here we see that the number of pro-Khan bots increased between April 1st and 2nd, before falling slowly until April 6th. After April 6th the number of bots in the network rose sharply. The Supreme Court of Pakistan reinstated the National Assembly on April 7th, leading to an increase in pro-Imran Khan activity on Twitter in hopes of influencing the narrative before the vote of no confidence on April 10th. From April 7th to 9th, both the number of bots and the number of bot-generated pro-Imran Khan tweets increased significantly.
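A daily series like the one described above can be built by counting the distinct accounts active on each day, split by the bot threshold. The sketch below assumes each tweet is available as a hypothetical `(day, account_id, bot_probability)` tuple; it is not the exact procedure used to produce our chart.

```python
from collections import defaultdict
from datetime import date


def daily_active_accounts(tweets, threshold=0.7):
    """Count distinct bot and organic accounts that tweeted on each day."""
    seen = defaultdict(lambda: {"bot": set(), "organic": set()})
    for day, account_id, bot_probability in tweets:
        kind = "bot" if bot_probability > threshold else "organic"
        seen[day][kind].add(account_id)
    # Sort by day so the result reads as a time series.
    return {
        day: {kind: len(ids) for kind, ids in groups.items()}
        for day, groups in sorted(seen.items())
    }
```

Counting distinct accounts rather than tweets keeps a single hyperactive bot from inflating the number of bots on a given day; counting tweets per day instead would only require replacing the sets with counters.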
Right after the National Assembly was dissolved on April 3rd, we saw a fall in the number of pro-Imran Khan tweets created by both organic users and bots. This could indicate falling organic support for Khan between April 2-7. We believe that after the President of Pakistan dissolved the National Assembly, many people viewed the action as unconstitutional, which could have led to a decline in the total number of pro-Imran Khan agents from April 3rd to April 6th. We had nevertheless anticipated a significant bot presence after the announcement to counter the expected backlash, and its absence made us question why. There are a few plausible explanations. First, our algorithm may not have done a good job of separating bots from non-bots. Second, bot controllers may not have been aware of the decision in advance and thus did not follow PTI's short-term strategy and objectives. Although these are interesting questions, no causality should be asserted in the absence of proof in favor of either explanation. We recommend separate investigations into these questions.
Anti-Imran Khan bots in the network:
For anti-Imran Khan actors, the results suggest that 39% of the actors in the dataset with a negative stance towards Imran Khan are bots.
The share of bots in the anti-Khan camp was therefore somewhat higher than in the pro-Khan camp (39% versus 32%), and the anti-Khan bots also published more tweets per account than pro-Khan bots.
We then explored how the presence of anti-Khan bots changed in our network.
The number of anti-Imran Khan bots fluctuated throughout the period covered in our dataset. The total number of anti-Khan bots fell gradually from April 1-3. However, it spiked on April 4, one day after the National Assembly was dissolved. These bots propagated a negative sentiment that was already being displayed by organic accounts.
After the Supreme Court's decision on April 7th to hold the vote of no confidence against Khan, anti-Khan bot activity increased significantly. The number of anti-Khan tweets produced by bots also increased substantially. However, there was a significant increase in non-bot accounts tweeting after April 7th as well. The gap between bot and non-bot accounts was highest on April 7th and 9th: respectively, the day the Supreme Court declared the decision to dissolve the National Assembly unconstitutional and the day before the vote of no confidence.
All in all, we saw that both the pro-Khan and anti-Khan camps used bots to propagate their messages. Overall, there were significantly more bot-generated tweets in the pro-Khan network than in the anti-Khan network. However, it should be noted that there were also significantly more pro-Khan tweets by organic accounts than anti-Khan tweets by organic accounts. In percentage terms, 39% of the accounts in the anti-Khan camp were bots compared to 32% in the pro-Khan camp. One of the most interesting observations in our analysis so far was the strategy behind bot deployment by each camp. In the pro-Khan camp, the number of bots changed gradually at first and then rose sharply once the new date for the vote was announced. In comparison, the number of bots in the anti-Khan network fluctuated from day to day but also followed an upward trend. This could indicate that the pro-Khan bot strategy was planned to display a consistent and gradual increase in Khan's support, while anti-Khan bots were more reactive to political developments on the ground.
In the next article, we will explore the narrative and network maneuvers bots from each camp were taking part in so that we can identify the exact objectives of the malicious actors in each camp.