An Ashoka University scholar’s working paper alleging vote manipulation in seats narrowly won by the BJP in 2019 has sparked a heated and not-at-all-useful debate. Samarth Bansal looks at the paper’s statistical analysis to explain the research—and how it reached its conclusions.
Written by: Samarth Bansal. Samarth possesses a rare combination of an IIT degree and extensive journalistic experience. In his words, he is deeply invested in “rigorous, fact-based, data-informed and thoughtful reporting” that helps us “understand the impossibly complex world we live in.” His work has appeared in the Hindustan Times, The Hindu, Mint, HuffPost India, The Atlantic, The Wall Street Journal and Quartz. He is @PySamarth on Twitter, and you can learn more about him and his work here.
On July 31, a dramatic Twitter thread summarising work-in-progress research unleashed a political storm—a storm significant enough that politicians made a fuss, news channels hosted debates, and trolls did their thing.
But who can blame them? The tweet boldly asserted that a research paper revealed “scientific evidence” of potential vote manipulation by the BJP during the 2019 national election. And before you roll your eyes, this wasn’t yet another rehash of the ‘electronic voting machines were hacked’ melodrama—which has become a post-election ritual for many a losing side.
This time, it was about data and statistics. Not the fascinating fictions of op-ed columnists. This time, the claims deserved attention—and they certainly grabbed mine as someone who covered the 2019 election with a special focus on election data.
The background
The paper falls into a type of data analysis called election forensics—using statistics to check whether election results are legit. It boils down to a simple idea: can basic vote-count data tell us whether an election was clean? The useful thing about stats is that they reveal patterns—and when patterns break, it’s a sign something is off. This is the same kind of anomaly-hunting that credit card companies do to catch fraud. So why not apply it to elections?
This kind of research is largely absent in India’s academic landscape. So Ashoka University’s Sabyasachi Das has indeed broken new ground with his working paper. Yes, this paper has not been peer-reviewed or published in a journal yet. But it has been publicly shared on Social Science Research Network or SSRN—an open-access repository and preprint server for scholarly research in the social sciences and humanities—and a number of smart people have looked at it.
What makes Das’ paper especially interesting is that he looks at the 2019 Lok Sabha elections—which have been the subject of a number of controversies. There were news reports of thousands of voter IDs being deleted because of new rules linking Aadhaar to electoral rolls. The Election Commission messed up while sharing post-election data—the first set of figures released had a mismatch between the count of people who voted (voter turnout) and the count of total votes (final results). A fresh approach to this highly debated election is both smart and welcome.
What makes Das' paper incendiary is its core allegation—reflected in its title ‘Democratic Backsliding in the World’s Largest Democracy’. He claims to have found statistical evidence of targeted electoral discrimination against Muslims, partly facilitated by weak monitoring by official election observers. This, in turn, led to more votes for the BJP.
Yet, Das is careful to clarify that he does not claim that the manipulation changed the outcome of the election—writing instead: “The paper is unable to comment on the overall extent of manipulation in the 2019 general election.”
That caveat aside, how did Das arrive at his conclusions? Here’s a look at his core arguments, along with my reservations about some of them. But to be clear, this is not an exhaustive analysis—nor do I possess the kind of expertise required for a proper review by academic peers such as statisticians and economists.
Important to note: I emailed Das with a request for an interview, but didn’t hear back.
One: An irregular pattern
When crafting a statistical study to address a specific query, it's not always feasible to analyse every single detail. Das, therefore, does not look at all 543 Lok Sabha seats, but zeroes in on seats where the outcome was a close call—59 constituencies where the difference in vote share between the top two parties was less than 5%.
In such close elections, the laws of probability say that both parties will have roughly equal chances of winning—50-50. Think of it this way: at such a neck-and-neck stage, the ultimate victor should be almost random. Therefore, among those 59 seats, one would anticipate BJP securing somewhere around 30 victories.
However, BJP managed to snag 41 seats—11 more than the expected tally.
Is that a big deal? If BJP had clinched victory in all 59 closely contested seats, it'd certainly be a glaring anomaly. But are 41 wins sufficient to raise alarm bells? Think of it this way: imagine you flip a coin a hundred times. You'd anticipate heads and tails each showing up 50 times. Yet, if heads pops up 99 times, something’s fishy about the coin. But what if it's heads 70 times? Is that still shady?
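For the curious, here’s what that coin-flip logic looks like as a quick back-of-the-envelope calculation in Python. It treats every close seat as a fair 50-50 toss (a simplification), and it is not the formal test the paper uses:

```python
# A rough sanity check of the coin-flip intuition, assuming each of
# the 59 close seats was a genuine 50-50 contest. This is NOT the
# test Das uses; it is only meant to build intuition.
from scipy.stats import binom

n_close_seats = 59  # seats with a victory margin under 5%
bjp_wins = 41       # seats the BJP actually won
p_fair = 0.5        # each close seat assumed a fair toss-up

# Probability of winning 41 or more of 59 fair toss-ups by chance
p_value = binom.sf(bjp_wins - 1, n_close_seats, p_fair)
print(f"P(>= {bjp_wins} wins out of {n_close_seats}) = {p_value:.4f}")
# The result is well under 1%: luck alone is an unlikely explanation,
# under these simplified assumptions
```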
Enter statistical tests. They’re the Sherlock Holmes that helps us figure out whether an occurrence is mere coincidence or the result of some underlying bias.
Statistical tests help us determine the tipping point—when we shift from “this could happen by chance” to “chance can’t explain this.” Das deploys the McCrary test, which checks whether the distribution of victory margins shows a suspicious jump right at the winning line, and concludes that, yes, the number of BJP wins deviates from what chance would predict.
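Here is a crude, illustrative sketch of the test’s core idea, using made-up margins; the real McCrary test uses local linear density estimation with proper standard errors:

```python
import numpy as np

# Made-up signed margins: BJP vote share minus the runner-up's, in
# percentage points; positive means a BJP win. Illustrative only.
rng = np.random.default_rng(42)
margins = rng.uniform(-5, 5, 500)

# Histogram the margins; the bin edges are chosen so no bin straddles
# the zero cutoff
bin_width = 0.5
edges = np.arange(-5.0, 5.0 + bin_width, bin_width)
counts, _ = np.histogram(margins, bins=edges)
centers = (edges[:-1] + edges[1:]) / 2

# Fit a straight line to the bin counts on each side of zero and
# compare the two fitted heights at the cutoff; a large jump suggests
# a discontinuity, i.e. too many seats just past the winning line
left = centers < 0
fit_left = np.polyfit(centers[left], counts[left], 1)
fit_right = np.polyfit(centers[~left], counts[~left], 1)
jump = np.polyval(fit_right, 0.0) - np.polyval(fit_left, 0.0)
print(f"Estimated jump in seat density at the cutoff: {jump:.1f}")
```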
To further test his findings, Das applies the same analysis to previous national and state elections. He finds this pattern emerges only in 2019—and is absent even in state assembly elections held alongside or following the general election.
Das also discovers that BJP’s disproportionate win in closely contested constituencies occurs primarily in states governed by the party at the time of the election.
These notable anomalies lay the groundwork for the paper. Something’s off, Das concludes.
Two: Strategy or fraud?
Alright, the voting patterns seem weird. But what if the BJP was simply more astute in its political manoeuvring in those seats? What if its workers just went that extra mile?
In the realm of election forensics, it’s well understood that a pattern resembling evidence of fraud does not, by itself, prove anything. The same pattern could be produced by electoral strategy or routine tactics like voter mobilisation. Teasing out a solid cause-effect relationship is a key challenge in this field.
Here's an analogy to illustrate: imagine an ice-cream chain runs a contest where every student scoring above 90% in their board exams gets a free Special Chocolate Fudge Brownie. A sweet treat for academic excellence.
However, things take an unexpected turn—far more students than anticipated surpass the 90% mark. Could one confidently say there was widespread cheating just to earn extra ice-cream? What if students simply studied harder because of the incentive? Or maybe this time the exam was easier? Multiple explanations can account for unexpected patterns.
In his paper, Das acknowledges this issue. He states:
The incumbent party in India may have been able to exercise precise control in 2019 since it had significantly built up its organizational capacity in several states, subsequent to its 2014 general election victory. It mobilized active party workers at the level of polling stations who monitored and shaped voter attitudes, backed by centrally managed teams analyzing the collected information and suggesting campaign strategies.
He then goes on to test this thesis. If superior strategy were the main cause of the voting pattern, Das posits that the closely contested constituencies won by the BJP should show that the party campaigned harder in these seats than its rivals.
However, he encounters a roadblock—there’s no comprehensive data on campaigning by political parties across all constituencies in India. So he turns to the closest available data point—extracted from the post-poll National Election Survey of 2019. It’s the answer to this one question:
Did a candidate/party worker of the following parties come to your house to ask for your vote in the last one month?
After running a statistical test, he concludes that the responses don't reveal a significant difference between the campaigns run by the major parties. Das then dismisses BJP’s campaigning efforts as an explanation for the unexpected pattern.
In my view: This evidence may not be enough to entirely negate the theory that BJP simply ran a better campaign. Relying solely on one question from a national survey does not provide a robust answer in itself.
In my coverage of the 2019 election, I reported on the BJP’s political consulting firm—the Association of Billion Minds (ABM). Staff members told me about their operations: how they gathered ground-level intelligence, conducted surveys to gauge public sentiment towards the BJP and other political leaders, and assessed the viability of potential candidates. They used historical election data to identify polling booths that could swing the election, and suggested candidates the party should poach from rivals.
The BJP did invest a lot of attention and energy on closely contested seats. Did it swing the outcome in these seats? Or was it insignificant? We don’t know. And certainly, responses to a solitary question in a nationwide survey can’t offer a substantive answer. In my view, Das needed more data to dismiss this potential cause for BJP’s edge. Perhaps original data collection or fieldwork could have offered more reliable insights. But this dimension is not explored in the paper.
In any case, Das moves on. He focuses on testing for the mechanics of electoral fraud.
Three: How it happened
To test for manipulation of votes, Das focuses on Muslim voters “who generally do not support BJP and are easily identified in the voter list due to their culturally distinct name.”
Here is how he makes his case for potential manipulation.
Voter suppression: A dangerously effective way to skew an election is to suppress turnout—especially of those who you know don’t support you. The most effective solution? Just kick them off the voter rolls.
Did the BJP do that to Muslims?
In India, electoral rolls have been growing from one election to the next. Das looks at the number of registered voters in each parliamentary constituency in 2014 and 2019. He then shows that this growth is relatively lower in closely contested seats won by the BJP. Moreover, the lower growth rate is concentrated in seats with a higher share of Muslim voters.
From the paper:
This suggests that there might have been strategic deletion of names, especially targeting Muslim voters, to influence election outcomes in favor of the BJP, particularly in BJP-ruled states.
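In data terms, the comparison is straightforward: compute how much each constituency’s electoral roll grew between the two elections, then check whether close BJP wins grew less. A minimal sketch with a hypothetical toy table; the column names are my stand-ins, not the paper’s variables:

```python
import pandas as pd

# Hypothetical constituency-level table; names and numbers are
# illustrative stand-ins, not real data
df = pd.DataFrame({
    "voters_2014":   [1_500_000, 1_400_000, 1_600_000, 1_550_000],
    "voters_2019":   [1_620_000, 1_470_000, 1_730_000, 1_610_000],
    "close_bjp_win": [True, True, False, False],
    "muslim_share":  [0.25, 0.08, 0.22, 0.10],
})

# Growth in registered voters between the two elections
df["roll_growth"] = df["voters_2019"] / df["voters_2014"] - 1

# The paper's claim: growth is lower in close BJP wins, and more so
# where the Muslim share of voters is higher
print(df.groupby("close_bjp_win")["roll_growth"].mean())
```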
Local manipulation at polling stations: After the 2019 election results rolled in, the Election Commission of India initially released voter turnout figures for the first four phases. Oddly, these numbers didn't align with the total votes in those seats—which triggered a big controversy.
The ECI eventually corrected these figures following media reports highlighting the discrepancy. But here's the conundrum: was this a mere administrative hiccup or a result of poll officers playing with the numbers—a fraud inadvertently exposed by the release of disparate datasets?
Das takes on this puzzle. He acquires both versions of the turnout data and calculates the absolute difference between them.
What does he find?
Once more, Das shows that closely contested seats won by the BJP are far more likely to have a significant gap between the two versions of ECI’s turnout figures. He concludes that this anomaly also hints at the potential for manipulation—but from a completely different dataset.
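The underlying measure is simple: for each constituency, take the two turnout releases and compute how far apart they are. A minimal sketch, again with made-up numbers and my own column names:

```python
import pandas as pd

# Hypothetical table holding the two ECI turnout releases per
# constituency; all values here are invented for illustration
eci = pd.DataFrame({
    "constituency":    ["A", "B", "C"],
    "turnout_initial": [67.2, 59.8, 71.4],  # first release (%)
    "turnout_final":   [67.9, 59.8, 70.6],  # corrected release (%)
})

# Absolute gap between the two versions: the quantity Das relates
# to close BJP wins
eci["discrepancy"] = (eci["turnout_final"] - eci["turnout_initial"]).abs()
print(eci[["constituency", "discrepancy"]])
```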
In my view: This was not a good data set to use for such research. I had analysed these differences right after the election and concluded that it was most likely a result of an administrative glitch—and at least one reporter who covers the Election Commission confirmed the same to me. Of course, there is no firm evidence of that glitch either. But if I were doing this study, I wouldn’t use this numerical gap to prove manipulation.
Weak monitoring at polling stations: Das further argues that this manipulation was partly facilitated by weak monitoring.
He collects data on counting observers—who are drawn either from the State Civil Service (SCS) or the Indian Administrative Service (IAS) cadres. Das assumes that state officers are more pliable than the centrally appointed IAS officers. Based on this premise, he looks for a correlation: do seats with more SCS observers also show a greater discrepancy between the two sets of turnout data released by the ECI?
The answer: yes. Das, therefore, concludes:
[T]he fraction of counting observers who are SCS and come from BJP ruled states positively predicts the extent of turnout data discrepancy in the PC; in PCs that BJP lost, no such relationship holds.
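Mechanically, a claim like “positively predicts” typically comes from a regression of the turnout discrepancy on the observer mix, run separately for seats the BJP won and lost. Here is a hedged sketch of that setup; the variable names and numbers are mine, not the paper’s:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per closely contested constituency
df = pd.DataFrame({
    "discrepancy": [0.9, 0.1, 1.4, 0.5, 0.4, 0.5],   # turnout gap
    "frac_scs":    [0.8, 0.2, 0.9, 0.3, 0.7, 0.1],   # share of SCS observers
    "bjp_won":     [True, True, True, False, False, False],
})

# Does the SCS share predict the discrepancy where the BJP won,
# but not where it lost? Compare the two slopes.
won = smf.ols("discrepancy ~ frac_scs", data=df[df["bjp_won"]]).fit()
lost = smf.ols("discrepancy ~ frac_scs", data=df[~df["bjp_won"]]).fit()
print("slope (BJP won): ", won.params["frac_scs"])
print("slope (BJP lost):", lost.params["frac_scs"])
```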
The big picture
I do not know whether or not there was electoral manipulation in 2019. A weakness in a thesis hardly proves its opposite. But here are the conclusions I reached after studying Das’ paper at length:
One: Statistical analysis can give the appearance of objectivity because it relies on data and mathematical methods to draw conclusions. But it’s important to remember that statistics are just one among many tools used to understand complex issues like elections. Statistical analysis should always be complemented by other investigative methods—such as ground research—to determine the legitimacy of elections.
This paper relies entirely on data to make its case, which, from a distance, is great because it reduces the scope for potential bias. But the reality is that there isn’t enough data in India for robust analysis, compared, say, to the US. Maybe that’s why Das had to rely on a single question from a national survey. Given this data scarcity, we should be careful when we base research on the available numbers alone. Either the researcher has to collect more data or use qualitative research to justify the arguments.
Two: The title of this paper does a disservice to Das’ extensive research. The statistical work in the paper is a good—even groundbreaking—attempt to figure out a new method to study the integrity of Indian elections. More researchers should use similar approaches to test other elections at the state level. But the gross exaggeration of the title—“democratic backsliding”—has instead obscured its important contribution.
As Das himself notes:
The tests are, however, not proofs of fraud, nor does it suggest that manipulation was widespread. Proving electoral manipulation in a robust democracy is a significantly harder task that would require detailed investigation of electoral data in each constituency separately…Nonetheless, electoral fraud even in a single constituency would imply that such manipulations by incumbent parties are possible.
Sure, but manipulation in a single constituency hardly merits that kind of title. This points to the well-documented problem in the incentive structure of academic publishing—splashier results with bold claims have a higher chance of getting published in top journals.
Three: In a world dominated by social media’s thirst for quick answers, we must remember that academic discourse thrives in the embrace of uncertainty. Das asked a good and important question. That is the job of academics. But a single paper is hardly the final truth: like most research, it is a fragment of a broader puzzle.
The essence of scientific inquiry lies in the painstaking journey of countless minds relentlessly seeking, refining, and challenging ideas. Each study lays the groundwork for the next, guiding us closer to the truth. But in our polarised environment, we instead jump on any research—or researcher—that does not align with our ideological biases.
It’s hard to do science on performative platforms like Twitter—a fact made painfully visible in the heated and often uninformed discourse surrounding Das’ paper. The ad-hominem attacks on Das are simply not justified—nor should they be tolerated as part of any debate on academic research.
Four: Das’ research raises critical questions that deserve further study. Why is the BJP more likely to win close elections? Or how about this: why is electoral participation falling among Muslims? As Rahul Verma and Armaan Mathur highlighted in The India Forum:
There are clear indications of Muslim politics undergoing deep churn. While we may lack quantifiable indicators to substantiate this claim at the moment, there is a noticeable shift in the nature of political engagement by the Muslim community in the past few years.
Overall turnout among Muslims has either stagnated or declined. Political parties like the SP or TMC that actively mobilized Muslims during elections are now less likely to do so and will nominate fewer candidates from the community as they fear a majoritarian backlash. Despite this, voting patterns since 2014 indicate greater consolidation of the community behind one candidate or party than ever before.
In some ways, these changes represent the central paradox of Muslim politics in contemporary India.
Surely, this is an urgent and worthy subject for all scholars—whether they use statistical analysis or any other methodology.
The bottomline: It would serve all of us better if researchers were more careful in communicating their findings—especially when results are pending peer review. And it would certainly help if readers were more literate in interpreting academic research.