We are within a year of the next Presidential election and, just as important, elections for many Senate seats, the entire House, and state and local offices. We have spent the last three years talking about foreign meddling in our elections and the pending battle against bot farms, coordinated adversarial information campaigns, stolen data, email hacks, and all of the goodies we experienced in 2016. At IST Research, we have decided that a very important first step is a robust characterization of the information environment surrounding the election, followed by flexible and timely detection of anomalies within that environment. We have challenged our team to put our platform, Pulse, into action to accomplish this task. We will be continually monitoring the environment until we get past Decision Day 2020. Stay tuned for more updates!

_____________________________________________________________

Written By: Sean Birdsell

Sean Birdsell is IST’s lead Customer Advocate. He spends his days making sure that the Customer’s needs are understood and represented inside the company.

 

At IST we are encouraged to ask questions about the information environment (IE) for our own edification. These efforts often lead to practices and insights for our customers. In that vein, two weeks ago I decided to take a look at the online state of things surrounding the 2020 Presidential election. As I am not a political analyst, I was also curious what I could discover given my lack of detailed knowledge on the subject. 

As this was a small personal project, I decided I would focus the bulk of my collection around Twitter and, additionally, the campaign pages of several candidates on Facebook. Of particular interest was the conversation around five likely battleground states: Ohio, Florida, Virginia, Wisconsin, and Michigan. With that loose plan in place, I set about determining how to get relevant information.

The first challenge to solve was one of scale: tracking all online conversation around the candidates for president—even limited to the five battleground states—would require compute and storage expenses that were beyond my budget. In order to bring the volume of information to a reasonable size, I used multi-word Twitter keyword rules (e.g., “Donald Trump Florida 2020” and “ewarren Wisconsin”). After developing and entering approximately 250 rules like this, I deployed the project and waited. 

After three days, a colleague and I reviewed what was coming into the system. The results were disappointing—the rules I had developed were too restrictive and weren’t generating much information at all. At my colleague’s suggestion, I opened up the aperture by simply adding the hashtags used by the campaigns and their supporters (e.g., “bernie2020”). It was at this point that we started receiving data at a volume that we thought could provide some indication of the state of the IE.
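
For readers who want to try something similar, the rule-building step can be sketched in a few lines of Python. This is not the Pulse configuration or the actual rule set: the term lists below are placeholders built from the handful of examples mentioned above, and the matching function assumes a simple "every word in the rule must appear in the tweet" semantics, which may differ from how a real collection service evaluates rules.

```python
import re
from itertools import product

# Illustrative inputs only: the post names just a few of the terms it used.
CANDIDATE_TERMS = ["Donald Trump", "ewarren"]
STATES = ["Ohio", "Florida", "Virginia", "Wisconsin", "Michigan"]
CAMPAIGN_HASHTAGS = ["bernie2020"]


def build_rules(candidate_terms, states, hashtags=()):
    """Return narrow 'candidate state' rules followed by broad hashtag rules."""
    narrow = [f"{term} {state}" for term, state in product(candidate_terms, states)]
    return narrow + list(hashtags)


def matches(rule, text):
    """Assumed semantics: every word of the rule appears in the text, ignoring case."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return all(word in tokens for word in rule.lower().split())


rules = build_rules(CANDIDATE_TERMS, STATES, CAMPAIGN_HASHTAGS)
tweet = "Rally tonight! #Bernie2020"
print([rule for rule in rules if matches(rule, tweet)])
```

A tweet that carries only a campaign hashtag never satisfies a multi-word rule that also requires a state name, which is consistent with the low volume seen before the hashtags were added.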

After seven days, I stopped collection and began to sift through what we’d taken in. While there were no great insights, I found several things to be of interest. The first was the vast difference in the volume of conversation depending on the candidate (fig. 1).

Figure 1 – Document volume by platform

I had expected more engagement around the Democratic candidates, particularly Sanders and Warren and especially on Twitter.

We collected a little over 500K tweets, almost all of which were in English or Spanish (again, no surprise). What did catch my attention was a spike in Spanish-language tweets at a time when English-language tweets were in a trough (fig. 2).

Figure 2 – Document language histogram

 

The spike was a result of mass re-tweeting of a tweet from former Bolivian president Evo Morales Ayma (@evoespueblo). I do not speak Spanish, so I used machine translation to get the following:

“My greeting and thanks to Brother @BernieSanders, US presidential candidate, for highlighting our task of poverty reduction and denouncing the #GolpeDeEstadoEnBoliva. The international community demands Bolivia’s return to democracy”

Over the three days for which we have data, that tweet was re-tweeted over 12,000 times.
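
The same kind of check can be approximated outside a dedicated platform. The sketch below is only an illustration, not how Pulse works: it assumes each collected tweet is a dictionary with hypothetical `created_at` (a `datetime`), `lang`, and `retweeted_id` fields, counts tweets per hour and language, lists the hours in which one language outnumbers another, and then finds the most re-tweeted original within those hours.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def language_histogram(tweets):
    """Count tweets per (hour, language) pair."""
    counts = Counter()
    for tweet in tweets:
        counts[(hour_bucket(tweet["created_at"]), tweet["lang"])] += 1
    return counts


def buckets_where_leads(counts, lang, other):
    """Return the hours, in order, in which `lang` outnumbers `other`."""
    hours = sorted({hour for hour, _ in counts})
    return [h for h in hours if counts[(h, lang)] > counts[(h, other)]]


def top_retweeted(tweets, lang, hours):
    """Return (retweeted_id, count) for the most re-tweeted original, or None."""
    wanted = set(hours)
    ids = Counter(
        tweet["retweeted_id"]
        for tweet in tweets
        if tweet["lang"] == lang
        and tweet.get("retweeted_id") is not None
        and hour_bucket(tweet["created_at"]) in wanted
    )
    return ids.most_common(1)[0] if ids else None
```

Comparing raw hourly counts is the simplest possible test. A longer-running project would more likely compare each hour against a baseline for that language.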

Finally, to Facebook (fig. 3):

Figure 3 – Number of page posts and the aggregate reactions they received

What I found interesting about this particular visualization was the clear lead in numbers that President Trump is showing. I expected a bit more parity. Based on these numbers, I’m left wondering how involved Democratic supporters are (at least the ones on Facebook).

For a small project, I found the outcomes interesting if not earth-shattering. My plan is to revisit this assessment several times during the run-up to the election. To support that, I’ll be engaging with my more politically astute colleagues to get a more nuanced view.