In 2016, the Brennan Center for Justice identified 151 local and state law enforcement agencies in the United States that have subscribed to social media monitoring services such as Geofeedia, Media Sonar, Snaptrends, Dataminr, DigitalStakeout, and Babel Street. Following investigations from reporters and the ACLU, many of these services are now defunct or subject to API restrictions, a devastating blow for these surveillance projects. However, little is known about how these surveillance services identify social media posts of interest and who is impacted. Our research analyzes one such platform: DigitalStakeout.
What had previously been uncovered about social media monitoring is informative. MediaSonar, used by the Fresno Police Department, encouraged police to track #BlackLivesMatter and related hashtags to identify “threats to public safety.” After it was revealed that MediaSonar marketed itself as a way for police to “avoid the warrant process,” Twitter cut off the company’s access to its enterprise API. Twitter also cut SnapTrends’ API access after details of law enforcement use of its software were released; SnapTrends closed shop shortly thereafter. Geofeedia was notably used during the Freddie Gray uprisings to “arrest [protesters] directly from the crowd” aided by social media posts and face recognition technology; shortly after this revelation from the ACLU of Northern California, Facebook, Twitter and Instagram all revoked Geofeedia’s API access.
Notably, DigitalStakeout is still in the business of monitoring social media for police. Here in Oregon, during a trial period of DigitalStakeout, an agent of the Oregon Department of Justice used DigitalStakeout to search for #BlackLivesMatter, discovered that an Oregon DOJ attorney was tweeting support and wrote a memo describing the posts as “possible threats towards law enforcement”—the agent who wrote the memo was later found to be in violation of state law. While the Oregon DOJ now has a policy of not subscribing to such software, the local police department in Corvallis, Oregon, home to my employer, Oregon State University, subscribed to DigitalStakeout starting in July 2016.
In partnership with the Civil Liberties Defense Center, I requested data from the Corvallis Police Department on their use of DigitalStakeout. The data (eventually) returned were logs of automated searches, configured by DigitalStakeout: in total, 7240 links to public social media posts on Twitter, Instagram, Flickr, Facebook and YouTube. Sociology professor Brett Burkhardt, computer science doctoral student Alexandria LeClerc and I analyzed the Twitter data and reported our findings in this paper as part of the Conference on Fairness, Accountability, and Transparency. Let me summarize some findings from our paper, along with some observations from the data that didn’t make it into the paper:
- DigitalStakeout only collects geotagged Tweets. To target the jurisdiction of the Corvallis Police Department, DigitalStakeout uses a geographical query to the Twitter API, which only returns Tweets that are geotagged with the Tweeter’s current location. Lesson? Don’t geotag your Tweets. Twitter has Tweet geotagging turned off by default and has removed precise locations altogether. Of course, Twitter knows exactly where you are Tweeting from and uses this information to serve you location-based ads. It’s only a matter of company policy that stops Twitter from sharing that information with the likes of DigitalStakeout. I wouldn’t advise relying on company policy.
- DigitalStakeout poorly configures their searches. For one of the predetermined searches, DigitalStakeout used profile location information to capture Corvallis Tweeters. Lesson? Don’t put your location in your Twitter profile. But they badly configured that, apparently using “Benton” as a search term (Corvallis is in Benton County), returning Tweets from Benton County, Washington and Bentonville, Arkansas. Also, the searches seem to stop after collecting 100 Tweets per week. Useful!
- DigitalStakeout identifies mostly useless Tweets. We reverse engineered the search terms used for DigitalStakeout’s “Narcotics” search, which explains why the Tweets seemed mostly garbage. Sure, snow, hop, high, line, party, smoke, bowl, rock may refer to drugs, but they almost universally pick up Tweets about weather, beer, and kids’ parties. Also notable is a proliferation of marijuana terms (e.g. indica, weed, pot, bud), even though pot has been legal in Oregon for the entire subscription period!
- Tweets identified by DigitalStakeout seem to arise more from Black and Hispanic people compared to the local population. We can’t make direct comparisons, because the former is determined by a human from a Twitter profile and the latter by self-identification on the census, and there are a whole lot of differences between them, but only 1-2% of Corvallisites identify as Black, whereas over 6% of Corvallis Twitter geotaggers appear to be Black. That seems stark.
- Tweets identified by DigitalStakeout seem to arise more from White people compared to the Tweeters in the area. A more direct comparison can be made between Corvallis Twitter geotaggers and those caught up by DigitalStakeout’s searches. The samples are too small to determine if the effect is significant, but there appears to be an increase in the proportion of White Tweeters.

| | Corvallis Tweeters | DigitalStakeout |
|---|---|---|
| White | 71.8% | 78.9% |
| Black | 6.5% | 7.2% |
| Hispanic | 11.7% | 7.8% |
| Other | 10.0% | 6.1% |

- DigitalStakeout failed to identify a shooting threat to the Corvallis Police Department. In February 2018 (at a time when we know the Corvallis Police Department was still subscribing to DigitalStakeout), an individual was arrested for Tweets threatening a shooting on Oregon State University’s Corvallis campus. However, the Tweets were not discovered through surveillance of social media but through an anonymous tip line.
- DigitalStakeout seems to no longer have access to Facebook and Instagram. Our data covers July 2016 to August 2017, with a three-month gap starting April 2016. April 2016 is about when the Brennan Center says that Facebook and Twitter changed their policy to not allow social media surveillance software to access their APIs. Indeed, according to Twitter’s Master License Agreement, the Twitter API “may not be used by […] any public sector entity (or any entities providing services to such entities) for surveillance purposes, including but not limited to: (a) investigating or tracking Twitters users or their Content; and, (b) tracking, alerting, or other monitoring of sensitive events (including but not limited to protests, rallies, or community organizing meetings).” But our data show that DigitalStakeout continued to access the Twitter API after April 2016. I asked Twitter about this, and they said “We require a special review and continuous compliance audit […]. We work with DigitalStrikeout [sic] and we continue to work with them on carefully reviewed and approved Use Cases.” Lesson: Don’t trust company policy.
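To see why the “Narcotics” search picks up so much noise, here is a minimal sketch of naive keyword matching. The term list is the set of ambiguous terms quoted above; the whole-word matching rule, the function name `flag_tweets`, and the example Tweets are all assumptions made up for illustration. We do not know how DigitalStakeout actually implements its matching.

```python
import re

# Ambiguous terms quoted above from the reverse-engineered "Narcotics" search.
NARCOTICS_TERMS = ["snow", "hop", "high", "line", "party", "smoke", "bowl", "rock"]

# Assumed matching rule: whole-word, case-insensitive match on any term.
PATTERN = re.compile(r"\b(" + "|".join(NARCOTICS_TERMS) + r")\b", re.IGNORECASE)


def flag_tweets(tweets):
    """Return (tweet, matched_terms) pairs for Tweets containing any term."""
    flagged = []
    for text in tweets:
        matches = sorted({m.lower() for m in PATTERN.findall(text)})
        if matches:
            flagged.append((text, matches))
    return flagged


# Invented example Tweets, NOT taken from the Corvallis data.
EXAMPLES = [
    "So much snow on the way to campus this morning",
    "Birthday party for my six year old at the bowling alley",
    "Trying a new fresh hop ale tonight",
    "Studying for finals all weekend",
]

for text, terms in flag_tweets(EXAMPLES):
    print(terms, text)
```

None of the invented Tweets is about drugs, yet a matcher like this flags the ones about weather, a kids’ party, and beer, which is the pattern we saw in the logs.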
Racial disparities exist throughout the justice system, including in policing, contributing to a severe over-representation of people of color in US prisons. Given this, we argue that it is important to be able to audit tools used in the justice system for racial disparities. Social media monitoring is simply another avenue for creating disparities, and there are many points at which an inequity could be introduced: access to social media, adoption of a particular social media platform, interacting with the platform in a way that gives access to monitoring software, and using certain keywords. One’s behavior, even on Twitter, could increase or decrease attention from law enforcement.
Whether the purpose of social media monitoring by police is sentiment analysis or risk assessment, unless the population that is monitored mirrors that of the police jurisdiction, the bias will result in a skewed view of the population (if used for sentiment analysis) or undue attention on one sub-population over another (in the case of risk assessment). The log files we were able to obtain from the Corvallis Police Department allowed us to better understand this aspect of policing. While I would argue that any programmatic monitoring of social media impinges on our civil liberties, at the very least, requiring that log files be available for independent evaluation would ensure transparency of the algorithms that are reshaping law enforcement.
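As a rough illustration of the kind of check that log files make possible, the sketch below computes how far the monitored population departs from a baseline, using the percentages reported above for Corvallis Twitter geotaggers and for Tweeters caught by DigitalStakeout’s searches. The function name `representation_gaps` is hypothetical, and the calculation says nothing about statistical significance, which our small samples could not establish.

```python
# Percentages reported above: all Corvallis Twitter geotaggers vs. Tweeters
# caught up by DigitalStakeout's searches.
CORVALLIS_TWEETERS = {"White": 71.8, "Black": 6.5, "Hispanic": 11.7, "Other": 10.0}
DIGITALSTAKEOUT = {"White": 78.9, "Black": 7.2, "Hispanic": 7.8, "Other": 6.1}


def representation_gaps(baseline, monitored):
    """Percentage-point difference (monitored minus baseline) per group."""
    return {group: round(monitored[group] - baseline[group], 1) for group in baseline}


gaps = representation_gaps(CORVALLIS_TWEETERS, DIGITALSTAKEOUT)
for group, gap in gaps.items():
    # A positive gap means the group is over-represented among monitored Tweeters.
    print(f"{group:<9}{gap:+.1f} points")
```

The same function could be pointed at census figures as the baseline, with the caveat noted earlier that census self-identification and race inferred from a Twitter profile are not directly comparable.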