suppressPackageStartupMessages({
library(tidyverse)
library(janitor)
library(RColorBrewer)
})
I examine the number of people prosecuted for, and the number of deaths from, violent crimes in the United States. I do not cover all violent crimes, only those with an ideological motivation and a guilty verdict. I use data[1] from The Prosecution Project Dataset (Loadenthal et al. 2023). In this data, each row is a defendant in a case.
After Charlie Kirk was assassinated in September 2025, US President Donald Trump and Elon Musk said the left is the real problem (Suter 2025; Hutzler and Stoddart 2025). They struggle with numbers, so this report looks at reliable numbers. The left, the right, and others all commit violence. However, the lopsided swell of violence from the right is obvious to any observer more interested in knowledge than in inflammatory politics.
Import the data, clean the column names, remove pending cases and cases without a clear guilty verdict, and keep only US cases.
df <- read_csv("data/tpp-2025-09-15-general.csv") |>
  clean_names()
# fix the mix of numeric and character in the # killed column
df <- df |>
  mutate(
    number_killed = case_when(
      number_killed %in% c("Multiple", "Unknown") ~ NA_real_, # Convert text to NA
      is.na(number_killed) ~ NA_real_, # Keep existing NAs
      # Coerce the remaining values to numeric so every branch returns a double
      TRUE ~ suppressWarnings(as.numeric(number_killed))
    )
  )
no_clear_guilt <- c("Charged but not tried", "Data not available", "Not guilty", "Hung jury/mistrial", "Pending")

num_no_clear_guilt <- df |>
  filter(verdict %in% no_clear_guilt) |>
  nrow()

df <- df |>
  filter(!verdict %in% no_clear_guilt)

df <- df |>
  filter(location_country == "United States")
There are 1322 defendants without a clear guilty verdict that I remove from the analysis.
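As a minimal sketch of the two cleaning steps above, the toy rows below (hypothetical values, not taken from the dataset) show how a column that mixes text and numbers can be coerced to numeric and how verdicts without clear guilt can be counted before they are dropped. The `excluded` vector is an illustrative subset of the verdict labels, and `if_else()` stands in for the `case_when()` call used in the report.

```r
library(dplyr)

# Hypothetical toy rows; the real columns come from the TPP CSV
toy <- tibble(
  verdict = c("Guilty", "Pending", "Not guilty", "Guilty"),
  number_killed = c("2", "Multiple", NA, "0")
)

# Text labels become NA; everything else is coerced to a number
toy <- toy |>
  mutate(
    number_killed = if_else(
      number_killed %in% c("Multiple", "Unknown"),
      NA_real_,
      suppressWarnings(as.numeric(number_killed))
    )
  )

# Illustrative subset of the verdicts treated as "no clear guilt"
excluded <- c("Pending", "Not guilty")

# Count the rows that will be removed, then remove them
n_removed <- toy |> filter(verdict %in% excluded) |> nrow()
kept <- toy |> filter(!verdict %in% excluded)
```

Counting before filtering is what lets the report state how many defendants were dropped.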
Are any defendants duplicated for the same charge? For example, does anyone appear in the data on two different dates as part of the same case?
duplicates <- df |>
  group_by(full_legal_name, name_of_case) |>
  filter(n() > 1) |>
  ungroup()

df <- df |>
  filter(!case_id %in% c("09262001_PMC", "02142024_KC"))
There are two cases where the defendants are in the data twice for the same charge, once for their indictment and once for a second part of their trial, on different dates. I remove two rows to avoid double counting these two defendants. This leaves us with 3302 defendants. Four others are duplicates but for different charges, so I keep them.
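The duplicate check can be illustrated on a small example. The names and case IDs below are invented for illustration only. The sketch flags rows that share a defendant and case name, then drops one row by its ID, mirroring the manual removal described above. The `distinct()` call at the end is an alternative I am adding for comparison, not the approach used in this report, since it would also drop duplicates that reflect different charges.

```r
library(dplyr)

# Hypothetical rows: "Alex Doe" appears twice for the same case
toy <- tibble(
  case_id = c("01012020_AB", "03152020_AB", "02022021_CD", "05052022_EF"),
  full_legal_name = c("Alex Doe", "Alex Doe", "Blake Roe", "Casey Poe"),
  name_of_case = c("US v. Doe", "US v. Doe", "US v. Roe", "US v. Poe")
)

# Rows that share both defendant name and case name
dupes <- toy |>
  group_by(full_legal_name, name_of_case) |>
  filter(n() > 1) |>
  ungroup()

# Manual removal after inspecting the flagged rows
deduped <- toy |>
  filter(!case_id %in% c("03152020_AB"))

# Alternative (not used in the report): keep the first row per defendant and case
deduped_auto <- toy |>
  distinct(full_legal_name, name_of_case, .keep_all = TRUE)
```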
How many defendants are not linked or motivated politically?
total_defendants <- nrow(df)

defendants_not_political <- df |>
  filter(criminal_method == "Criminal violation not linked or motivated politically") |>
  nrow()
There are 243 defendants not politically motivated out of a total of 3302 defendants, which is 7.36%. I exclude these defendants going forward, along with those whose criminal method is unknown, unspecified, or undeveloped.
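The percentage can be reproduced from the two counts reported above. This is only a worked check of the arithmetic; the variable names match those in the report's code, but the values are typed in by hand here rather than computed from the data.

```r
# Counts as reported in the text
defendants_not_political <- 243
total_defendants <- 3302

# Share of defendants who were not politically motivated, in percent
share_not_political <- round(100 * defendants_not_political / total_defendants, 2)
share_not_political
```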
# remove defendants not politically motivated or unknown
df <- df |>
  filter(!criminal_method %in% c("Unknown/unspecified/undeveloped", "Criminal violation not linked or motivated politically"))

# parse dates
df <- df |>
  mutate(
    date = str_trim(date),
    date = lubridate::mdy(date)
  )
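A small sketch of the date parsing step, assuming the raw dates are month/day/year strings (which the use of `mdy()` implies). The sample strings are hypothetical. Values that cannot be parsed become `NA`, which is why the later date summaries use `na.rm = TRUE`.

```r
library(lubridate)

# Hypothetical raw values with stray whitespace and one unparseable entry
raw_dates <- c(" 1/22/1990", "05/13/2025 ", "not a date")

# Trim whitespace, then parse; failures become NA (warning suppressed here)
parsed <- suppressWarnings(mdy(trimws(raw_dates)))

# How many values failed to parse
n_failed <- sum(is.na(parsed))
```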
Were all of these crimes fully carried through?
df |>
  count(completion_of_crime) |>
  knitr::kable(
    col.names = c("Case Outcome", "Number of Prosecutions"),
    caption = "Table 1: Case Outcomes"
  )
Case Outcome | Number of Prosecutions |
---|---|
Attempted | 296 |
Carried through | 2050 |
Planned but not attempted | 493 |
Threat | 186 |
Unknown | 13 |
num_threats_and_unknown <- df |>
  filter(completion_of_crime %in% c("Threat", "Unknown")) |>
  nrow()
I am more interested in harm that was attempted, planned, or occurred, so I will filter out the others (199 defendants). That leaves us with 2839 defendants.
df <- df |>
  filter(!completion_of_crime %in% c("Threat", "Unknown"))
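The counts in Table 1 can be used to check the figures quoted above. This sketch re-enters the table values by hand and confirms that removing the "Threat" and "Unknown" rows drops 199 defendants and leaves 2839.

```r
# Table 1 values, typed in from the report
table1 <- c(
  "Attempted" = 296,
  "Carried through" = 2050,
  "Planned but not attempted" = 493,
  "Threat" = 186,
  "Unknown" = 13
)

# Defendants removed by the filter, and defendants remaining
removed <- sum(table1[c("Threat", "Unknown")])
remaining <- sum(table1) - removed
```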
Which dates are represented in the data?
# Warn if minimum date is earlier than 1900 or latest date is later than today
date_range <- range(df$date, na.rm = TRUE)
if (date_range[1] < as.Date("1900-01-01") || date_range[2] > Sys.Date()) {
  warning("Date range is outside expected bounds.")
}

df |>
  summarise(
    `Earliest Date` = min(date, na.rm = TRUE),
    `Latest Date` = max(date, na.rm = TRUE)
  ) |>
  knitr::kable(caption = "Table 2: Date Range of the Data")
Earliest Date | Latest Date |
---|---|
1990-01-22 | 2025-05-13 |
df |>
  count(ideological_affiliation) |>
  knitr::kable(
    col.names = c("Ideological Affiliation", "Number of Prosecutions"),
    caption = "Table 3: Number of Prosecutions by Ideological Affiliation"
  )
Ideological Affiliation | Number of Prosecutions |
---|---|
Leftist: eco-animal focused | 120 |
Leftist: government-focused | 98 |
Leftist: identity-focused | 24 |
Leftist: unspecified | 6 |
Nationalist-separatist | 70 |
No affiliation/not a factor | 103 |
Other | 39 |
Rightist: abortion-focused | 85 |
Rightist: government-focused | 402 |
Rightist: identity-focused | 1202 |
Rightist: unspecified | 43 |
Salafi/Jihadist/Islamist | 539 |
Unclear | 108 |
Because there are multiple leftist categories and multiple rightist categories, I add a column that consolidates the leftist types and the rightist types. This hides important variation within ideologies but makes it easier to assess political statements about left vs. right violence.
df <- df |>
  mutate(
    ideology_simple = case_when(
      str_detect(ideological_affiliation, "Leftist") ~ "Leftist",
      str_detect(ideological_affiliation, "Rightist") ~ "Rightist",
      TRUE ~ ideological_affiliation
    )
  )
df |>
  count(ideology_simple, sort = TRUE) |>
  knitr::kable(
    col.names = c("Ideological Affiliation", "Number of Prosecutions"),
    caption = "Table 4: Number of Prosecutions by Simplified Ideological Affiliation"
  )
Ideological Affiliation | Number of Prosecutions |
---|---|
Rightist | 1732 |
Salafi/Jihadist/Islamist | 539 |
Leftist | 248 |
Unclear | 108 |
No affiliation/not a factor | 103 |
Nationalist-separatist | 70 |
Other | 39 |
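As a consistency check on the consolidation, the sketch below re-enters the Table 3 counts by hand, applies the same `str_detect()` rules, and sums within each simplified group. The totals should match Table 4 (1732 rightist, 248 leftist). The data frame names `table3` and `table4_check` are illustrative.

```r
library(dplyr)
library(stringr)

# Table 3 values, typed in from the report
table3 <- tibble(
  ideological_affiliation = c(
    "Leftist: eco-animal focused", "Leftist: government-focused",
    "Leftist: identity-focused", "Leftist: unspecified",
    "Nationalist-separatist", "No affiliation/not a factor", "Other",
    "Rightist: abortion-focused", "Rightist: government-focused",
    "Rightist: identity-focused", "Rightist: unspecified",
    "Salafi/Jihadist/Islamist", "Unclear"
  ),
  prosecutions = c(120, 98, 24, 6, 70, 103, 39, 85, 402, 1202, 43, 539, 108)
)

# Collapse the subcategories and total the counts per simplified group
table4_check <- table3 |>
  mutate(
    ideology_simple = case_when(
      str_detect(ideological_affiliation, "Leftist") ~ "Leftist",
      str_detect(ideological_affiliation, "Rightist") ~ "Rightist",
      TRUE ~ ideological_affiliation
    )
  ) |>
  group_by(ideology_simple) |>
  summarise(total = sum(prosecutions), .groups = "drop") |>
  arrange(desc(total))
```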
Next I classify each criminal method as violent or non-violent, then keep only the violent prosecutions with a stated ideology to make the visualizations.
# classify methods as violent or non-violent
violent <- c(
  "Unarmed assault",
  "Hostage-taking",
  "Armed intimidation/standoff",
  "Vehicle ramming",
  "Chemical or biological weapon deployment",
  "Firearms: civilian",
  "Firearms: military",
  "Explosives",
  "Other weapons"
)

df <- df |>
  mutate(method_type = if_else(
    criminal_method %in% violent,
    "violent",
    "non-violent"
  ))
df |>
  group_by(ideology_simple, method_type) |>
  summarise(n = n()) |>
  pivot_wider(names_from = method_type, values_from = n, values_fill = 0) |>
  select(ideology_simple, violent, `non-violent`) |>
  arrange(desc(violent)) |>
  knitr::kable(
    col.names = c("Ideological Affiliation", "Violent Prosecutions", "Non-Violent Prosecutions"),
    caption = "Table 5: Number of Violent vs. Non-Violent Prosecutions by Ideological Affiliation"
  )
Ideological Affiliation | Violent Prosecutions | Non-Violent Prosecutions |
---|---|---|
Rightist | 952 | 780 |
Salafi/Jihadist/Islamist | 133 | 406 |
Unclear | 77 | 31 |
Leftist | 62 | 186 |
No affiliation/not a factor | 38 | 65 |
Other | 35 | 4 |
Nationalist-separatist | 33 | 37 |
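To put the Table 5 counts on a common scale, the sketch below re-enters them by hand and computes two shares: the percentage of each group's prosecutions that are violent, and each group's percentage of all violent prosecutions in the table. These derived columns (`pct_violent_within`, `pct_of_all_violent`) are my own illustration and are not part of the report's original output.

```r
library(dplyr)

# Table 5 values, typed in from the report
table5 <- tibble(
  ideology_simple = c(
    "Rightist", "Salafi/Jihadist/Islamist", "Unclear", "Leftist",
    "No affiliation/not a factor", "Other", "Nationalist-separatist"
  ),
  violent = c(952, 133, 77, 62, 38, 35, 33),
  non_violent = c(780, 406, 31, 186, 65, 4, 37)
)

table5_shares <- table5 |>
  mutate(
    total = violent + non_violent,
    # Share of this group's prosecutions that involve a violent method
    pct_violent_within = round(100 * violent / total, 1),
    # This group's share of all violent prosecutions in the table
    pct_of_all_violent = round(100 * violent / sum(violent), 1)
  )
```

The row totals in this sketch should equal the Table 4 counts, which is a quick way to confirm the violent and non-violent columns were split correctly.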
violent_crimes <- df |>
  filter(
    method_type == "violent",
    ideology_simple != "No affiliation/not a factor"
  )

violent_crimes_by_ideology <- violent_crimes |>
  count(ideology_simple, name = "violent_prosecutions") |>
  arrange(desc(violent_prosecutions))
ggplot(violent_crimes_by_ideology, aes(x = reorder(ideology_simple, violent_prosecutions),
                                       y = violent_prosecutions)) +
  geom_col(fill = "black", alpha = 0.8) +
  coord_flip() +
  labs(
    title = "Defendants Prosecuted for Violent Crimes in the US",
    subtitle = "By Ideological Affiliation of the Perpetrator",
    x = "Their Ideology",
    y = "Number of Defendants",
    caption = "Source: The Prosecution Project"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 11)
  )
df_yearly <- violent_crimes |>
  mutate(
    year = lubridate::year(date),
    year = as.integer(year)
  ) |>
  filter(!is.na(year)) |>
  count(year, ideology_simple, name = "prosecutions")
ggplot(df_yearly, aes(x = year, y = prosecutions, fill = ideology_simple)) +
  geom_col() +
  scale_fill_manual(values = c(
    "Rightist" = "#c14a58ff", # Red for Rightist
    "Leftist" = "#337fb5ff", # Blue for Leftist
    "Salafi/Jihadist/Islamist" = "#e08738ff", # Orange
    "Unclear" = "#71a771ff", # Green
    "Nationalist-separatist" = "#8c564b", # Brown
    "Other" = "#e377c2" # Pink
  )) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 8)) +
  labs(
    title = "Defendants Prosecuted for Violent Crimes in the US by Year",
    subtitle = "By Ideological Affiliation of the Perpetrator",
    x = "Year",
    y = "Number of Defendants",
    fill = "Their Ideology",
    caption = "Source: The Prosecution Project"
  ) +
  guides(fill = guide_legend(ncol = 6)) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 11),
    legend.position = "bottom",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )
# Don't double count deaths in cases where more than one defendant is tried for the same case
df_deaths <- violent_crimes |>
  mutate(
    # Extract the incident date from Case ID (first part before underscore)
    incident_date = str_extract(case_id, "^[^_]+"),
    # Parse the date correctly
    incident_date_formatted = case_when(
      str_length(incident_date) == 8 & str_detect(incident_date, "^\\d{8}$") ~
        paste0(str_sub(incident_date, 1, 2), "/",
               str_sub(incident_date, 3, 4), "/",
               str_sub(incident_date, 5, 8)),
      TRUE ~ incident_date
    )
  ) |>
  group_by(incident_date, incident_date_formatted) |>
  summarise(
    unique_incidents = 1,
    related_cases = n(),
    total_individuals = n_distinct(full_legal_name),
    total_deaths = first(number_killed, na_rm = TRUE),
    case_names = paste(unique(name_of_case), collapse = "; "),
    ideology_simple = first(ideology_simple, na_rm = TRUE),
    .groups = "drop"
  ) |>
  group_by(ideology_simple) |>
  summarise(total_deaths = sum(total_deaths, na.rm = TRUE)) |>
  arrange(desc(total_deaths))
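# A minimal sketch of why deaths are taken once per incident rather than summed
# per defendant. The rows below are hypothetical: two defendants share one
# incident (same date prefix in case_id), so summing per row would count its
# deaths twice. As in the code above, this assumes that defendants sharing a
# date prefix belong to the same incident.

```r
library(dplyr)
library(stringr)

# Hypothetical rows: two co-defendants for one incident, one for another
toy <- tibble(
  case_id = c("01012000_AB", "01012000_CD", "02022001_EF"),
  number_killed = c(3, 3, 1),
  ideology_simple = c("Group A", "Group A", "Group B")
)

# Naive total counts the first incident's deaths twice
naive_total <- sum(toy$number_killed)

# One row per incident date, taking the death count once
by_incident <- toy |>
  mutate(incident_date = str_extract(case_id, "^[^_]+")) |>
  group_by(incident_date) |>
  summarise(
    defendants = n(),
    total_deaths = first(number_killed),
    ideology_simple = first(ideology_simple),
    .groups = "drop"
  )

deduped_total <- sum(by_incident$total_deaths)
```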
plot1 <- ggplot(df_deaths, aes(x = reorder(ideology_simple, total_deaths),
                               y = total_deaths)) +
  geom_col(fill = "black", alpha = 0.8) +
  coord_flip() +
  labs(
    title = "Who Causes More Death by Violent Crime in the US?",
    subtitle = "By Ideological Affiliation of the Perpetrator",
    x = "Perpetrator Ideology",
    y = "Deaths",
    caption = "Source: The Prosecution Project"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 11)
  )

plot1
Hutzler, Alexandra, and Michelle Stoddart. 2025. “Trump Doubles down on Blaming ‘radical Left’ after Vow to Go after Political Violence.” ABC News, September 12. https://abcnews.go.com/Politics/trump-doubles-blaming-radical-left-after-vow-after/story?id=125509965.
Loadenthal, Michael, Lauren Donahoe, Madison Weaver, Sara Godfrey, Kathryn Blowers, et al. 2023. “The Prosecution Project Dataset.” The Prosecution Project [dataset]. https://theprosecutionproject.org/.
Suter, Tara. 2025. “Elon Musk: ‘The Left Is the Party of Murder.’” The Hill, September 14. https://thehill.com/policy/technology/5502535-elon-musk-charlie-kirk-death/.
This descriptive analysis was created by Jeremy Allen using R, tidyverse, janitor, and RColorBrewer packages. The report is built with Quarto. The code is available on GitHub.
[1] See The Prosecution Project’s FAQ, where they discuss what data is included, what is excluded, and why. For example, their data is likely an undercount of violence because they compile it only from prosecutions. If a violent incident occurred and the perpetrator was killed on site, there would be no prosecution and no data point in this set.