Criminal Sentence Lengths and Racial Bias

By Isabelle Williams, Jenna Oratz, and Jaya Kumari (Big Data Group), May 1

Quick link: find our code here.

We’ve all heard stories of the racial bias that occurs behind crime sentencing: two very similar crimes are committed by two criminals of different races, and result in different sentence lengths.

Many news sources have reported on this topic and found that sentence lengths are indeed affected by race.

https://www.vox.com/identities/2017/11/17/16668770/us-sentencing-commission-race-booker

https://www.washingtonpost.com/news/wonk/wp/2017/11/16/black-men-sentenced-to-more-time-for-committing-the-exact-same-crime-as-a-white-person-study-finds/?noredirect=on&utm_term=.e0e985514555

https://apps.urban.org/features/long-prison-terms/demographics.html

We decided to look into whether these assertions from reputable news sources are true for a given state in the United States.

Our main goal was to figure out whether criminals of different races incur different sentence lengths for the same crimes.

The Data

After sorting through several databases, we chose the state of Nebraska’s database for criminal sentencing information.

Here’s a link to it and what it looks like once downloaded:

Compared to databases of other states, Nebraska’s stood out because it was the only database we found that provided the three criteria we were looking for: race of the criminal, the length of their sentence, and the type of crime they committed.

Visualizing the Data

Some notes on our design choices:

We only looked at two races, Black and White, as the other racial groups had too few records to support meaningful comparisons.

In the machine learning portion, we included an “Other” category.

We computed sentence length as “Earliest Possible Release Date” minus “Sentence Begin Date”, in years.

For sentences less than a year, we changed the sentence length to a fraction of a year based on the months in the sentence.

Sentences of LFE (life) and DTH (death) were changed to 122 years, the oldest known lifespan of a human, so they could appear on histograms.

We did not want to exclude these sentences, but we understand that assigning them an arbitrary length may introduce bias.
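The sentence-length rule described in these notes can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the authors' actual code: the function name, the argument names, and the choices to count whole calendar months (ignoring days) and to round sentences of a year or more down to whole years are all assumptions.

```python
from datetime import date

# Placeholder length used in the article for life (LFE) and death (DTH) sentences.
LIFE_OR_DEATH_YEARS = 122.0


def sentence_length_years(begin, earliest_release, sentence_code=""):
    """Return a sentence length in years (hypothetical helper).

    Assumptions: sentences under a year become months / 12, longer
    sentences are truncated to whole years, and LFE/DTH map to 122.
    """
    if sentence_code in ("LFE", "DTH"):
        return LIFE_OR_DEATH_YEARS
    months = (earliest_release.year - begin.year) * 12 + (
        earliest_release.month - begin.month
    )
    if months < 12:
        return months / 12
    return float(months // 12)


# Example: a six-month sentence becomes half a year.
half_year = sentence_length_years(date(2015, 3, 1), date(2015, 9, 1))
```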

We started by looking at the overall trend of sentence lengths across all crimes, separated into White and Black individuals, using a line graph of the density histogram with a log-scaled x-axis:

FIG 1: Black individuals receive more sentences shorter than a year, while the two races follow about the same trend for sentence lengths greater than a year.

White individuals have a slightly higher frequency for lengths just over 1 year, but Black individuals have a slightly higher frequency for 2–3 year sentences.

Above about 3 years, the trends for the two races are about the same.

The trends follow a power law distribution for sentence lengths above 1 year.
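For readers who want to reproduce the kind of curve described here, the sketch below computes a density histogram over logarithmically spaced bins. The resulting values could then be drawn as a line on a log-scaled x-axis. It is an illustration under assumptions, not the authors' plotting code: the function name and the `bins_per_decade` parameter are hypothetical, and zero-length sentences are dropped because they cannot be placed on a log axis.

```python
import math


def log_binned_density(values, bins_per_decade=4):
    """Return (left_edge, right_edge, density) for log-spaced bins."""
    positive = [v for v in values if v > 0]
    lo = math.floor(math.log10(min(positive)))
    hi = math.ceil(math.log10(max(positive)))
    n_bins = max(1, (hi - lo) * bins_per_decade)
    edges = [10 ** (lo + i / bins_per_decade) for i in range(n_bins + 1)]

    counts = [0] * n_bins
    for v in positive:
        idx = int((math.log10(v) - lo) * bins_per_decade)
        counts[min(idx, n_bins - 1)] += 1  # clamp the maximum into the last bin

    total = len(positive)
    # Density = count / (total * bin width), so the bars integrate to 1.
    return [
        (edges[i], edges[i + 1], counts[i] / (total * (edges[i + 1] - edges[i])))
        for i in range(n_bins)
    ]


# Made-up sentence lengths in years, for illustration only.
bins = log_binned_density([0.5, 1, 1, 2, 3, 5, 10, 122])
```

Computing one such curve per race and overlaying them gives a comparison in the spirit of FIG 1.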

Breaking down sentence lengths by crime:

We then broke this down into the ten crimes with the highest number of occurrences in the data set: burglary, robbery, theft, manufacturing/distributing/dealing controlled substances, sexual assault 1st degree, forgery 2nd degree, possession of a controlled substance not including marijuana, theft by receiving stolen property, possession of methamphetamine, and driving under revoked license.

For each crime, we plotted a line graph of the sentence length values from a density histogram, x-axis again log scaled for visual clarity.

In general, we found that the sentences for Black and White individuals were roughly the same for most crimes, with a couple of exceptions:

FIG 2: More White criminals than Black criminals received 1 year sentences, but more Black criminals received sentences of < 1 year and 1 < x < 4 years.

After 4 years, the curves were about the same.

FIG 3, FIG 4

FIG 5: Black criminals clearly receive longer sentences.

Sentences <= 1 year had more White recipients, but 1 < x < 3 had more Black recipients.

After 3 years, the curves were roughly the same.

FIG 6, FIG 7, FIG 8, FIG 9, FIG 10, FIG 11

Racial representation in each crime:

First, we created a pie chart of the overall racial representation in the data.

White individuals made up just over half the data, Black individuals one fourth, Hispanic individuals about one tenth, and other races about five percent.

FIG 12

We then created the same pie charts for each of the ten most common crimes.

FIG 13: A slightly higher percentage of White individuals than in the overall data, and accordingly slightly smaller percentages of Black and Hispanic individuals.

FIG 14: Black individuals made up 51% of robbery convicts, double their 25% share of the overall data.

White representation shrank substantially.

FIG 15: Similar to burglary, a slightly higher percentage of White individuals than in the overall data.

The Black share rose slightly and the Hispanic share shrank.

FIG 16: The Hispanic and Black shares both rose relative to the overall data, by 5 and 4.7 percentage points respectively, and the White share decreased by 6.1 points.

FIG 17: The White and Hispanic shares both increased relative to the overall data, by 6.7 and 3.3 percentage points respectively.

The Black share decreased by 7.1 points.

FIG 18: The White share increased slightly (2.2 percentage points).

The Black share increased more considerably (6.5 points).

The Hispanic share shrank (6.7 points).

FIG 19: The Black share increased considerably here, by 13.3 percentage points.

The White and Hispanic shares decreased, by 5.1 and 4.3 points respectively.

FIG 20: Numbers here are roughly similar to the overall representation, with the White share increasing by 2.4 percentage points, the Black share increasing by 0.4 points, and the Hispanic share shrinking by 1.9 points.

FIG 21: This crime is overwhelmingly White, with an increase of 18.2 percentage points to 76.8% representation, over 3 in 4.

The Black share decreased by 21.7 points to a mere 3.4%.

The Hispanic share increased by 4.2 points.

FIG 22: This is also relatively similar to the overall representation, with the White share increasing by 2.6 percentage points, the Black share decreasing by 1.5 points, and the Hispanic share decreasing by 3.3 points.
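The comparisons in these captions amount to subtracting each race's share of the overall data from its share within one crime. A minimal sketch of that arithmetic, assuming records are simple (race, crime) pairs, is shown below. The function names and the toy records are hypothetical and are not taken from the Nebraska data.

```python
from collections import Counter


def race_shares(records):
    """Percent share of each race among (race, crime) records."""
    counts = Counter(race for race, _ in records)
    total = sum(counts.values())
    return {race: 100 * n / total for race, n in counts.items()}


def share_shift(records, crime):
    """Percentage-point change of each race's share within one crime
    relative to its share of all records."""
    overall = race_shares(records)
    within = race_shares([r for r in records if r[1] == crime])
    return {race: within.get(race, 0.0) - overall[race] for race in overall}


# Toy records for illustration only.
toy = (
    [("WHITE", "THEFT")] * 6
    + [("BLACK", "THEFT")] * 2
    + [("WHITE", "ROBBERY")]
    + [("BLACK", "ROBBERY")]
)
shift = share_shift(toy, "ROBBERY")  # WHITE falls from 70% to 50%
```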

Discussion of Findings From Visualizations

Race and drug possession charges (MANU/DIST/DEL/DISP OR POSS W/I):

We found both that Black individuals are overrepresented in drug possession arrests (FIG 16) compared to their total representation in arrests (FIG 12), and that Black individuals are receiving longer sentences than their White counterparts (FIG 5).

Race and Burglary vs. Robbery vs. Theft:

There are stark differences in racial representation between Burglary, Robbery and Theft (FIGs 13, 14, 15).

From the Nebraskan Legislature’s website, the differences in the three types are as follows: Burglary is willingly, maliciously and forcefully breaking and entering into someone else’s property with intent to commit theft or any other felony; no actual stealing must occur as long as the breaking and entering has occurred.

Robbery is forcibly and violently stealing something of value from another.

Theft is receiving or taking stolen property with the knowledge that it is stolen.

Burglary and robbery both use force, but robbery must be a violent force on another person.

Even though the difference in sentence lengths within each individual crime was not significant by race, we wanted to see if there were significantly different sentence lengths for each crime.

FIG 23: Sentence lengths for robbery are longer than those for burglary and theft.

We also found the five-number summary as well as the mean sentence length for each of the three crime types:

Robbery Stats: Min: 0.0 Q1: 2.0 Median: 3.0 Mean: 4.0563427800269904 Q3: 5.0 Max: 58.0

Burglary Stats: Min: 0.0 Q1: 1.0 Median: 2.0 Mean: 2.4904106220801574 Q3: 3.0 Max: 134.0

Theft Stats: Min: 0.0 Q1: 1.0 Median: 1.0 Mean: 2.0246331236897275 Q3: 2.0 Max: 102.0

We find that the sentence lengths for robbery are higher, and the racial representation of Black individuals is doubled for robbery as compared to burglary and theft.
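A five-number summary plus mean like the ones reported for the three crimes can be computed with Python's standard `statistics` module, as in the sketch below. The function name is hypothetical, and the "inclusive" quartile method is an assumption, since the article does not say which quartile convention was used.

```python
import statistics


def five_number_summary(lengths):
    """Min, Q1, median, Q3, max, and mean of a list of sentence lengths."""
    # method="inclusive" interpolates linearly between data points.
    q1, median, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    return {
        "min": min(lengths),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(lengths),
        "q3": q3,
        "max": max(lengths),
    }


# Made-up lengths in years, for illustration only.
summary = five_number_summary([0, 1, 2, 3, 4])
```

Because the mean is pulled upward by a few very long sentences, comparing it with the median, as the reported figures allow, gives a quick sense of how skewed each crime's distribution is.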

Using the Data for Machine Learning

In this portion of our project, we attempted to use race and other information about the criminals to build a machine learning model that could accurately predict a criminal’s sentence length, as well as use sentence length and other information to predict race.

Feature Engineering

We carefully selected the features we used in our model, keeping them limited to see how well the model would be able to predict based mostly upon the information of interest (race and sentence length).

We used sklearn’s LabelEncoder to change the Gender and Race columns to numerical values, and used pandas dummy variables to convert Crime and Race to one-hot encoded columns.

As part of this, we removed all criminals whose crime was not one of the ten most common, in order to reduce the number of columns from thousands to tens.

This still left us with 18,009 criminals, which we felt was a large enough dataset to get meaningful results.
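The filtering and encoding steps can be illustrated with a small standard-library sketch. The authors used sklearn's LabelEncoder and pandas dummy variables. The version below swaps in plain dictionaries so it runs without third-party packages, and its function names, column names, and toy rows are all hypothetical.

```python
from collections import Counter


def keep_top_crimes(rows, n=10):
    """Keep only rows whose crime is among the n most frequent crimes."""
    counts = Counter(row["crime"] for row in rows)
    top = {crime for crime, _ in counts.most_common(n)}
    return [row for row in rows if row["crime"] in top]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[category] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# Toy rows for illustration only.
toy_rows = (
    [{"crime": "THEFT", "race": "WHITE"}] * 3
    + [{"crime": "ROBBERY", "race": "BLACK"}] * 2
    + [{"crime": "ARSON", "race": "WHITE"}]
)
kept = keep_top_crimes(toy_rows, n=2)  # drops the single ARSON row
encoded = one_hot(kept, "crime")
```

Restricting to the most common crimes first is what keeps the number of one-hot columns small.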

We also subtracted the sentence length from the maximum year sentence to create a new value, the sentence length with respect to the maximum sentence, and did the same with minimum year sentence.

We did this instead of using these values alone because we felt the discrepancy of the sentence length from the minimum and maximum terms was more meaningful.
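As a concrete reading of the two derived features, the sketch below subtracts the sentence length from the maximum and minimum terms. The key names are hypothetical stand-ins for the data set's actual column names, and the sign convention (term minus sentence length) follows the wording in the text.

```python
def add_relative_terms(row):
    """Return a copy of row with the two derived features added."""
    out = dict(row)
    out["sentence_wrt_max_term"] = row["max_term_years"] - row["sentence_years"]
    out["sentence_wrt_min_term"] = row["min_term_years"] - row["sentence_years"]
    return out


# Made-up example: a 3-year sentence with a 2-year minimum and 5-year maximum.
example_row = {"sentence_years": 3.0, "min_term_years": 2.0, "max_term_years": 5.0}
example = add_relative_terms(example_row)
```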

We split the data into a training set and a testing set, with the training set encompassing 4/5 of the data and the testing set the other 1/5.
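A 4/5 to 1/5 split of this kind might look like the sketch below. It is a standard-library stand-in rather than the authors' code: the function name and the fixed seed are assumptions, and the article does not say whether the split was shuffled or stratified.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (training_set, testing_set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Integers stand in for data rows in this illustration.
train, test = split_rows(list(range(100)))
```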

Predicting Race

We broke down the results based on crime in order to see if the model did a better job of predicting race for certain crimes over others.

We also tested the model on both the training and testing sets for each crime, as well as for the entire dataset.

We compared these results to those of a naive model that always guessed the criminal was White.
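The naive comparison is simple to state in code: its accuracy is just the fraction of White individuals in whichever set is being scored. A minimal sketch, with hypothetical function names and made-up labels, follows.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def always_white_accuracy(actual):
    """Accuracy of a naive model that predicts WHITE for every record."""
    return accuracy(["WHITE"] * len(actual), actual)


# Made-up labels for illustration only.
labels = ["WHITE", "BLACK", "WHITE", "OTHER"]
baseline = always_white_accuracy(labels)
```

A trained model is only informative to the extent that it beats this baseline on the testing set.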

The results were as follows:

All Crimes:
  Training Set Results:
    Percent Correct: 0.6413697362332254
    Percent Correct with naive model: 0.5908375751966682
    Top features: ['SENTENCE WRT MAX TERM: 0.4143', 'SENTENCE WRT MIN TERM: 0.3025', 'ROBBERY: 0.0977']
  Testing Set Results:
    Percent Correct: 0.5941143808995003
    Percent Correct with naive model: 0.5893947806774015
    Top features: ['SENTENCE WRT MAX TERM: 0.4121', 'SENTENCE WRT MIN TERM: 0.3022', 'ROBBERY: 0.0998']

Burglary:
  Training Set Results:
    Percent Correct: 0.6594696969696969
    Percent Correct with naive model: 0.6431818181818182
    Top features: ['SENTENCE WRT MAX TERM: 0.5511', 'SENTENCE WRT MIN TERM: 0.4198', 'GENDER: 0.029']
  Testing Set Results:
    Percent Correct: 0.6407879490150638
    Percent Correct with naive model: 0.6500579374275782
    Top features: ['SENTENCE WRT MAX TERM: 0.5513', 'SENTENCE WRT MIN TERM: 0.421', 'GENDER: 0.0277']

Driving Under Revoked License:
  Training Set Results:
    Percent Correct: 0.6483870967741936
    Percent Correct with naive model: 0.632258064516129
    Top features: ['SENTENCE WRT MAX TERM: 0.5044', 'SENTENCE WRT MIN TERM: 0.4287', 'GENDER: 0.0669']
  Testing Set Results:
    Percent Correct: 0.543778801843318
    Percent Correct with naive model: 0.5391705069124424
    Top features: ['SENTENCE WRT MAX TERM: 0.5155', 'SENTENCE WRT MIN TERM: 0.4193', 'GENDER: 0.0652']

Forgery 2nd Degree:
  Training Set Results:
    Percent Correct: 0.6142684401451027
    Percent Correct with naive model: 0.5888754534461911
    Top features: ['SENTENCE WRT MAX TERM: 0.534', 'SENTENCE WRT MIN TERM: 0.4184', 'GENDER: 0.0476']
  Testing Set Results:
    Percent Correct: 0.5984848484848485
    Percent Correct with naive model: 0.6628787878787878
    Top features: ['SENTENCE WRT MAX TERM: 0.533', 'SENTENCE WRT MIN TERM: 0.4151', 'GENDER: 0.0519']

Possession With Intent to Distribute and Similar:
  Training Set Results:
    Percent Correct: 0.5785770132916341
    Percent Correct with naive model: 0.5293197810789679
    Top features: ['SENTENCE WRT MAX TERM: 0.5683', 'SENTENCE WRT MIN TERM: 0.3683', 'GENDER: 0.0635']
  Testing Set Results:
    Percent Correct: 0.5
    Percent Correct with naive model: 0.49336283185840707
    Top features: ['SENTENCE WRT MAX TERM: 0.5709', 'SENTENCE WRT MIN TERM: 0.3665', 'GENDER: 0.0626']

Possession of Controlled Substance Except Marijuana:
  Training Set Results:
    Percent Correct: 0.572928821470245
    Percent Correct with naive model: 0.5472578763127188
    Top features: ['SENTENCE WRT MAX TERM: 0.497', 'SENTENCE WRT MIN TERM: 0.4297', 'GENDER: 0.0734']
  Testing Set Results:
    Percent Correct: 0.5153846153846153
    Percent Correct with naive model: 0.5153846153846153
    Top features: ['SENTENCE WRT MAX TERM: 0.4974', 'SENTENCE WRT MIN TERM: 0.4292', 'GENDER: 0.0735']

Possession of Methamphetamine:
  Training Set Results:
    Percent Correct: 0.771484375
    Percent Correct with naive model: 0.76171875
    Top features: ['SENTENCE WRT MIN TERM: 0.4674', 'SENTENCE WRT MAX TERM: 0.4456', 'GENDER: 0.087']
  Testing Set Results:
    Percent Correct: 0.7875
    Percent Correct with naive model: 0.7875
    Top features: ['SENTENCE WRT MIN TERM: 0.4697', 'SENTENCE WRT MAX TERM: 0.4442', 'GENDER: 0.0861']

Robbery:
  Training Set Results:
    Percent Correct: 0.6095947063688999
    Percent Correct with naive model: 0.37220843672456577
    Top features: ['SENTENCE WRT MAX TERM: 0.5696', 'SENTENCE WRT MIN TERM: 0.3957', 'GENDER: 0.0347']
  Testing Set Results:
    Percent Correct: 0.49760765550239233
    Percent Correct with naive model: 0.3923444976076555
    Top features: ['SENTENCE WRT MAX TERM: 0.5609', 'SENTENCE WRT MIN TERM: 0.4025', 'GENDER: 0.0365']

Sexual Assault 1st Degree:
  Training Set Results:
    Percent Correct: 0.680568720379147
    Percent Correct with naive model: 0.628436018957346
    Top features: ['SENTENCE WRT MAX TERM: 0.5185', 'SENTENCE WRT MIN TERM: 0.4658', 'GENDER: 0.0157']
  Testing Set Results:
    Percent Correct: 0.6371681415929203
    Percent Correct with naive model: 0.672566371681416
    Top features: ['SENTENCE WRT MAX TERM: 0.5137', 'SENTENCE WRT MIN TERM: 0.4705', 'GENDER: 0.0158']

Theft:
  Training Set Results:
    Percent Correct: 0.6833333333333333
    Percent Correct with naive model: 0.6605263157894737
    Top features: ['SENTENCE WRT MIN TERM: 0.4658', 'SENTENCE WRT MAX TERM: 0.4552', 'GENDER: 0.079']
  Testing Set Results:
    Percent Correct: 0.6578249336870027
    Percent Correct with naive model: 0.6445623342175066
    Top features: ['SENTENCE WRT MIN TERM: 0.461', 'SENTENCE WRT MAX TERM: 0.4597', 'GENDER: 0.0792']

Theft By Receiving Stolen Property:
  Training Set Results:
    Percent Correct: 0.6291291291291291
    Percent Correct with naive model: 0.6081081081081081
    Top features: ['SENTENCE WRT MAX TERM: 0.5341', 'SENTENCE WRT MIN TERM: 0.4178', 'GENDER: 0.048']
  Testing Set Results:
    Percent Correct: 0.5833333333333334
    Percent Correct with naive model: 0.6031746031746031
    Top features: ['SENTENCE WRT MAX TERM: 0.5355', 'SENTENCE WRT MIN TERM: 0.4163', 'GENDER: 0.0482']

Conclusions

Overall, our machine learning model was more effective than the naive model at predicting on the training set, but similarly effective at predicting on the testing set.

This means that it did successfully recognize some patterns in the training set, but these didn’t carry over to the testing set.

Sentence length with regard to max term and sentence length with regard to min term were the top two features for every category, meaning the model did find them to be important in guessing race, but this is not necessarily significant since the models are not generally very accurate.

Possession of Methamphetamine had an interesting result for the testing set: the percent correct for our model was the same as for the naive model, meaning our model likely guessed that every criminal was White.

Since 76.8% of criminals in this crime were White, this makes a lot of sense, but is still very interesting.

Robbery also had an interesting result: for the training set our model’s accuracy was about 24 percentage points higher than the naive model’s, and for the testing set it was still about 10 points higher.

This was by far the largest gap of any crime, and can likely be attributed to the aforementioned over-representation of Black individuals in robbery, which makes a naive model that always guesses White wrong most of the time.

Predicting Sentence Length

Here, we did the same thing as with predicting race, except that we tried three different naive models: always predicting the mean, always predicting the median, or always predicting 1 (the mode).

We also included race as an additional feature (using one-hot encoded columns for each race).
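The three naive models can be scored with a few lines of standard-library Python. The sketch below assumes that "Average Years Off" means mean absolute error and that each constant is computed from the training set. Both are assumptions, and the function name and toy numbers are hypothetical.

```python
import statistics


def naive_baseline_errors(train_lengths, test_lengths):
    """Mean absolute error of always predicting the training mean,
    median, or mode for every record in the test set."""
    constants = {
        "mean": statistics.fmean(train_lengths),
        "median": statistics.median(train_lengths),
        "mode": statistics.mode(train_lengths),
    }
    return {
        name: sum(abs(constant - y) for y in test_lengths) / len(test_lengths)
        for name, constant in constants.items()
    }


# Made-up sentence lengths in years, for illustration only.
errors = naive_baseline_errors([1, 1, 2, 4], [1, 2, 6])
```

Among constant predictions, the median minimizes mean absolute error on the data it was computed from, so it tends to be the hardest of the three baselines to beat under this metric.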

The results are as follows:

All Crimes:
  Training Set Results:
    Average Years Off: 1.1607589079130032
    Average Years Off with naive model using mean: 1.827314029509284
    Average Years Off with naive model using median: 1.696159185562239
    Average Years Off with naive model using mode: 1.7584451642757983
    Top features: ['SENTENCE WRT MAX TERM: 0.4487', 'SENTENCE WRT MIN TERM: 0.3858', 'GENDER: 0.0205']
  Testing Set Results:
    Average Years Off: 1.1740699611327041
    Average Years Off with naive model using mean: 1.6875600605622771
    Average Years Off with naive model using median: 1.567740144364242
    Average Years Off with naive model using mode: 1.6299278178789562
    Top features: ['SENTENCE WRT MAX TERM: 0.4485', 'SENTENCE WRT MIN TERM: 0.3864', 'GENDER: 0.0206']

Burglary:
  Training Set Results:
    Average Years Off: 1.3795454545454546
    Average Years Off with naive model using mean: 1.5582185491276401
    Average Years Off with naive model using median: 1.626515151515151
    Average Years Off with naive model using mode: 1.6265151515151515
    Top features: ['SENTENCE WRT MIN TERM: 0.4989', 'SENTENCE WRT MAX TERM: 0.4669', 'GENDER: 0.0136']
  Testing Set Results:
    Average Years Off: 1.52954808806489
    Average Years Off with naive model using mean: 1.4513681423367522
    Average Years Off with naive model using median: 1.544611819235226
    Average Years Off with naive model using mode: 1.544611819235226
    Top features: ['SENTENCE WRT MIN TERM: 0.5012', 'SENTENCE WRT MAX TERM: 0.4654', 'GENDER: 0.0135']

Driving Under Revoked License:
  Training Set Results:
    Average Years Off: 0.8806451612903226
    Average Years Off with naive model using mean: 0.911992715920916
    Average Years Off with naive model using median: 0.856451612903225
    Average Years Off with naive model using mode: 0.8564516129032258
    Top features: ['SENTENCE WRT MAX TERM: 0.4963', 'SENTENCE WRT MIN TERM: 0.4541', 'GENDER: 0.0151']
  Testing Set Results:
    Average Years Off: 0.8018433179723502
    Average Years Off with naive model using mean: 0.8169636220773425
    Average Years Off with naive model using median: 0.774193548387096
    Average Years Off with naive model using mode: 0.7741935483870968
    Top features: ['SENTENCE WRT MAX TERM: 0.4931', 'SENTENCE WRT MIN TERM: 0.4566', 'GENDER: 0.0153']

Forgery 2nd Degree:
  Training Set Results:
    Average Years Off: 1.0181378476420797
    Average Years Off with naive model using mean: 1.1849621817469314
    Average Years Off with naive model using median: 1.037484885126965
    Average Years Off with naive model using mode: 1.037484885126965
    Top features: ['SENTENCE WRT MIN TERM: 0.5038', 'SENTENCE WRT MAX TERM: 0.4168', 'GENDER: 0.0369']
  Testing Set Results:
    Average Years Off: 1.1477272727272727
    Average Years Off with naive model using mean: 1.0070018365472888
    Average Years Off with naive model using median: 0.871212121212121
    Average Years Off with naive model using mode: 0.8712121212121212
    Top features: ['SENTENCE WRT MIN TERM: 0.505', 'SENTENCE WRT MAX TERM: 0.4163', 'GENDER: 0.0373']

Possession With Intent to Distribute and Similar:
  Training Set Results:
    Average Years Off: 1.0742767787333856
    Average Years Off with naive model using mean: 1.1186991889798592
    Average Years Off with naive model using median: 1.175136825645035
    Average Years Off with naive model using mode: 1.1751368256450352
    Top features: ['SENTENCE WRT MAX TERM: 0.4807', 'SENTENCE WRT MIN TERM: 0.4588', 'GENDER: 0.0203']
  Testing Set Results:
    Average Years Off: 0.922566371681416
    Average Years Off with naive model using mean: 1.087771164539119
    Average Years Off with naive model using median: 1.148230088495575
    Average Years Off with naive model using mode: 1.1482300884955752
    Top features: ['SENTENCE WRT MAX TERM: 0.4816', 'SENTENCE WRT MIN TERM: 0.4589', 'GENDER: 0.0202']

Possession of Controlled Substance Except Marijuana:
  Training Set Results:
    Average Years Off: 0.8751458576429405
    Average Years Off with naive model using mean: 0.8075278201754006
    Average Years Off with naive model using median: 0.707117852975496
    Average Years Off with naive model using mode: 0.7071178529754959
    Top features: ['SENTENCE WRT MIN TERM: 0.5213', 'SENTENCE WRT MAX TERM: 0.4221', 'GENDER: 0.0258']
  Testing Set Results:
    Average Years Off: 0.8346153846153846
    Average Years Off with naive model using mean: 0.6122781065088765
    Average Years Off with naive model using median: 0.561538461538462
    Average Years Off with naive model using mode: 0.5615384615384615
    Top features: ['SENTENCE WRT MIN TERM: 0.5165', 'SENTENCE WRT MAX TERM: 0.4267', 'GENDER: 0.0259']

Possession of Methamphetamine:
  Training Set Results:
    Average Years Off: 0.7734375
    Average Years Off with naive model using mean: 0.88427734375
    Average Years Off with naive model using median: 0.66015625
    Average Years Off with naive model using mode: 0.66015625
    Top features: ['SENTENCE WRT MIN TERM: 0.6117', 'SENTENCE WRT MAX TERM: 0.3545', 'GENDER: 0.0124']
  Testing Set Results:
    Average Years Off: 0.89375
    Average Years Off with naive model using mean: 0.883203125
    Average Years Off with naive model using median: 0.6125
    Average Years Off with naive model using mode: 0.6125
    Top features: ['SENTENCE WRT MIN TERM: 0.6123', 'SENTENCE WRT MAX TERM: 0.3541', 'GENDER: 0.012']

Robbery:
  Training Set Results:
    Average Years Off: 1.2928039702233252
    Average Years Off with naive model using mean: 2.9327555054762047
    Average Years Off with naive model using median: 2.645161290322581
    Average Years Off with naive model using mode: 3.1943755169561623
    Top features: ['SENTENCE WRT MAX TERM: 0.5686', 'SENTENCE WRT MIN TERM: 0.3688', 'GENDER: 0.0221']
  Testing Set Results:
    Average Years Off: 1.3397129186602872
    Average Years Off with naive model using mean: 2.807856962981618
    Average Years Off with naive model using median: 2.535885167464115
    Average Years Off with naive model using mode: 3.15311004784689
    Top features: ['SENTENCE WRT MAX TERM: 0.5667', 'SENTENCE WRT MIN TERM: 0.3699', 'GENDER: 0.0229']

Sexual Assault 1st Degree:
  Training Set Results:
    Average Years Off: 1.2966824644549764
    Average Years Off with naive model using mean: 4.060600615439881
    Average Years Off with naive model using median: 3.630331753554502
    Average Years Off with naive model using mode: 4.734597156398104
    Top features: ['SENTENCE WRT MAX TERM: 0.6166', 'SENTENCE WRT MIN TERM: 0.347', 'WHITE: 0.0099']
  Testing Set Results:
    Average Years Off: 1.1563421828908556
    Average Years Off with naive model using mean: 3.1724227947894583
    Average Years Off with naive model using median: 2.899705014749263
    Average Years Off with naive model using mode: 4.129793510324483
    Top features: ['SENTENCE WRT MAX TERM: 0.6179', 'SENTENCE WRT MIN TERM: 0.3471', 'OTHER: 0.0096']

Theft:
  Training Set Results:
    Average Years Off: 1.2526315789473683
    Average Years Off with naive model using mean: 1.146075715604802
    Average Years Off with naive model using median: 1.100877192982456
    Average Years Off with naive model using mode: 1.1008771929824561
    Top features: ['SENTENCE WRT MIN TERM: 0.4883', 'SENTENCE WRT MAX TERM: 0.4624', 'GENDER: 0.0193']
  Testing Set Results:
    Average Years Off: 1.16710875331565
    Average Years Off with naive model using mean: 1.10216774901674
    Average Years Off with naive model using median: 1.037135278514589
    Average Years Off with naive model using mode: 1.0371352785145889
    Top features: ['SENTENCE WRT MIN TERM: 0.4904', 'SENTENCE WRT MAX TERM: 0.4619', 'GENDER: 0.0186']

Theft By Receiving Stolen Property:
  Training Set Results:
    Average Years Off: 0.9864864864864865
    Average Years Off with naive model using mean: 1.1954026098170283
    Average Years Off with naive model using median: 0.987987987987988
    Average Years Off with naive model using mode: 0.987987987987988
    Top features: ['SENTENCE WRT MIN TERM: 0.481', 'SENTENCE WRT MAX TERM: 0.4547', 'GENDER: 0.0201']
  Testing Set Results:
    Average Years Off: 1.0436507936507937
    Average Years Off with naive model using mean: 1.1678004535147377
    Average Years Off with naive model using median: 0.873015873015873
    Average Years Off with naive model using mode: 0.873015873015873
    Top features: ['SENTENCE WRT MIN TERM: 0.4867', 'SENTENCE WRT MAX TERM: 0.4495', 'GENDER: 0.0199']

Conclusions

On the data as a whole, our model did quite a bit better than any of the naive models we tried, with its average distance from the actual sentence being ~0.5 years smaller.

For some crimes, our model was much better, while for others, one or two of the naive models were better (although which ones changed between crimes).

Overall, our model was more accurate than any one of the naive models.

None of the racial categories (Black, White, or Other) make the top three features for any category except Sexual Assault, which suggests that race is not an important factor in determining sentence length.

However, Sexual Assault has the best accuracy of our model compared to the naive models (~2 years closer to the actual sentence on average), which suggests race may actually play some role.

Additionally, the racial categories may fall out of the top three because they are three separate categories, which lowers the feature importance of any one of them.
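One way to check the dilution effect mentioned here is to add up the importances of all the one-hot columns that came from the same original feature. The sketch below illustrates that heuristic. It is not something the article reports doing, the function name is hypothetical, and the importance values are invented for the example rather than taken from the results above.

```python
def grouped_importance(importances, groups):
    """Sum feature importances by group and sort from largest to smallest.

    `groups` maps a column name to the name of its original feature;
    columns that are not listed keep their own name.
    """
    totals = {}
    for feature, value in importances.items():
        group = groups.get(feature, feature)
        totals[group] = totals.get(group, 0.0) + value
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Invented importances, for illustration only.
toy_importances = {
    "SENTENCE WRT MAX TERM": 0.60,
    "SENTENCE WRT MIN TERM": 0.30,
    "GENDER": 0.04,
    "WHITE": 0.02,
    "BLACK": 0.02,
    "OTHER": 0.02,
}
race_columns = {"WHITE": "RACE", "BLACK": "RACE", "OTHER": "RACE"}
ranked = grouped_importance(toy_importances, race_columns)
```

In this invented example, no single race column outranks GENDER, but the combined RACE group does, which is the kind of effect the paragraph above describes.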

Conclusions

We were expecting to find stronger evidence of racial bias immediately in the data, as we had found many examples of it in the media.

Some biases are present, but it took more analysis to find them.

For example, it was not possible to detect a difference in sentence lengths across the crime data as a whole, but racial differences were evident for drug possession crimes and in racial representation in certain crimes with longer sentence lengths, like robbery.

The machine learning portion of our project produced some very interesting results regarding the correlation of race and sentence length.

Based on our data and visualizations, it was unsurprising that our model’s accuracy was not much better overall than the naive models, but we did find that our model was more consistently accurate for every category, while the naive model was more prone to errors based on differences in the different crime categories.

If we were to continue this project, we’d hope to look at a more diverse dataset than that of Nebraska, in hopes of drawing more concrete conclusions about any possible connections between race and sentence length.

Also, to draw definitive conclusions, we would need additional information that could account for longer sentence lengths, which the Nebraska data set is missing.
