Exploratory Data Analysis of Road Accidents in Saudi Arabia

Click here to read the Arabic version

Road accidents are a major cause of fatalities and injuries in Saudi Arabia, and reducing road fatalities is one of the main goals of Vision 2030. Recently, the Ministry of Transportation published a dataset of car accidents. A dataset like this is very hard to find because it contains the raw records of car accidents, not just aggregate counts. In this blog post, I will use this dataset to conduct an exploratory data analysis of road accidents in Saudi Arabia and answer some basic questions that I have always wanted to ask:

  • How are car accidents related to the quality of the roads?
  • What are the most frequent accident types? Which region has a higher proportion of car accidents?
  • Are some areas (e.g., travel roads) different from urban areas in the patterns of car accidents?

Among many others.

Although not all the questions can be answered solely through data exploration, exploratory data analysis gives us the right tools to know how to approach a problem.

About the Data

The data come from a challenge organized by Thakaa Center called the Road Safety Challenge. The dataset has 36K raw records of road accidents across the kingdom. It contains many attributes for every accident, such as date and time, region, road number, road type, number of deaths, number of injuries, the geometric road type, latitude/longitude, weather status, and road status, among many other variables (that you can explore here and here).

The only concern I had about the data itself was its accuracy. Although the Ministry of Transportation sponsored the Road Safety Challenge, the numbers seem to diverge from those of the Ministry of Interior and the General Authority for Statistics (source 1, source 2), which report a total of 352,464 accidents in 2019 alone (compare that with 14,842 accidents in the dataset).

I do not have any explanation for those diverging numbers, but I would consider this dataset a sample of all car accidents that took place, and I hope that someday we can have access to the full dataset of accidents.

Deaths and Injuries

At first look, about 68% of accidents report the number of deaths, while 81% report the number of injuries (a reported value of 0 counts as valid). The remaining accidents are missing this key information. Of the accidents with valid values, about 8.7% resulted in at least one fatality, while 48.8% resulted in at least one injury.
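The two kinds of percentages above (share of records that report a value, and share of reported records with at least one death or injury) can be computed along these lines. This is only a minimal sketch on made-up rows: the column names `deaths` and `injuries`, and the use of NaN for a missing value, are assumptions rather than the dataset's actual schema.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; the column names are assumptions, and NaN marks
# an accident that did not report the value at all.
accidents = pd.DataFrame({
    "deaths":   [0, 1, np.nan, 0, np.nan, 0, 2, 0],
    "injuries": [2, 0, 1, np.nan, 0, 3, 1, 0],
})

def summarize(column: pd.Series) -> dict:
    """Share of records reporting a value, and share of those that are >= 1."""
    reported = column.dropna()  # an explicit 0 still counts as reported
    return {
        "reported_share": len(reported) / len(column),
        "at_least_one_share": (reported >= 1).mean(),
    }

death_stats = summarize(accidents["deaths"])
injury_stats = summarize(accidents["injuries"])
print(death_stats)
print(injury_stats)
```

Note that the second share uses only the reported records as its denominator, which matches the "of those valid accidents" wording above.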

Accident Types and Regions

“Region” is one of the important independent variables in this dataset because Saudi Arabia is a big country with an underappreciated diversity between regions in both geography and population. The only problem we need to fix is that central regions have more accidents due to their larger populations, so we scale the numbers within each region. After scaling (using a z-score method), we can see the overall trends between regions and make a few observations.
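As a rough illustration of the z-score scaling, the sketch below counts accidents per region and accident type, then standardizes each region's row to zero mean and unit standard deviation. The toy rows, the column names `region` and `accident_type`, and the choice of the population standard deviation (`ddof=0`) are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy records; "region" and "accident_type" are assumed column names.
records = pd.DataFrame({
    "region": ["Riyadh"] * 6 + ["Baha"] * 3,
    "accident_type": ["crash", "crash", "crash", "coup", "coup", "overdrive",
                      "overdrive", "overdrive", "crash"],
})

# Rows are regions, columns are accident types, cells are raw counts.
counts = pd.crosstab(records["region"], records["accident_type"])

# Z-score each row: subtract the region's mean count and divide by its
# standard deviation, so a busy region and a quiet one become comparable.
# ddof=0 (population standard deviation) is an assumption.
row_mean = counts.mean(axis=1)
row_std = counts.std(axis=1, ddof=0)
z_scores = counts.sub(row_mean, axis=0).div(row_std, axis=0)
print(z_scores.round(2))
```

After this step, a positive value means the accident type is over-represented within that region relative to the region's own average, regardless of how many accidents the region has in total. A region whose counts are identical across all types would have a zero standard deviation and produce NaN, so it would need separate handling.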

Looking at accident types, we see that ‘crash’ is the most common accident type (44%), followed by ‘coup’ (24%), after which the percentage falls dramatically to 8% for ‘deflection’.

While looking at different accident types is informative, it is more informative to look at how they relate to different regions. After scaling the number of accidents in each region across types (to remove the effect of the baseline number of accidents), we can notice the following:

Left: relative frequency of accident types.
Right: the normalized counts of each accident type within each region with a zero mean and a unit standard deviation.

Crash and coup still account for most accidents in all regions. However, we see that some regions have more distinctive profiles. For example, ‘overdrive’ is most reported in the Southern regions (Assir, Baha and Najran) and Mecca. ‘Deflection’ is also common in Qassim, Mecca, Jawf, Hail, Eastern Province, and Baha. Similarly, ‘crush from behind’ is more common in Eastern Province, Hail, Madina, and Qassim.

Overall, it looks like Baha, Assir, and Mecca (all of which have mountains) share more of the ‘overdrive’ and ‘deflection’ accidents, while Madina, Eastern Province, Qassim, and Hail share more of the ‘crush from behind’ and ‘deflection’ accidents. Aside from that, the profiles of Riyadh and Jazan are very similar: both have ‘crash’ accidents more than any other type.

To me, the story here is about geography: cities that have mountain roads tend to have a distinctive profile that is the opposite of that of cities on flat terrain with major highways.

Temporal Patterns

Temporal trends: monthly counts of car accidents (left), number of deaths (center) and number of injuries (right).

The first idea would be to look at the overall temporal distribution of those numbers: how do they change with time? When we look at the monthly trends (over two years: 2017 and 2018), we see two peaks in both the number of accidents and the total number of deaths: one around the summer (May-June-July) and the other around the end of the year (December-January).

Temporal trends: hourly counts of the number of car accidents (left), the number of deaths (center) and the number of injuries (right).

If we look at the hourly trends, we also notice a rise during the common hours, starting from 7 AM. The number of accidents peaks once at 8 AM, while the number of deaths peaks twice: once at 8 AM and again at 8 PM. The number of injuries also shows similar peaks.
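The monthly and hourly counts behind these plots amount to grouping records by a time key and aggregating. Here is a minimal sketch on made-up rows, assuming a parsed `timestamp` column alongside `deaths` and `injuries` (all hypothetical names, as is the helper `temporal_profile`).

```python
import pandas as pd

# Hypothetical toy records; the column names are assumptions.
accidents = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-06-03 08:15", "2017-06-21 20:40", "2017-12-30 08:05",
        "2018-01-02 14:30", "2018-06-11 08:50",
    ]),
    "deaths":   [0, 1, 0, 0, 2],
    "injuries": [2, 0, 1, 3, 1],
})

def temporal_profile(df: pd.DataFrame, key: pd.Series) -> pd.DataFrame:
    """Number of accidents, total deaths and total injuries per value of key."""
    return df.groupby(key).agg(
        accidents=("deaths", "size"),   # "size" counts rows, including missing values
        deaths=("deaths", "sum"),
        injuries=("injuries", "sum"),
    )

# Group by calendar month (year and month) and by hour of day.
monthly = temporal_profile(accidents, accidents["timestamp"].dt.to_period("M"))
hourly = temporal_profile(accidents, accidents["timestamp"].dt.hour)
print(monthly)
print(hourly)
```

Each resulting table has one row per month or hour and one column per panel of the figure, so it can be plotted directly as three line charts.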

We already know that accidents, on average, show a temporal pattern. This time, we’ll look at the temporal patterns of each accident type and also of each region. The hope is that we can connect the different pieces into a more coherent understanding of the big picture. Here, we’ll use the same type of scaling but we’ll scale either within-accident-type or within-region.

Left: normalized counts of accident types across hours.
Right: normalized counts of accident numbers across hours.

The graph on the left tells us that while most accidents peak during the morning-afternoon period (from 8 AM to 5 PM), some peak at different times. For example, ‘coup and crash’, ‘crush from behind’ and ‘crush with a stationary body in the road’ also peak after midnight. Some accidents, such as ‘tread animal’ and ‘tread man’, only occur at night between 8 PM and 1 AM.

The graph on the right shows that when we look at regions instead, most accidents occur during the common hours from 7 AM up to 3 PM, although in Eastern Province and Riyadh they start from 6 AM. We also notice that some regions have more night accidents than others. Tabuk, Madina, Northern Borders and Najran all have a higher share of night accidents, as opposed to Riyadh, where most accidents occur during the day.

Those “two peaks” patterns in both analyses make me wonder if the accidents at night in those cities are similar to the types of accidents that occur at night (like ‘crush from behind’ and ‘tread man’ or ‘tread animal’). Those types of accidents make it clear that we should look further into how road types now interact with those observations.

What changed from 2017 to 2018?

Raw differences in the number of accidents between 2017 and 2018. Red indicates higher number in 2018.

Although the overall number of accidents decreased from 2017 to 2018, some accident types actually increased in some cities. In the heatmap, red indicates an increase between 2017 and 2018, while blue indicates a decrease. The intensity of the color indicates the magnitude of the difference. We generally see that the overall decrease is due to the drop in accidents of type ‘crash’, which represents 44% of all observations. However, we also see increases: ‘crush from behind’ and ‘deflection’ increased in many cities, as did ‘overdrive’ in Najran.
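The raw differences shown in the heatmap can be sketched as two count tables (accident type by region, one per year) subtracted from each other. The rows and column names below (`year`, `region`, `accident_type`) and the helper `counts_for` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy records; the column names are assumptions.
records = pd.DataFrame({
    "year": [2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018, 2018],
    "region": ["Riyadh", "Riyadh", "Riyadh", "Najran",
               "Riyadh", "Najran", "Najran", "Najran", "Riyadh"],
    "accident_type": ["crash", "crash", "crash", "overdrive",
                      "crash", "overdrive", "overdrive", "crash", "deflection"],
})

def counts_for(year: int) -> pd.DataFrame:
    """Accident counts for one year: rows are accident types, columns are regions."""
    subset = records[records["year"] == year]
    return pd.crosstab(subset["accident_type"], subset["region"])

# Positive cells mean more accidents in 2018 than in 2017. fill_value=0 treats
# a type that appears in only one of the two years as zero in the other year.
difference = counts_for(2018).sub(counts_for(2017), fill_value=0)
print(difference)
```

When plotting such a table, a diverging colormap centered at zero keeps red for increases and blue for decreases, which is the reading described above.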

Mapping Accidents

A density map of the number of injuries by geometric types. The number of accidents is shown next to the title.

We are all here for the density maps. Those maps show the density of records at a given coordinate. I have 3 maps to show: the first map shows the density of accidents by geometric road type. The red color indicates higher counts (and potentially more deadly points). The second map does the same thing but by accident type, and the final map shows the weather status.
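One simple way to approximate such a density is to bin the coordinates into a regular latitude/longitude grid and sum the injuries in each cell. The sketch below does this with NumPy on made-up points; the bounding box, the 1-degree cell size, and the variable names are assumptions, and the maps in this post may well have been produced with a different method.

```python
import numpy as np

# Hypothetical coordinates (in degrees) and injury counts for five accidents.
lat = np.array([24.7, 24.8, 24.7, 18.2, 21.4])
lon = np.array([46.7, 46.6, 46.8, 42.5, 39.8])
injuries = np.array([2, 1, 0, 3, 1])

# Bin edges for a box roughly covering Saudi Arabia; the limits and the
# 1-degree cell size are assumptions chosen for illustration.
lat_edges = np.arange(16, 34)  # edges at 16, 17, ..., 33
lon_edges = np.arange(34, 57)  # edges at 34, 35, ..., 56

# Number of accidents per cell, and number of injuries per cell
# (the same binning, weighted by the injury count of each accident).
accident_density, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges])
injury_density, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges],
                                      weights=injuries)
print(accident_density.sum(), injury_density.sum())
```

To get one map per geometric type, accident type, or weather status, the same binning would be repeated on each subset of the records.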

Looking at the first map, we can notice a few things: injuries in U-Turns are scattered throughout the kingdom but we see dense clusters in Jazan and Tabuk. Also, accidents in Straight Link roads and 3-Leg intersections are scattered throughout the kingdom. The accidents in 4-Leg intersection and interchange are also clustered in both Qassim and Riyadh. Accidents in horizontal curve roads are also clustered in the southern regions.

A density map of the number of injuries by accident types. The number of accidents is shown next to the title.

When we look at the density map of accident types, we can see a few more observations. Coup, the explosion of a car tire and deflection accidents are common across the main highways of the kingdom. Those are the “highway accidents”. Compare that with overdrive which is most common in the Southern regions, Mecca (and the highway linking both), and Riyadh. Also, collateral crush is most common within cities. Coup and crush look more common in Madina and Madina-Mecca highway. Finally, tread animal is very scattered throughout the kingdom (and non-highway roads).

A density map of the number of injuries by the weather status. The number of accidents is shown next to the title.

Finally, when we look at the density map of weather status, we can see that we are missing this variable from about half of the accidents. With the valid half, the majority of accidents occurred in “Good” weather status. Besides, Riyadh-Dammam and Riyadh-Qassim highways report more injuries in dusty weather.

Conclusion

There were lots of ways to look at this dataset. The exploratory data analysis process is, by nature, a bit opinionated as different analysts will look from different angles.

I hope that my analysis will inspire many more explorations to pinpoint the major causes of car accidents and hopefully make them go away.

Note on Data Availability

The data were obtained as a part of the Road Safety Challenge organized by Thakaa Center. While downloading the data, I passively agreed not to share the dataset with anyone else (despite its availability at a public URL). I choose to err on the side of caution and refrain from distributing the link. If you are really interested in obtaining the dataset, reach out to Thakaa Center on Twitter.

Jupyter Notebook

A Jupyter notebook that was used to analyze those trends (and contains much more analyses and interactive maps) will be available here soon.

Update 1:

I was informed that this dataset comes from the Ministry of Transportation and it only reports the inter-city accidents, not the intra-city accidents. This explains the discrepancy between the number of accidents here and the other numbers.

1 thought on “Exploratory Data Analysis of Road Accidents in Saudi Arabia”

  1. Regarding the accuracy issue you mentioned, I think the data you received covers intercity highways only and does not include intra-city accidents,
    because the source of the data is the MoT.
