Key findings
  • Generalised linear models can be used to describe spatial differences in crash frequency of vulnerable road users on the district level.

  • Studies that only use the differences in population size often overestimate the safety of vulnerable road users in large cities.

  • Crash frequencies on district level are significantly influenced by precipitation, modal share, age distributions, land use, and tourism.

Introduction

Approximately 110,000 people are killed or severely injured in urban areas in the European Union (EU) each year, and the majority (70%) are vulnerable road users (VRU). However, the progress of reducing crashes involving VRU has slowed over the past decade and continues to stagnate (ETSC, 2019). Directive 2008/96/EC of the European Parliament called for action to enhance road safety in Europe via safety management systems for road infrastructure. Among others, suitable methods were to be developed for the safety classification and the safety management of the road network in operation (European Parliament, 2008) with the purpose of identifying crash-conspicuous sections of road networks as a basis for determining and prioritising necessary new construction and reconstruction needs. Current methods indirectly focus more on predicting motorised traffic crashes on the classified road network (European Commission, 2023a) than crashes involving non-motorised road users. In the example case of Germany, half of the crashes with personal injuries (53%) and most of the crashes with fatalities (77%) occurred on the classified road network. However, this focus on the classified road network indirectly ignores two thirds of VRU fatality or injury crashes as these occur in urban areas and on the unclassified road network (Statistisches Bundesamt, 2015, 2024).

In urban areas, road safety management is mostly based on blackspot management. Conspicuous areas are identified by analysing the number of crashes or number of crashes weighted by their crash severity. If a threshold value of (weighted) crashes is exceeded within a specified radius or section length, this location is identified as a crash blackspot that requires further investigation (FGSV, 2012; Forschungsgesellschaft Straße, 2004). However, again with respect to VRU, due to the high number of cars involved in crashes, locations of hotspots are strongly influenced by crashes involving motorised vehicles. In Germany, crashes in hotspots include 26-50 percent of crashes of motorised vehicles with personal injuries, but only 12-26 percent of VRU crashes.

Although these numbers are based on German crash statistics, they represent general European trends. The number of VRU fatality crashes has been increasing since the COVID dip of 2020 (+9% cyclists; +4% pedestrians) and, especially for cyclists, over the last decade (+5%) (CARE Database, 2023). Additionally, cycling and walking are increasing in popularity, as shown by higher traffic volumes of cyclists (Eco counter, 2022) and more kilometres travelled by pedestrians (European Commission, 2023b) compared to a decade ago. This underlines the timely need for a greater focus on VRU to achieve road safety targets in the EU.

However, attention to and awareness of VRU crashes vary between cities and districts throughout Europe (ETSC, 2019). As more crashes are to be expected in larger cities with more inhabitants, the crash load, which relates the number of crashes to the number of inhabitants in a city, has become a common comparison measure (Smeed, 1949). For instance, Figure 1 shows the regional distribution of crash loads for crashes involving pedestrians and cyclists in German administrative districts. Districts with higher deviations from the mean crash load are highlighted in darker shades, while regions for which this information is not available are displayed with striped patterns. It is noticeable that districts with higher-than-average crash loads are distributed differently for bicycle and pedestrian crashes. Higher pedestrian crash loads can be found in larger cities, while higher bicycle crash loads are found in larger cities as well as in remote areas.

Figure 1. Distribution of crash loads for cyclist (left) and pedestrian (right) crashes for each of the administrative districts of Germany (by mean numbers of crashes and inhabitants between 2022-2023)
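
The crash load measure described above can be sketched in a few lines of Python. This is only an illustration: the district names and numbers are hypothetical, and the scaling to crashes per 100,000 inhabitants is an assumption made for readability, not a convention taken from the study.

```python
from statistics import mean

def crash_load(crashes, inhabitants, per=100_000):
    """Crashes per `per` inhabitants (the scaling factor is an assumption)."""
    return crashes / inhabitants * per

# Hypothetical districts: (annual VRU crashes, inhabitants)
districts = {
    "A": (120, 150_000),
    "B": (900, 1_200_000),
    "C": (40, 80_000),
}

loads = {name: crash_load(c, n) for name, (c, n) in districts.items()}
mean_load = mean(loads.values())

# Relative deviation of each district from the mean crash load,
# the quantity that a map of crash loads would shade
deviation = {name: load / mean_load - 1 for name, load in loads.items()}
```

Districts with a positive deviation would appear as higher-than-average in such a comparison, regardless of why their crash numbers are higher.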

Various influences on the occurrence of VRU crashes have been identified in literature. Inhabitant-related influences are the most frequent factors in VRU crashes. In models, inhabitant-related influences are mostly represented by variables such as their number, density or age structure. By default, many approaches use a linear relationship between crash frequency and number of inhabitants (Eksler et al., 2008). However, studies not using a preliminary definition of the relationship have reported a degressive relationship between both (Noland & Oh, 2004). A higher density of population further increases crash numbers (Berger et al., 2012). Even though age structure is frequently identified as an influencing factor, different effects on crash frequencies are reported with elderly or younger demographics (compare Aguero-Valverde & Jovanis, 2006; Jamali-Dolatabad et al., 2019 to Law et al., 2011; Quddus, 2008).

Among traffic-related influences, studies mostly report motorisation as having a degressively decreasing correlation with VRU crashes (Grimm & Treibich, 2013). Further influences include network length (Lee et al., 2015) and traffic volume, which both increase the number of crashes. However, for traffic volume, the identified links to crash frequency were progressive (Quddus, 2008), linear (Eksler et al., 2008), or degressive (Lee et al., 2015).

Economic factors further influence VRU crashes on a macro-scale. Studies have shown different cause-effect relationships between income and crash frequencies which may be due to differences in research focus. A worldwide study identified crash frequency to initially rise to a maximum with increasing income and then decrease as income continues to rise (Law et al., 2011). Higher crash frequencies were also reported in districts with high employment rates (Berger et al., 2012; Lee et al., 2015).

Finally, environmental factors also have been shown to influence VRU crash frequencies. Urbanisation is a significant influence contributing to more crashes, especially in industrialised countries (Law et al., 2011). Areas with a high industrial, commercial, or mixed land use show higher crash frequencies (Song et al., 2021), as well as areas with a high number of hotels, motels, or guesthouses (Siddiqui et al., 2012). Furthermore, precipitation is identified as a climatic influence leading to more VRU crashes (Berger et al., 2012).

The aim of this study was to develop one model and determine and quantify the systematic effects of known factors in relation to the number of VRU crashes using the example of administrative districts in Germany. The model will create a baseline that identifies the cities and districts that need closer assessment of VRU crash numbers for targeted approaches to traffic safety initiatives and activities.

Method

To quantify the effect of spatial differences on VRU crashes, appropriate models that allowed the estimation of multivariate effects on crash frequency were required. The model choice and its characteristics as well as the data used during model creation are presented below.

Model choice

The statistical models needed to estimate the frequency of VRU crashes in German districts on the basis of traffic, topographical, economic, social and demographic factors at district level.

Generalised linear models (GLM) are frequently used to describe multivariate effects on crash frequencies (Bauer & Harwood, 1998; Chayanan et al., 2003; El-Basyouny & Sayed, 2010; Maher & Summersgill, 1996), including at intersections (Jonsson et al., 2009; Stijn et al., 2009), and crash frequencies are often assumed to follow a Poisson or negative-binomial distribution. Cullen-Frey graphs (1999) and histograms of the distribution of crash frequencies in German districts show that a negative-binomial distribution is also appropriate on the macroscopic level for both pedestrian and cyclist crashes (Figure 2, Figure 3).

Figure 2. Distribution of pedestrian (left) and cycling crash frequencies (right) in German districts

Source: Statistisches Bundesamt (2024)

Figure 3. Cullen-Frey graph of kurtosis and skewness of pedestrian (left) and cyclist crash frequencies (right) in German districts
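
A simple complementary check for choosing between a Poisson and a negative-binomial distribution is the variance-to-mean ratio, which is about 1 for Poisson-distributed counts and clearly above 1 for overdispersed counts. The sketch below applies it to the summary statistics of C_CYCLE and C_PED from Table 1, assuming that the column labelled SE can be read as a standard deviation. It looks only at the raw spread across districts, so it says nothing about the dispersion that remains once population and other factors are in the model, and it is not the procedure used in the study.

```python
def dispersion_ratio(mean, sd):
    """Variance-to-mean ratio: about 1 for Poisson counts, above 1 if overdispersed."""
    return sd ** 2 / mean

def nb2_alpha(mean, sd):
    """Method-of-moments dispersion for Var = mean + alpha * mean**2 (NB2 form)."""
    return (sd ** 2 - mean) / mean ** 2

# Mean and spread taken from Table 1; reading "SE" as a standard deviation is an assumption
cyclist_ratio = dispersion_ratio(194.14, 375.52)
pedestrian_ratio = dispersion_ratio(79.21, 165.92)

cyclist_alpha = nb2_alpha(194.14, 375.52)
pedestrian_alpha = nb2_alpha(79.21, 165.92)
```

Both ratios are far above 1 under this reading, which is in line with preferring a negative-binomial over a Poisson distribution for the raw district counts.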

Within GLMs, two types of influencing factors can be distinguished: exposure factors and risk factors. Exposure factors enter the model as their logarithm and, due to the log link function, appear as power functions in the model. Risk factors, in comparison, appear as exponential functions, so they only influence crash frequency if their value deviates from zero (Maher & Summersgill, 1996). Equation 1 shows the standard structure of GLMs with a natural logarithm as link function: crash frequencies (C) result from the multiplicative combination of exposure factors (EF, labelled exposition factors in the tables), which enter as power functions, and risk factors (RF), which enter as exponential functions (Goldburd et al., 2025).

\[C = b_{0} \cdot \prod_{i} {EF}_{i}^{b_{i}} \cdot \prod_{j} e^{b_{j} \cdot {RF}_{j}} \tag{1}\]
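
To make the structure of Equation 1 concrete, the following Python sketch evaluates it for given coefficients. The function name and argument layout are hypothetical. It assumes that a tabulated intercept is reported on the log scale, so that b0 = exp(intercept), and the usage example assumes that population enters as a number of persons; neither assumption is stated explicitly in the text.

```python
import math

def expected_crashes(intercept, exposure, risk):
    """Evaluate C = b0 * prod(EF_i ** b_i) * prod(exp(b_j * RF_j)).

    intercept: coefficient on the log scale, so b0 = exp(intercept) (assumption)
    exposure:  list of (EF value, coefficient b_i) pairs, each EF value > 0
    risk:      list of (RF value, coefficient b_j) pairs
    """
    log_c = intercept
    log_c += sum(b * math.log(ef) for ef, b in exposure)  # power-function terms
    log_c += sum(b * rf for rf, b in risk)                # exponential terms
    return math.exp(log_c)

# Illustration with the population-only cyclist coefficients (-7.308 and 1.022),
# assuming POP_TOT is entered as a number of persons
example = expected_crashes(-7.308, [(150_000, 1.022)], [])
```

Working on the log scale mirrors the log link: a risk factor with value zero contributes a factor of 1, and an exposure factor with coefficient 1 scales crashes proportionally.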

Data

Based on the influences identified in the literature, the authors searched for corresponding data on the macro-scale in Germany. German administrative districts provide a common basis for aggregating different data sources and data types. Regions are divided according to the nomenclature of territorial units for statistics (NUTS), a geographical system dividing the territory of the European Union into hierarchical levels. The data research showed that the best balance between regional detail and data availability is achieved for NUTS-3 regions. NUTS-3 regions generally have a population of 150,000 to 800,000 inhabitants. On this basis, the German territory is divided into 401 statistical districts, which were used as analysis units in the models (Figure 1 above).

Inhabitant-related data, including the total number of inhabitants and data on gender or age, were obtained from the regional database of the German Federal Statistical Office (Statistische Ämter des Bundes und der Länder, 2024), the Federal Institute for Building, Urban Affairs and Spatial Research (Bundesinstitut für Bau-, Stadt und Raumforschung, 2025), and the German Federal Criminal Office (Bundeskriminalamt, 2024). Climate data from the German Meteorological Service are available in the form of gridded data or point data from weather stations (Deutscher Wetterdienst, 2025) and needed to be spatially aggregated to describe sunshine duration, precipitation or temperature on the district level. Topographical as well as traffic-related data were derived from the gridded data of the Monitor of Settlement and Open Space Development (IOER Monitor) from the Leibniz Institute of Ecological Urban and Regional Development (Leibniz-Institut für ökologische Raumentwicklung, 2025). To describe differences in traffic demand, origin-destination trip matrices by road user types from the Federal Ministry for Digital and Transport were used (Schubert et al., 2015). Finally, crash data were obtained from the annual periodical reports on traffic crashes of each Federal State of Germany (Statistische Ämter des Bundes und der Länder, 2024). The full set of potential influencing factors is included in Table 1. These factors were then used to create the models, and the resulting pedestrian and cyclist models are presented below.

Table 1. Descriptive statistics of variables on the basis of statistical districts of Germany

| Variable | Description | Source* | Time | Unit | Median | Mean | SE | Min | Max |
|---|---|---|---|---|---|---|---|---|---|
| AREA | Area size | 1 | 1 | km² | 797.42 | 891.21 | 723.51 | 35.70 | 5,480.40 |
| AREA_COMM_IND | Industrial or commercial area | 1 | 1 | ha | 1,252.00 | 1,511.48 | 1,010.29 | 210.00 | 6,823.00 |
| AREA_MIXED_USE | Mixed-use area | 1 | 1 | km² | 8.35 | 10.76 | 9.50 | 0.25 | 56.72 |
| AREA_SQUARES | Area of squares | 1 | 1 | km² | 0.72 | 0.86 | 0.58 | 0.04 | 5.08 |
| AREA_SETTLE | Settlement area | 1 | 1 | ha | 6,728.00 | 8,172.09 | 5,503.28 | 1,050.00 | 49,116.00 |
| AREA_RESIDENT | Residential building area | 1 | 1 | ha | 2,786.00 | 3,408.63 | 2,342.93 | 419.00 | 21,722.00 |
| AREA_TRAFFIC | Area with road traffic use | 1 | 1 | ha | 2,123.00 | 2,363.91 | 1,533.78 | 235.00 | 10,262.00 |
| AREA_TRAFFIC_SHARE | Share of area with road traffic use | 1 | 1 | % | 2.90 | 3.60 | 2.09 | 0.70 | 11.00 |
| AREA_WALKING | Walkable area | Derived from 1 | 1 | km² | 15.03 | 16.49 | 13.44 | 0.45 | 69.11 |
| AREA_UNSETTLED_SHARE | Share of unsettled area on area size | Derived from 1 | 1 | % | 87.40 | 80.28 | 15.50 | 24.60 | 95.20 |
| AREA_COMM_IND_SHARE | Share of industrial or commercial area on area size | Derived from 1 | 1 | ha/km² | 1.68 | 3.14 | 3.09 | 0.38 | 16.31 |
| AREA_SETTLE_SHARE | Share of settlement area on area size | Derived from 1 | 1 | ha/km² | 9.49 | 15.08 | 12.41 | 3.36 | 57.76 |
| TEMP_AV_ANN | Ann. av. temperature | Derived from 2 | 1 | °C | 9.77 | 9.76 | 0.88 | 6.29 | 11.66 |
| PREC_TOT | Ann. av. sum precipitation | Derived from 2 | 1 | mm | 817.86 | 854.89 | 206.27 | 508.46 | 2,093.44 |
| PREC_D_>10 | Ann. av. days with prec. >=10 mm | Derived from 2 | 1 | [-] | 22.85 | 24.49 | 8.25 | 9.61 | 67.06 |
| PREC_D_>20 | Ann. av. days with prec. >=20 mm | Derived from 2 | 1 | [-] | 5.26 | 5.76 | 2.93 | 1.33 | 23.52 |
| PREC_D_>30 | Ann. av. days with prec. >=30 mm | Derived from 2 | 1 | [-] | 1.03 | 1.51 | 1.48 | 0.00 | 11.26 |
| SNOW_D | Ann. av. days with snow cover | Derived from 2 | 1 | [-] | 28.24 | 31.30 | 20.94 | 2.24 | 127.09 |
| SUMMER_D | Ann. av. days with air temperature >= 25°C | Derived from 2 | 1 | [-] | 44.47 | 42.61 | 14.51 | 4.54 | 72.44 |
| SUMMER_D_SHARE | Ann. av. share of summer days | Derived from 2 | 1 | [-] | 0.12 | 0.12 | 0.04 | 0.01 | 0.20 |
| SUN_H_MEAN_ANN | Annual av. sum of sunshine hours | Derived from 2 | 1 | h | 1,596.60 | 1,615.22 | 151.07 | 1,335.04 | 1,939.81 |
| POP_TOT | Total population | 3 | 2 | [1,000] | 150.36 | 203.09 | 234.25 | 34.17 | 3,472.35 |
| RATIO_YOUTH | Youth ratio | 3 | 2 | [-] | 30.86 | 30.20 | 3.17 | 21.88 | 39.50 |
| RATIO_ELDERLY | Elderly ratio | 3 | 2 | [-] | 35.50 | 35.84 | 4.90 | 24.00 | 51.88 |
| POP_M | Total male population | 3 | 2 | [1,000] | 73.79 | 99.75 | 114.41 | 16.79 | 1,698.54 |
| POP_F | Total female population | 3 | 2 | [1,000] | 75.39 | 103.34 | 119.85 | 17.38 | 1,773.81 |
| POP_M_SHARE | Share male population | 3 | 2 | [-] | 0.49 | 0.49 | 0.01 | 0.47 | 0.51 |
| POP_FOREIGN | Total foreign population | 1 | 1 | [1,000] | 12.61 | 24.07 | 45.53 | 0.00 | 637.75 |
| POP_FOREIGN_SHARE | Share foreign population | Derived from 1 | 1 | % | 9.17 | 10.01 | 5.17 | 0.00 | 35.00 |
| POP_DENSITY | Population density | 3 | 2 | Inh./km² | 196.90 | 522.69 | 684.54 | 36.45 | 4,596.28 |
| HOMES | Number of homes, flats | 1 | 3 | [1,000] | 75.17 | 103.41 | 125.99 | 18.33 | 1,905.29 |
| EMPLOYED_TOT | Number of employees | 1 | 1 | 1,000,000 | 0.05 | 0.08 | 0.10 | 0.01 | 1.25 |
| EMPLOYED_FOREIGN | Number of foreign employees | 1 | 1 | [1,000] | 3.02 | 6.19 | 12.34 | 0.24 | 138.22 |
| UNEMPLOYED | Number of unemployed | 1 | 1 | [1,000] | 4.18 | 6.61 | 11.18 | 0.82 | 180.80 |
| UNEMPLOYED_RATE | Unemployment rate | 1 | 1 | % | 5.34 | 5.66 | 2.57 | 1.38 | 14.26 |
| MOTORISATION | Motorisation | 5 | 1 | cars/1,000 inh. | 584.14 | 570.27 | 67.54 | 336.56 | 1,125.02 |
| TOT_COMM_IN | Commuters into area | 6 | 4 | [1,000] | 15.92 | 26.07 | 34.76 | 2.76 | 338.71 |
| TOT_COMM_OUT | Commuters out of area | 6 | 4 | [1,000] | 18.91 | 23.30 | 17.84 | 2.14 | 124.61 |
| TOT_COMM_NET | Net commuters | 6 | 4 | [1,000] | -6.18 | 0.24 | 24.64 | -51.17 | 208.44 |
| MS_TRAIN | Modal split train traffic (by trips) | Derived from 6 | 4 | [-] | 0.01 | 0.01 | 0.01 | 0.00 | 0.12 |
| MS_MOTOR | Modal split mot. private traffic (by trips) | Derived from 6 | 4 | [-] | 0.62 | 0.58 | 0.10 | 0.32 | 0.74 |
| MS_AIR | Modal split air traffic (by trips) | Derived from 6 | 4 | [-] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| MS_PUBLIC | Modal split public transport (by trips) | Derived from 6 | 4 | [-] | 0.05 | 0.07 | 0.05 | 0.02 | 0.29 |
| MS_CYCLIST | Modal split bicycle traffic (by trips) | Derived from 6 | 4 | [-] | 0.08 | 0.09 | 0.05 | 0.03 | 0.35 |
| MS_PED | Modal split pedestrian traffic (by trips) | Derived from 6 | 4 | [-] | 0.24 | 0.24 | 0.04 | 0.12 | 0.37 |
| ACCESS_REGION | Av. travel time by car to regional centres | 3 | 2 | min | 9.00 | 7.93 | 6.38 | - | 45.00 |
| RELIEF_DIV | Relief diversity | 4 | 5 | [-] | 1.01 | 1.01 | 0.02 | 0.99 | 1.15 |
| RELIEF_ENERGY | Relief energy | 4 | 5 | 1,000 m | 0.28 | 0.36 | 0.32 | 0.02 | 2.36 |
| OLD_NEW | In former West Germany (1=yes, 0=no) | - | 5 | [-] | 1.00 | 0.81 | 0.39 | - | 1.00 |
| TOURIST_STAY | Av. ann. number tourist stays | 1 | 1 | [1,000] | 593.97 | 1,070.51 | 2,041.96 | 43.84 | 29,619.74 |
| TOURIST_ARR | Ann. number tourist arrivals | 1 | 1 | 1,000,000 | 0.23 | 0.41 | 0.86 | 0.03 | 12.25 |
| TOURIST_RATIO | Ratio of annual arriving tourists/inhabitant | Derived from 1 | 1 | [-] | 1.39 | 1.95 | 1.61 | 0.24 | 11.35 |
| CRIMES | Number of crimes | 7 | 1 | [1,000] | 8.27 | 14.15 | 31.29 | 1.69 | 520.44 |
| CRIMES_SHARE | Share of crimes/inhabitant | Derived from 7 | 1 | [-] | 0.05 | 0.06 | 0.03 | 0.02 | 0.15 |
| HOSPITAL_BEDS | Number of hospital beds | 1 | 1 | [1,000] | 0.15 | 0.41 | 0.63 | - | 3.59 |
| C_CYCLE | Av. ann. number of cyclists’ crashes with personal injuries or fatalities | Gathered from State offices of statistics of all German Federal States | 1 | [-] | 115.00 | 194.14 | 375.52 | 7.60 | 5,705.60 |
| C_PED | Av. ann. number of pedestrian crashes with personal injuries or fatalities | Gathered from State offices of statistics of all German Federal States | 1 | [-] | 43.80 | 79.21 | 165.92 | 12.20 | 2,663.60 |

Results

Cyclist Model

The number of inhabitants (POP_TOT) explains most of the variance in the cyclist model, and its effect is almost linear (coefficient close to 1). The modal split of bicycle traffic (MS_CYCLIST) is another exposure variable included in the model and increases cyclist crashes degressively with a coefficient of 0.887. This means that the number of cyclist crashes increases to a lesser extent than the modal split.
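
As a worked illustration of this degressive effect, assume that all other factors are held constant and take the coefficient of 0.887 reported above. Doubling the bicycle modal split then multiplies the expected number of cyclist crashes by

\[2^{0.887} \approx 1.85\]

that is, roughly 85 percent more crashes for 100 percent more modal share, so the number of crashes per unit of modal share falls by about 7.5 percent (1.85 / 2 ≈ 0.925).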

In the full model, the risk variable with the highest effect was precipitation, expressed as the annual average number of days with at least 30 mm of precipitation (PREC_D_>30), which significantly increased the number of cyclists involved in crashes. There was also a seasonal influence, as the number of crashes involving cyclists increased with the average proportion of summer days (SUMMER_D_SHARE). Presumably this was due to the temporary modal shift in favour of cycling on warm days. Two additional summer days per year above the average lead to 1.5 percent more cyclists involved in crashes (e^(2.770∙2/365) ≈ 1.015). The number of tourists per inhabitant (TOURIST_RATIO) clearly showed that considering only inhabitants in the form of a crash load measure can be misleading: if two districts have an almost identical structure, but the tourist ratio in one district is higher by 0.5 (that is, 0.5 more tourist arrivals per inhabitant and year), 2.4 percent more cyclists are involved in crashes in the district with the higher tourist ratio (e^(0.048∙0.5) ≈ 1.024).
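
The percentage effects quoted above follow directly from the exponential form of the risk factors. The short Python sketch below reproduces them from the full-model coefficients in Table 2; the helper name percent_change is hypothetical, and the last line is an additional illustration for one extra heavy-rain day that is not discussed in the text.

```python
import math

def percent_change(coefficient, delta):
    """Percent change in expected crashes when a risk factor rises by `delta`."""
    return (math.exp(coefficient * delta) - 1) * 100

summer = percent_change(2.770, 2 / 365)  # two additional summer days per year
tourism = percent_change(0.048, 0.5)     # tourist ratio higher by 0.5
heavy_rain = percent_change(0.054, 1)    # one additional day with >= 30 mm (illustration)

print(f"summer days: {summer:.1f} %, tourist ratio: {tourism:.1f} %")
```

Because the risk factors enter multiplicatively, these percentages apply on top of whatever crash frequency the exposure factors imply for a district.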

In contrast, the share of walkable area (AREA_WALKING) and the modal split of pedestrian traffic (MS_PED) had a crash-reducing effect.

The model also shows a reduced number of crashes involving cyclists in the districts located within the former West German (old) federal states (OLD_NEW). However, this factor is assumed to stand as a proxy for multiple effects influencing the exposure or risk of cyclists in the new federal states. Table 2 summarises the model creation steps, the influencing factors and their factor type (exposition or risk factor), and various model accuracy measures of the cyclists’ crash frequency model.

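Table 2. Model results for influencing factors on the frequency of cyclists being involved in crashes with personal injuries

| Model | Type | Label | Coefficient | Wald-Test | SE | CI (95%) Lower | CI (95%) Upper | P-value (LLR) | VIF |
|---|---|---|---|---|---|---|---|---|---|
| Zero-model | | Intercept | 5.269 | <0.001 | 0.048 | 5.172 | 5.365 | <0.001 | - |
| Only population model | | Intercept | -7.308 | <0.001 | 0.502 | -8.311 | -6.305 | <0.001 | - |
| Only population model | Exposition factor | Log(POP_TOT) | 1.022 | <0.001 | 0.042 | 0.939 | 1.106 | <0.001 | - |
| Exposition model | | Intercept | -5.131 | <0.001 | 0.381 | -5.893 | -4.370 | <0.001 | - |
| Exposition model | Exposition factor | Log(POP_TOT) | 1.021 | <0.001 | 0.030 | 0.961 | 1.081 | <0.001 | 1.001 |
| Exposition model | Exposition factor | Log(MS_CYCLIST) | 0.887 | <0.001 | 0.045 | 0.798 | 0.997 | <0.001 | 1.001 |
| Full model | | Intercept | -5.303 | <0.001 | 0.931 | -6.086 | -4.520 | <0.001 | - |
| Full model | Exposition factor | Log(POP_TOT) | 1.053 | <0.001 | 0.026 | 1.001 | 1.106 | <0.001 | 1.099 |
| Full model | Exposition factor | Log(MS_CYCLIST) | 0.957 | <0.001 | 0.046 | 0.865 | 1.048 | <0.001 | 1.478 |
| Full model | Risk factor | PREC_D_>30 | 0.054 | <0.001 | 0.013 | 0.027 | 0.080 | <0.001 | 1.465 |
| Full model | Risk factor | SUMMER_D_SHARE | 2.770 | <0.001 | 0.533 | 1.703 | 3.837 | <0.001 | 1.401 |
| Full model | Risk factor | TOURIST_RATIO | 0.048 | <0.001 | 0.011 | 0.026 | 0.071 | <0.001 | 1.200 |
| Full model | Risk factor | MS_PED | -1.750 | <0.001 | 0.441 | -2.632 | -0.869 | <0.01 | 1.186 |
| Full model | Risk factor | AREA_WALKING | -0.004 | <0.01 | 0.001 | -0.007 | -0.001 | <0.01 | 1.425 |
| Full model | Risk factor | OLD_NEW | -0.111 | <0.05 | 0.045 | -0.201 | -0.022 | <0.01 | 1.131 |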

Cumulated residuals were used to assess model accuracy and systematic deviance, and to show the model enhancement achieved by including further influencing factors. Since the number of inhabitants explains most of the variance in the cyclist model, the cumulated residuals over population size were compared between the Only population model and the Full model. Although both models tend to underestimate the number of cyclist crashes in districts between 50,000 and 250,000 inhabitants, Figure 4 clearly shows that the Full model has a lower degree of systematic deviance than the Only population model.

Figure 4. Comparison of cumulated residuals between Full model (solid line) and Only population model (dashed line) of cyclist crashes
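
A cumulated residual curve of the kind compared in Figure 4 can be sketched as follows. This is a minimal illustration rather than the exact procedure used in the study: districts are sorted by population, and the differences between observed and predicted crashes are summed up along that order. All numbers are hypothetical.

```python
from itertools import accumulate

def cumulated_residuals(population, observed, predicted):
    """Return (population, running sum of observed - predicted), sorted by population."""
    rows = sorted(zip(population, observed, predicted))
    residuals = [obs - pred for _, obs, pred in rows]
    return [(pop, total) for (pop, _, _), total in zip(rows, accumulate(residuals))]

# Hypothetical districts, not data from the study
population = [250_000, 60_000, 120_000, 900_000]
observed = [210, 70, 130, 800]
predicted = [200, 55, 115, 830]

curve = cumulated_residuals(population, observed, predicted)
```

A curve that keeps rising over a population range indicates that the model systematically underestimates crashes there, whereas a curve that fluctuates around zero indicates no systematic deviance.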

Pedestrian Model

The pedestrian model had many similarities with the cyclist model, with some variation. The pedestrian exposure model showed a degressive influence of the modal split of pedestrians (MS_PED) on the number of pedestrians involved in crashes. In contrast to the cyclist model, the coefficient for the population (POP_TOT) in the full model was below 1, resulting in a degressive influence of population rather than a linear increase of pedestrian crashes with increasing population in a statistical district.

The number of pedestrians involved in crashes also increases with the annual average number of days with at least 10 mm of precipitation (PREC_D_>10). Although the annual average number of days with at least 30 mm of precipitation also had a significant impact, PREC_D_>10 was included in the pedestrian model because it improved the model more.

Further, substantially higher numbers of crashes involving pedestrians were associated with an increased proportion of tourists in relation to the population (TOURIST_RATIO). Additionally, a higher ratio of elderly people (RATIO_ELDERLY) in a district increased the average number of pedestrians involved in crashes. This may be due to older pedestrians having a higher likelihood of an injury that is recorded in police or hospital statistics. Two further new risk variables for districts were identified: the share of unsettled area (AREA_UNSETTLED_SHARE) and the area of spaces, squares and courts (AREA_SQUARES). The model shows that a high proportion of unsettled areas with no residential, industrial, commercial, leisure, or recreational use in a district leads to a decrease in the number of pedestrians involved in crashes, whereas a larger area of spaces, squares and courts is associated with an increase in pedestrian crashes. Table 3 summarises the model creation steps, influencing factors and their types as well as various model accuracy measures of the pedestrians’ crash frequency model.

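Table 3. Model results for influencing factors on the frequency of pedestrians being involved in crashes with personal injuries

| Model | Type | Label | Coefficient | Wald-Test | SE | CI (95%) Lower | CI (95%) Upper | P-value (LLR) | VIF |
|---|---|---|---|---|---|---|---|---|---|
| Zero-model | | Intercept | 4.372 | <0.001 | 0.046 | 4.280 | 4.464 | <0.001 | - |
| Only population model | | Intercept | -8.773 | <0.001 | 0.338 | -9.449 | -8.097 | <0.001 | - |
| Only population model | Exposition factor | Log(POP_TOT) | 1.067 | <0.001 | 0.028 | 1.010 | 1.123 | <0.001 | - |
| Exposition model | | Intercept | -8.142 | <0.001 | 0.326 | -8.795 | -7.489 | <0.001 | - |
| Exposition model | Exposition factor | Log(POP_TOT) | 1.116 | <0.001 | 0.026 | 1.064 | 1.169 | <0.001 | 1.027 |
| Exposition model | Exposition factor | Log(MS_PED) | 0.864 | <0.001 | 0.103 | 0.658 | 1.069 | <0.001 | 1.027 |
| Full model | | Intercept | -5.398 | <0.001 | 0.343 | -6.084 | -4.712 | <0.001 | - |
| Full model | Exposition factor | Log(POP_TOT) | 0.917 | <0.001 | 0.026 | 0.865 | 0.958 | <0.001 | 2.631 |
| Full model | Exposition factor | Log(MS_PED) | 0.394 | <0.001 | 0.001 | 0.262 | 0.526 | <0.001 | 1.147 |
| Full model | Risk factor | AREA_UNSETTLED_SHARE | -0.018 | <0.001 | 0.066 | -0.020 | -0.017 | <0.001 | 1.403 |
| Full model | Risk factor | TOURIST_RATIO | 0.025 | <0.001 | 0.007 | 0.010 | 0.040 | <0.001 | 1.234 |
| Full model | Risk factor | AREA_SQUARES | 0.082 | <0.01 | 0.025 | 0.032 | 0.133 | <0.001 | 2.265 |
| Full model | Risk factor | RATIO_ELDERLY | 0.006 | <0.01 | 0.002 | 0.002 | 0.011 | <0.01 | 1.160 |
| Full model | Risk factor | PREC_D_>10 | 0.003 | <0.10 | 0.001 | 0.000 | 0.006 | <0.05 | 1.218 |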

In analogy to the cyclist model, Figure 5 shows the cumulated residuals of the Only population model and the Full model over the population size of a district. Again, both models show similar trends of underestimation, with the Full model having a lower degree of systematic deviance than the Only population model.

Figure 5. Comparison of cumulated residuals between Full model (solid line) and Only population model (dashed line) of pedestrian crashes

Discussion

In this study, a literature analysis was performed of known influences on the crash frequency of pedestrians and cyclists to define potential influencing factors on the macroscopic level of statistical districts. Sources for these potential factors were researched and, if necessary, broken down to the level of the defined statistical districts via GIS analysis. Based on German statistical districts, we used a variety of data on climate, topography, demography, traffic, traffic infrastructure and economics to examine factors influencing crash frequency of these transport modes and possible causes for their spatial differences.

Cullen-Frey graphs were used to verify that generalised linear models are an appropriate approach for modelling crash frequencies on the level of larger regional areas. Crash prediction models were created for crashes with personal injuries involving cyclists and pedestrians. In each step, multicollinearity of included variables, significance of coefficients and model improvement were assessed, and an outlier analysis was performed for the final models.

Given the strong influence of the number of inhabitants and the nearly linear increase of cyclist and pedestrian crashes with increasing population, the comparison of districts by their crash load (crashes per inhabitant) is initially justified. In contrast to other studies on spatial differences in crash frequencies (e.g. Aguero-Valverde & Jovanis, 2006; Berger et al., 2012), this study controlled for differences in modal split of pedestrians and cyclists in districts. As expected, crash frequencies of pedestrians and cyclists increase with their modal share. However, this increase is degressive, which may be because larger districts have a higher level of safe infrastructure or because a safety-in-numbers effect exists not only on the microscopic scale of a crossing but also on the macro-scale of a district.

In line with other findings (Noland & Oh, 2004), both models show a degressive relationship with the modal split of pedestrians. Especially in larger cities (which on average have a higher modal split of pedestrians), this would lead to inaccuracies if the safety assessment only considered crash load measures.

The tourist ratio, which describes the number of guest arrivals in relation to the number of inhabitants, showed the highest effects in improving the models after the exposure factors. The effect was stronger for cyclist crashes compared to pedestrian crashes. Interestingly, this factor has not been examined in previous studies and thus should be considered especially in districts with high tourist appeal.

With regard to systematic deviations, the plots of cumulated residuals showed that both models tend to underestimate the number of crashes, especially for smaller districts with between 50,000 and 250,000 inhabitants, although with a lower average deviation than models that only considered population as an influencing factor. It is likely that a critical review of factors used in safety assessments is needed to identify additional underlying effects that need to be modelled to further examine cyclist and pedestrian crashes.

Focusing on the cyclist model, as a specific German aspect, districts in the new federal states on average have an approximately 11.7 percent higher crash frequency of cyclists compared to similar districts in the former West German federal states (e^0.111 ≈ 1.117, based on the OLD_NEW coefficient of -0.111 in Table 2). However, this Boolean variable, which describes whether or not a district is located within the former West Germany, likely stands as a proxy for broader, underlying structural differences that influence the exposure of cyclists but are not available on the level of statistical districts of Germany. First, men statistically ride a bike more often and, due to migration movements after the reunification of Germany, proportionally more men live in the new federal states (Statistisches Bundesamt, 2025). Second, the modal split of cyclists is higher in the new federal states (Schubert et al., 2015). Third, there is a higher share of cyclist tourism in the new federal states and, fourth, the share of facilities dedicated to cyclists is lower in the new federal states, which forces cyclists to travel alongside motorised traffic more often (Bundesministerium für Digitales und Verkehr (BMDV), 2024).

Furthermore, the share of walkable areas as well as the modal split of pedestrians show a crash-reducing effect in the cyclist model. One possible interpretation is that an increase in both variables characterises an increase in the use of pavements by non-motorised traffic. Cyclists, therefore, may benefit from an increased attention of the motorised traffic towards pedestrians moving on side spaces.

The pedestrian model shows, on average, a decrease in crashes if a district has a higher share of unsettled areas and an increase in crashes with a larger area of spaces, squares and courts. However, these two spatial factors may also stand as proxies for a general decrease in walking distances in more rural districts (infas, DLR, IVT, infas 360, 2019) and for metropolitan regions with a higher level of safe infrastructure.

In summary, both models describe a baseline of cyclist-involved and pedestrian-involved crashes to be expected in German districts. The models include cause-effect-related variables as well as proxy variables. On the one hand, the proxy variables represent further underlying effects for which data are not available on the level of districts; on the other hand, they represent structural differences inherent to Germany, which may not be similar in other countries. With a critical review of results in districts with low population, the deviations between observed crash frequencies and model results could help identify districts and cities which have a higher potential to reduce VRU crashes given their structural, environmental, demographic and traffic conditions. Since the models account for the estimated relationship between crash frequencies and exposure factors, as well as further influencing factors, they would support a more targeted approach for initiating traffic safety activities. Especially within long-term action plans for traffic safety, the needs of different risk groups can be shown more clearly and ranked appropriately. Analogously, these models can be created on a city district level and allow a city to more accurately plan and implement countermeasures in city districts with a higher potential for safety enhancement.

Conclusions

In this study, GLMs were used to identify and quantify systematic effects of spatial differences in climate, topography, traffic or economy on VRU crashes for German statistical districts. These multivariate models complement the current, mostly univariate findings on the effect of inhabitants or income on the macro-scale by effects of further influencing factors. They allow us to more closely regard the previously assumed linear relationship between inhabitants and crash frequency which initially seems to be justified. However, especially in the case of pedestrian crashes and in larger districts, the models show that this linear link needs to be regarded more closely to reach a robust assessment.

Specific attention should also be paid to the role of modal split of the respective modes of transport. Both cyclist and pedestrian crashes have a degressive relationship with their corresponding modal split, which would especially lead to a false interpretation of a bigger district’s safety if the influence was disregarded entirely or assumed to be linear.

VRU crash numbers are also strongly influenced by visiting tourists, an effect that is often not considered. The effect was stronger for cyclist than for pedestrian crashes.

Finally, large discrepancies can exist in an assessment of VRU crashes if the analysis is based solely on crash loads and disregards the impact of demographic, structural and climate differences. Failing to consider underlying macro effects may lead to a focus on regions, districts or cities in which crash levels are close to expected levels, resulting in an uneconomical distribution of resources for enhancing traffic safety of cyclists and pedestrians. A focus on districts with the highest crash prevention potential also effectively reduces associated trauma and socio-economic costs.


AI tools

AI tools were not used in this study nor in the preparation of this paper.

Author contributions

Maria Pohle: drafted the manuscript, contribution to the conception, design, execution and interpretation. Walter Niewöhner: contribution to the conception and interpretation, final approval of the version to be published. Colin A. Booth: contribution to the conception, critically revised manuscript for intellectual contents, final approval of the version to be published.

Funding

This study was funded by the German Federal Ministry for Digital and Transport (funding code: VB18F1009A).

Data availability statement

Data used for the model development are cited within the text as references. Except for climate data, which were assigned by GIS analysis via their spatial extent, all data could be merged by their district keys. With the exception of modal split data, which are derived from the German forecast study of transport traffic in Germany in 2025, all data are open access.

Conflicts of interest

The authors declare that there are no conflicts of interest.