Matt N
CF Legend
Disclaimer: This post is extremely long, and if you don't like data analysis and geeky maths talk, I'd suggest you run for the hills and don't look back, because this post has quite a bit of it!
Hi guys. Europe has a lot of theme parks and a lot of roller coasters, so naturally people (myself included) tend to ask questions like “which park has Europe’s best roller coaster lineup?” or “which parks are quality-over-quantity and which are quantity-over-quality?”. While this isn’t really a discussion thread as such, I thought it might be fun to take a quantitative look at some of these questions and try to answer them using some data science techniques. So join me as I attempt a quantitative, multi-part analysis of Europe’s major coaster selections! I'll split my investigations into a couple of posts, one for each question, to make it a little more digestible.
Before we start, let me set out a few prerequisites and explain some of the facts regarding the investigation…
Prerequisites of the Investigation
- I am using the coaster ratings on Captain Coaster (https://captaincoaster.com/en/) as of March 2022 to perform this investigation. Each ride’s page on CC shows a % score out of 100; I converted this into a rating out of 10 by dividing by 10 (so for instance, a ride rated 87% would have a rating of 8.7/10).
- Building upon the ratings stuff; all ratings are rounded to the nearest 0.1 (so to 1dp).
- As a rule of thumb for what counts as major: to be considered, a park must have at least 5 scoreable roller coasters. If you’re wondering why I'm so specific in saying “scoreable roller coasters”, it’s because Captain Coaster does not score what it considers to be “kiddie coasters”, so not every ride in a park's lineup is scored. This means that a park with only 5 kiddie coasters wouldn't be eligible for this investigation; my rule ensures that a park in the study has 5 family/family thrill coasters, at the very least. CC also doesn't score rides where the ridership is too low, but that doesn't really affect this investigation; even the newest major coasters in Europe like Ride to Happiness and Kondaa were ridden enough to be scoreable.
- However, Captain Coaster has a somewhat inconsistent definition of what it considers a kiddie coaster. For instance, things like the Steeplechases at Blackpool are considered kiddie coasters, but Blue Flyer in the same park, which I personally would consider a kiddie coaster, isn't. The site also lists rides that some people wouldn't count as roller coasters, such as SuperSplash at Plopsaland and Fuga de Atlantide at Gardaland. I decided to go with the rides that the site scored and the scores it gave them. Even though I could calculate the mean rating of some unscored rides, I don't think CC's scoring system uses only the mean rating (I seem to remember it being mentioned that members' rankings are also factored in), so attempting to meddle with CC's system risks introducing bias and skewing the data, which you definitely don't want in a data investigation. I did think this was something I should raise before we begin.
- The most important prerequisite of all is that the results of this investigation are by no means the final answers to the questions I raised in my introductory paragraph. All of this still comes down to personal opinion, of course.
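The conversion and eligibility rules in the list above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet logic itself; the function names and the `MIN_SCOREABLE` constant are labels made up for this sketch.

```python
# Sketch of the rating conversion and the "major park" rule described above.
# Function and constant names are illustrative, not taken from the spreadsheet.

MIN_SCOREABLE = 5  # a park needs at least this many scored coasters


def percent_to_rating(percent_score: float) -> float:
    """Convert a Captain Coaster % score into a rating out of 10, to 1dp."""
    return round(percent_score / 10, 1)


def is_major_park(scoreable_coasters: int) -> bool:
    """Apply the rule of thumb: 5 or more scoreable coasters."""
    return scoreable_coasters >= MIN_SCOREABLE


print(percent_to_rating(87))               # the 87% example from the list
print(is_major_park(5), is_major_park(4))  # on the threshold vs just under it
```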
The Dataset
When applying my criteria, thinking of parks in Europe that might qualify and searching through RCDB to check that I hadn't missed any obvious ones (as it turned out, I had missed a few on the first check...), I came out with 36 theme parks to analyse in total, with 253 scoreable roller coasters between them. The theme parks being studied are as follows, with each park's number of scoreable roller coasters listed in brackets:
- Alton Towers, UK (9)
- Bellewaerde, Belgium (6)
- Blackpool Pleasure Beach, UK (10)
- Bobbejaanland, Belgium (8)
- Djurs Sommerland, Denmark (6)
- Efteling, Netherlands (8)
- Energylandia, Poland (11)
- Europa Park, Germany (12)
- Farup Sommerland, Denmark (6)
- Flamingo Land, UK (5)
- Freizeit-Land Geiselwind, Germany (5)
- Gardaland, Italy (8)
- Grona Lund, Sweden (6)
- Hansa Park, Germany (6)
- Heide Park, Germany (8)
- Linnanmaki, Finland (8)
- Liseberg, Sweden (5)
- Mirabilandia, Italy (8)
- Movie Park Germany, Germany (8)
- Nigloland, France (6)
- Parc Asterix, France (5)
- Parque de Atracciones de Madrid, Spain (5)
- Parque Warner Madrid, Spain (6)
- Phantasialand, Germany (8)
- Plopsaland de Panne, Belgium (7)
- PortAventura Park, Spain (8)
- PowerPark, Finland (6)
- Skyline Park, Germany (5)
- Thorpe Park, UK (7)
- Toverland, Netherlands (6)
- Tripsdrill, Germany (6)
- TusenFryd, Norway (5)
- Walibi Belgium, Belgium (9)
- Walibi Holland, Netherlands (6)
- Walibi Rhone-Alpes, France (5)
- Wiener Prater, Austria (10)
Let's move on to some fun stuff now... I'll start analysing some different common questions and see what answers I come out with. I'll use this first post to do...
Which European theme park has the strongest coaster lineup?
Let's start with the big one; which European theme park has the strongest coaster lineup?
There are many different ways you could measure this, but I'll start with the simplest one; the mean coaster rating of each park...
Mean Coaster Rating of each Park
Using the Explore function in Google Sheets, the top 10 highest mean ratings come out as follows:
Ranking | Park | Mean Rating out of 10 (to 1dp) | Number of Scoreable Coasters |
1 | Liseberg | 7.6 | 5 |
2 | Phantasialand | 7.5 | 8 |
3 | Alton Towers | 7.3 | 9 |
4 | Grona Lund | 6.9 | 6 |
5 | Efteling | 6.5 | 8 |
6 | Toverland | 6.3 | 6 |
7 | Walibi Holland | 6.2 | 6 |
8 | Tripsdrill | 6.1 | 6 |
9 | Europa Park | 6.1 | 12 |
10 | Djurs Sommerland | 6.1 | 6 |
Those certainly aren't the answers I'd have expected, I'll admit, but that's what the data says for that particular method. However, it should be said that the mean is far more easily swayed by outliers in either direction than some other methods (for instance, a single coaster rated much higher or lower than the rest of a park's lineup can shift it noticeably).
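To make the outlier point concrete, here is a small Python sketch. The five Liseberg ratings in it are not copied from the spreadsheet: they are a reconstruction from the summary figures quoted in this post (lowest 1.8, lower quartile 8.0, median 8.9, upper quartile 9.4, highest 9.8), on the assumption that with five coasters the quartiles land exactly on the second and fourth values. They do reproduce the 7.6 mean in the table, but treat them as illustrative.

```python
from statistics import fmean

# Liseberg's five ratings, reconstructed from the summary statistics quoted
# in this post (an assumption, not a copy of the spreadsheet data).
liseberg = [1.8, 8.0, 8.9, 9.4, 9.8]

mean_all = fmean(liseberg)
# Drop the single lowest-rated coaster to see how far it pulls the mean.
mean_without_lowest = fmean(sorted(liseberg)[1:])

print(round(mean_all, 1))             # mean of all five
print(round(mean_without_lowest, 1))  # mean of the other four
```

With these numbers, one low-rated coaster is enough to pull the mean from around 9.0 down to 7.6.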
Let's explore a different method...
Median Coaster Rating of each Park
This time, instead of the mean, I'm going to use the median: the middle value of each park's ratings when they are sorted in order.
Using Google Sheets to explore the median values instead of the mean, the top 10 median values are as follows:
Ranking | Park | Median Rating out of 10 | Number of Scoreable Coasters |
1 | Liseberg | 8.9 | 5 |
2 | Alton Towers | 7.7 | 9 |
3 | Phantasialand | 7.7 | 8 |
4 | Walibi Holland | 7.2 | 6 |
5 | Thorpe Park | 6.9 | 7 |
6 | Grona Lund | 6.9 | 6 |
7 | Parque Warner Madrid | 6.3 | 6 |
8 | Heide Park | 6.3 | 8 |
9 | Tripsdrill | 6.3 | 6 |
10 | Toverland | 6.2 | 6 |
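One property of the median is worth seeing in code: changing a park's top or bottom rating does not move it at all. The sketch below uses Liseberg ratings reconstructed from the summary figures in this post (an assumption, not the raw spreadsheet data) plus two made-up variants.

```python
from statistics import median

liseberg = [1.8, 8.0, 8.9, 9.4, 9.8]  # reconstructed, illustrative

# Hypothetical variants: make the best coaster a perfect 10, or the worst a 5.
best_is_perfect = [1.8, 8.0, 8.9, 9.4, 10.0]
worst_is_decent = [5.0, 8.0, 8.9, 9.4, 9.8]

for ratings in (liseberg, best_is_perfect, worst_is_decent):
    print(median(ratings))  # the middle value is 8.9 in every case
```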
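Interesting to see that we have quite a few differing results when we change to the median; the same three parks still make up the top 3, but 4-10 have actually changed a fair amount! I guess the median is possibly a better gauge of a consistently well-rated coaster selection than the mean, because it isn't as easily swayed by one particularly highly rated or lowly rated attraction. But at the same time, it also doesn't really take those more highly rated or lowly rated coasters into account either; if a park has a median of 7/10 and its highest rated coaster is rated above that, for instance, it makes no difference whether that coaster is an 8/10 or a 10/10.
With that in mind, I have concocted my own formula (of sorts) that I think offers the best of both worlds...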
My formula for coaster selection quality
The formula that I propose seems to me like a good way to take into account both a park's highly rated coasters and the consistent quality of their selection. It is as follows:
Matt N's Formula for Coaster Selection Quality: Score = (Highest rating + upper quartile)*(Lowest rating + lower quartile)
Now I don't know if I've got my reasoning 100% correct here, but my thinking was that using the highest and lowest ratings ensures that any standouts at either end are accounted for, while using the quartiles ensures that the consistency of a park's coaster selection is also accounted for, so that the two kinds of measure balance each other out and level the playing field. The higher the score, the higher the rank.
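For anyone who wants to try the formula themselves, here is one way it could be written in Python. This is a sketch under assumptions: it uses `statistics.quantiles` with `method="inclusive"`, which should match the interpolation that spreadsheet `QUARTILE` functions use but has not been checked against the sheet, and the Liseberg ratings are reconstructed from the summary figures in this post rather than taken from the raw data.

```python
from statistics import quantiles


def matt_n_score(ratings: list[float]) -> float:
    """(highest + upper quartile) * (lowest + lower quartile)."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    lower_q, _median, upper_q = quantiles(ratings, n=4, method="inclusive")
    return (max(ratings) + upper_q) * (min(ratings) + lower_q)


liseberg = [1.8, 8.0, 8.9, 9.4, 9.8]  # reconstructed, illustrative
print(round(matt_n_score(liseberg), 1))
```

Under those assumptions this gives 188.2, which agrees with Liseberg's score in the top 10 table.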
Using the Matt N Formula, the top 10 was as follows:
Ranking | Park | Matt N Formula Score | Upper quartile | Lower quartile | Highest rating | Lowest rating | Number of Scoreable Coasters |
1 | Alton Towers | 196.9 | 8.4 | 7.1 | 9.5 | 3.9 | 9 |
2 | Phantasialand | 191.7 | 9.2 | 7.1 | 9.8 | 3 | 8 |
3 | Liseberg | 188.2 | 9.4 | 8 | 9.8 | 1.8 | 5 |
4 | Grona Lund | 183.2 | 7.7 | 6 | 9 | 5 | 6 |
5 | Efteling | 164.5 | 8.2 | 5.3 | 8.5 | 4.6 | 8 |
6 | Europa Park | 141.4 | 7.3 | 4.9 | 9 | 3.8 | 12 |
7 | Toverland | 128.8 | 8.5 | 4.9 | 9.2 | 2.4 | 6 |
8 | Tripsdrill | 120.2 | 8.1 | 5.3 | 8.8 | 1.8 | 6 |
9 | Djurs Sommerland | 112.8 | 7.9 | 5.3 | 9.3 | 1.3 | 6 |
10 | Parque de Atracciones de Madrid | 110.3 | 7.1 | 4.9 | 7.6 | 2.6 | 5 |
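If you recompute the scores from the rounded figures shown in the table, a few come out slightly different from the listed score (about 191.9 rather than 191.7 for Phantasialand, for example). A plausible explanation, though it is only a guess, is that the sheet works from unrounded quartiles. A quick check in Python, with the top three rows typed in from the table above:

```python
# (park, highest, upper quartile, lower quartile, lowest, listed score)
rows = [
    ("Alton Towers", 9.5, 8.4, 7.1, 3.9, 196.9),
    ("Phantasialand", 9.8, 9.2, 7.1, 3.0, 191.7),
    ("Liseberg", 9.8, 9.4, 8.0, 1.8, 188.2),
]

recomputed_scores = {}
for park, highest, upper_q, lower_q, lowest, listed in rows:
    # Apply the formula to the rounded values exactly as displayed.
    recomputed = round((highest + upper_q) * (lowest + lower_q), 1)
    recomputed_scores[park] = recomputed
    print(f"{park}: recomputed {recomputed}, listed {listed}")
```

The differences are small and do not change the order of these three parks.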
So, in conclusion...
Well, that produced some interesting data! I'll admit that the results weren't quite what I was expecting, but I do think they make sense when you look at the data.
As for the answer to the initial question of which park has Europe's highest rated coaster selection: even though the parks in the top 10 varied between methods, the same three parks made up the top 3 every time, namely Liseberg, Phantasialand and Alton Towers. As for an order for those three, I'd probably go with something like this based on the data:
- Liseberg (won 2/3)
- Alton Towers (beat Phantasialand in 2/3, while Phantasialand only beat Towers in 1/3)
- Phantasialand
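The tallies in that list can be reproduced from the three top 10 tables. A small Python sketch, with each park's placings typed in by hand from the mean, median and formula tables (the helper names are made up for this illustration):

```python
# Rank of each park under (mean, median, Matt N Formula), from the tables.
placings = {
    "Liseberg": (1, 1, 3),
    "Alton Towers": (3, 2, 1),
    "Phantasialand": (2, 3, 2),
}


def methods_won(park: str) -> int:
    """Number of methods in which the park ranked first."""
    return sum(rank == 1 for rank in placings[park])


def head_to_head(park_a: str, park_b: str) -> int:
    """Number of methods in which park_a ranked above park_b."""
    return sum(a < b for a, b in zip(placings[park_a], placings[park_b]))


print(methods_won("Liseberg"))                        # methods Liseberg won
print(head_to_head("Alton Towers", "Phantasialand"))  # Towers over Phantasialand
print(head_to_head("Phantasialand", "Alton Towers"))  # Phantasialand over Towers
```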
Before we end, here's the Google Sheet with my calculations, for your viewing pleasure:
A Quantitative Analysis of Europe's Major Coaster Selections (docs.google.com)
And here is the dataset shown in visual form using a boxplot, coded in Python using Matplotlib, Seaborn and Pandas (Python libraries). This shows the median, upper quartile, lower quartile, highest value, lowest value and any outliers (values more than 1.5 times the interquartile range above the upper quartile or below the lower quartile) for each park:
I know that the x-axis is a bit of a jumbled mess, so let me clear up the order in which the parks appear so that you can more clearly see which park's boxplot is which.
The boxplots appear in the following order, from left to right:
- Alton Towers
- Thorpe Park
- Blackpool Pleasure Beach
- Phantasialand
- Liseberg
- Walibi Holland
- Energylandia
- Plopsaland de Panne
- Walibi Belgium
- Europa Park
- PortAventura
- Parque Warner Madrid
- Parque de Atracciones de Madrid
- Efteling
- Bobbejaanland
- Toverland
- Movie Park Germany
- Heide Park
- Hansa Park
- Flamingo Land
- Tripsdrill
- Parc Asterix
- Gardaland
- Mirabilandia
- Djurs Sommerland
- Farup Sommerland
- TusenFryd
- Linnanmaki
- Bellewaerde
- Nigloland
- Skyline Park
- PowerPark
- Grona Lund
- Wiener Prater
- Walibi Rhone-Alpes
- Freizeit-Land Geiselwind
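The outlier rule used by the boxplot (values more than 1.5 times the interquartile range beyond either quartile) can also be sketched without any plotting libraries. As a caveat, the Liseberg ratings here are a reconstruction from the summary figures in this post, and the inclusive quartile method is an assumption about how the quartiles were calculated.

```python
from statistics import quantiles


def boxplot_outliers(ratings: list[float]) -> list[float]:
    """Return ratings more than 1.5 * IQR below Q1 or above Q3."""
    lower_q, _median, upper_q = quantiles(ratings, n=4, method="inclusive")
    iqr = upper_q - lower_q
    lower_fence = lower_q - 1.5 * iqr
    upper_fence = upper_q + 1.5 * iqr
    return [r for r in ratings if r < lower_fence or r > upper_fence]


liseberg = [1.8, 8.0, 8.9, 9.4, 9.8]  # reconstructed, illustrative
print(boxplot_outliers(liseberg))
```

Under those assumptions, Liseberg's 1.8 falls below the lower fence, so it would be drawn as a separate outlier point rather than as the end of a whisker.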
I apologise for the ridiculously long post! I hope you find this interesting, and if you have any questions or feedback, or if anything isn't clear, then don't be afraid to ask me!