
An analysis of Europe’s coaster selection (Part 3: "QuantOverQual" and "QualOverQuant")

Matt N

CF Legend
Disclaimer: This post is extremely long, and if you don't like data analysis and geeky maths talk, I'd suggest you run for the hills and don't look back, because this post has quite a bit of it!
Hi guys. There are a lot of theme parks in Europe, as well as a lot of roller coasters. So naturally, people (myself included) tend to ask questions like “which park has Europe’s best roller coaster lineup?” or “which parks are quality-over-quantity and which parks are quantity-over-quality?”, amongst others. So while it’s not really a discussion thread, I thought it might be fun to take a quantitative look at some of these questions and try to answer them using some data science techniques. So join me as I attempt a quantitative, multi-part analysis of Europe’s major coaster selections! I'll split my investigations into a couple of posts, one for each question, to make it a little more digestible.

Before we start, let me set out a few prerequisites and explain some of the facts regarding the investigation…
Prerequisites of the Investigation
  • I am using the coaster ratings on Captain Coaster (https://captaincoaster.com/en/) as of March 2022 to perform this investigation. If you look at each ride’s page on CC, it has a % score out of 100; I converted this into a rating out of 10 by dividing by 10 (so for instance, a ride rated 87% would have a rating of 8.7/10).
  • Building upon the ratings stuff; all ratings are rounded to the nearest 0.1 (so to 1dp).
  • As a rule of thumb for what’s considered major: to be considered, a park must have at least 5 scoreable roller coasters. If you’re wondering why I get so specific in saying “scoreable roller coasters”, it’s because Captain Coaster does not score what it considers to be “kiddie coasters”, so not every ride in a park's lineup is scored. As such, this means that parks with 5 kiddie coasters wouldn't be eligible for this investigation; my rule ensures that a park in the study has 5 family/family thrill coasters, at the very least. Captain Coaster also doesn't score rides where the ridership is too low, but that doesn't really affect this investigation; even the newest major coasters in Europe like Ride to Happiness and Kondaa were ridden enough to be scoreable.
  • However, Captain Coaster has a somewhat inconsistent definition of what it considers a kiddie coaster. For instance, things like the Steeplechases at Blackpool are considered kiddie coasters, but Blue Flyer in the same park, which I personally would consider a kiddie coaster, isn't. The site also lists rides that some people wouldn't count as roller coasters, such as SuperSplash at Plopsaland and Fuga de Atlantide at Gardaland. I just decided to go with the rides that the site scored and the scores it gave them. Even though I could calculate the mean rating of some unscored rides myself, I don't think CC's scoring system only uses mean rating (I seem to remember it being mentioned that members' rankings are also factored in), so me attempting to meddle with CC's system risks introducing bias and skewing the data, which you definitely don't want in a data investigation. However, I did think this was something I should raise before we begin.
  • The most important prerequisite of all is that the results of this investigation are not necessarily the final answers to the questions I raised in my introductory paragraph by any stretch. All of this still comes entirely down to personal opinion, of course.
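As a rough illustration of the conversion and eligibility rules listed above, here is a minimal Python sketch. The function names and the four-coaster "Example Park" are placeholders invented for the example; only the divide-by-10, round-to-1dp and at-least-5-coasters rules come from the post.

```python
# Hypothetical helper names; only the rules themselves come from the post.
MIN_SCOREABLE_COASTERS = 5  # a park needs at least this many scored coasters


def to_rating(percent_score):
    """Convert a Captain Coaster % score into a rating out of 10 (to 1dp)."""
    return round(percent_score / 10, 1)


def eligible_parks(scoreable_counts):
    """Keep only the parks with enough scoreable coasters to be studied."""
    return [park for park, count in scoreable_counts.items()
            if count >= MIN_SCOREABLE_COASTERS]


print(to_rating(87))  # the 87% example above becomes 8.7
# "Example Park" is made up purely to show a park being excluded.
print(eligible_parks({"Alton Towers": 9, "Liseberg": 5, "Example Park": 4}))
```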
Right then; I think that's everything, so let's dive into the dataset...
The Dataset
When applying my criteria and thinking of parks in Europe that might qualify for this, as well as searching through RCDB just to check that I hadn't missed any obvious ones (as it turned out, I had missed a few on the first check...), I came out with 36 theme parks to analyse in total, with 253 scoreable roller coasters between them. The theme parks being studied are as follows, with the number of scoreable roller coasters each park has being listed in brackets:
  1. Alton Towers, UK (9)
  2. Bellewaerde, Belgium (6)
  3. Blackpool Pleasure Beach, UK (10)
  4. Bobbejaanland, Belgium (8)
  5. Djurs Sommerland, Denmark (6)
  6. Efteling, Netherlands (8)
  7. Energylandia, Poland (11)
  8. Europa Park, Germany (12)
  9. Farup Sommerland, Denmark (6)
  10. Flamingo Land, UK (5)
  11. Freizeit-Land Geiselwind, Germany (5)
  12. Gardaland, Italy (8)
  13. Grona Lund, Sweden (6)
  14. Hansa Park, Germany (6)
  15. Heide Park, Germany (8)
  16. Linnanmaki, Finland (8)
  17. Liseberg, Sweden (5)
  18. Mirabilandia, Italy (8)
  19. Movie Park Germany, Germany (8)
  20. Nigloland, France (6)
  21. Parc Asterix, France (5)
  22. Parque de Atracciones de Madrid, Spain (5)
  23. Parque Warner Madrid, Spain (6)
  24. Phantasialand, Germany (8)
  25. Plopsaland de Panne, Belgium (7)
  26. PortAventura Park, Spain (8)
  27. PowerPark, Finland (6)
  28. Skyline Park, Germany (5)
  29. Thorpe Park, UK (7)
  30. Toverland, Netherlands (6)
  31. Tripsdrill, Germany (6)
  32. TusenFryd, Norway (5)
  33. Walibi Belgium, Belgium (9)
  34. Walibi Holland, Netherlands (6)
  35. Walibi Rhone-Alpes, France (5)
  36. Wiener Prater, Austria (10)
I think that just about covers everything, but if you feel I’ve missed an obvious one, then don’t be afraid to tell me.

Let's move on to some fun stuff now... I'll start analysing some different common questions and see what answers I come out with. I'll use this first post to do...
Which European theme park has the strongest coaster lineup?
Let's start with the big one; which European theme park has the strongest coaster lineup?

There are many different ways you could measure this, but I'll start with the simplest one; the mean coaster rating of each park...
Mean Coaster Rating of each Park

If I look at the Explore function of this spreadsheet, the top 10 highest mean ratings come out as follows:

Ranking | Park | Mean Rating out of 10 (to 1dp) | Number of Scoreable Coasters
1 | Liseberg | 7.6 | 5
2 | Phantasialand | 7.5 | 8
3 | Alton Towers | 7.3 | 9
4 | Grona Lund | 6.9 | 6
5 | Efteling | 6.5 | 8
6 | Toverland | 6.3 | 6
7 | Walibi Holland | 6.2 | 6
8 | Tripsdrill | 6.1 | 6
9 | Europa Park | 6.1 | 12
10 | Djurs Sommerland | 6.1 | 6

Those certainly aren't the answers I'd have expected, I'll admit, but that's what the data says for that particular method. However, it should be said that the mean is far more easily swayed by outliers in any particular direction than some other methods (for instance, it's very easily swayed by one coaster rating much more highly or lowly than the others on average).

Let's explore a different method...
Median Coaster Rating of each Park
Instead of using the calculated average (mean), I'm going to be using the median, the middle-ranking value for each park, this time.

Using Google Sheets to explore the median values instead of the mean, the top 10 median values are as follows:
Ranking | Park | Median Rating out of 10 | Number of Scoreable Coasters
1 | Liseberg | 8.9 | 5
2 | Alton Towers | 7.7 | 9
3 | Phantasialand | 7.7 | 8
4 | Walibi Holland | 7.2 | 6
5 | Thorpe Park | 6.9 | 7
6 | Grona Lund | 6.9 | 6
7 | Parque Warner Madrid | 6.3 | 6
8 | Heide Park | 6.3 | 8
9 | Tripsdrill | 6.3 | 6
10 | Toverland | 6.2 | 6
Interesting to see that we have quite a few differing results when we change to the median; in spite of the top 3 staying consistent, 4-10 have actually changed a fair amount! I guess the median is possibly a better gauge of a consistently well-rated coaster selection than mean, because it isn't as easily swayed by one particularly highly rated or lowly rated attraction. But at the same time, it also doesn't really take into account those more highly rated or lowly rated coasters either; if a park's highest rated coaster is rated more highly than a median of 7/10, for instance, it makes no difference whether it's an 8/10 or 10/10.
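To make the mean-versus-median difference concrete, here is a small sketch using Liseberg. The five ratings are not copied from the spreadsheet; they are reconstructed from Liseberg's figures in the summary tables in this post, on the assumption that the quartiles were calculated with the inclusive method (as Google Sheets' QUARTILE function does). With only five coasters, that would make the lowest rating, lower quartile, median, upper quartile and highest rating the five ratings themselves.

```python
from statistics import mean, median

# Assumed reconstruction of Liseberg's five scoreable coaster ratings.
liseberg = [1.8, 8.0, 8.9, 9.4, 9.8]

print(round(mean(liseberg), 1))    # 7.6, matching the mean table
print(round(median(liseberg), 1))  # 8.9, matching the median table

# Drop the single low-rated coaster and see how far each measure moves.
without_lowest = [r for r in liseberg if r > 1.8]
mean_shift = mean(without_lowest) - mean(liseberg)
median_shift = median(without_lowest) - median(liseberg)
print(mean_shift, median_shift)
```

In this example the mean moves by well over a point when the one low rating is removed, while the median moves by roughly a quarter of a point, which is the sensitivity to outliers described above.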

With that in mind, I have concocted my own formula (of sorts) that I think offers the best of both worlds...
My formula for coaster selection quality
The formula that I propose seems to me like a good way to take into account both a park's highly rated coasters and the consistent quality of their selection. It is as follows:
Matt N's Formula for Coaster Selection Quality: Score = (Highest rating + upper quartile)*(Lowest rating + lower quartile)
Now I don't know if I've got my assumptions 100% correct here, but my assumption was that the use of the highest rating and lowest rating would ensure that any standouts at either end are adequately accounted for, but the use of the quartiles would ensure that the consistency of a park's coaster selection is also accounted for, and that the two metrics cancel each other out and make the playing field level. The higher the score, the higher the rank.
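As a worked check of the formula, the sketch below applies it to the four summary figures given for Alton Towers in the top 10 table (highest 9.5, upper quartile 8.4, lowest 3.9, lower quartile 7.1). The function name is a placeholder chosen for this example.

```python
def matt_n_score(highest, upper_quartile, lowest, lower_quartile):
    """Score = (highest + upper quartile) * (lowest + lower quartile)."""
    return (highest + upper_quartile) * (lowest + lower_quartile)


# Alton Towers: (9.5 + 8.4) * (3.9 + 7.1) = 17.9 * 11.0
alton_towers = matt_n_score(9.5, 8.4, 3.9, 7.1)
print(round(alton_towers, 1))  # 196.9
```

Other rows can come out slightly different (by under a point in the top 10) when recomputed from the rounded table values, presumably because the spreadsheet worked from unrounded ratings.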

Using the Matt N Formula, the top 10 was as follows:
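Ranking | Park | Matt N Formula Score | Upper quartile | Lower quartile | Highest rating | Lowest rating | Number of Scoreable Coasters
1 | Alton Towers | 196.9 | 8.4 | 7.1 | 9.5 | 3.9 | 9
2 | Phantasialand | 191.7 | 9.2 | 7.1 | 9.8 | 3 | 8
3 | Liseberg | 188.2 | 9.4 | 8 | 9.8 | 1.8 | 5
4 | Grona Lund | 183.2 | 7.7 | 6 | 9 | 5 | 6
5 | Efteling | 164.5 | 8.2 | 5.3 | 8.5 | 4.6 | 8
6 | Europa Park | 141.4 | 7.3 | 4.9 | 9 | 3.8 | 12
7 | Toverland | 128.8 | 8.5 | 4.9 | 9.2 | 2.4 | 6
8 | Tripsdrill | 120.2 | 8.1 | 5.3 | 8.8 | 1.8 | 6
9 | Djurs Sommerland | 112.8 | 7.9 | 5.3 | 9.3 | 1.3 | 6
10 | Parque de Atracciones de Madrid | 110.3 | 7.1 | 4.9 | 7.6 | 2.6 | 5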
I'll admit those aren't the results I expected, and I know they probably look a bit weird to some of you, but that is what the data came out with.
So, in conclusion...
Well, that produced some interesting data! I'll admit that the results weren't quite what I was expecting, but I do think they make sense when you look at the data.

In terms of the answer to the initial question of "what is Europe's highest rated coaster selection?"; even though the parks in the top 10 for each method varied, the top 3 stayed consistent every time, and that top 3 was Liseberg, Phantasialand and Alton Towers. In terms of an order for those top 3; I'd probably go with something like this based on the data:
  1. Liseberg (won 2/3)
  2. Alton Towers (beat Phantasialand in 2/3, while Phantasialand only beat Towers in 1/3)
  3. Phantasialand
However, I should stress that just because my data analysis put these parks on top, that is not "the correct answer" to the question by any stretch. As with most things, it all boils down to your own personal opinion and personal preference. You might think these results are hogwash, and that's fine; your personal answer to this question is entirely down to your opinion.

Before we end, here's the Google Sheet with my calculations, for your viewing pleasure:

And here is the dataset shown in visual form using a boxplot, coded in Python using Matplotlib, Seaborn and Pandas (Python libraries). This shows the median, upper quartile, lower quartile, highest value, lowest value and any outliers (values more than 1.5 times the interquartile range from the upper or lower quartile) for each park:
[Image: Coaster-Lineups-Boxplot.png]
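The outlier rule used by the boxplot (more than 1.5 times the interquartile range beyond either quartile) can be sketched with the standard library alone. The Liseberg ratings below are an assumed reconstruction from the park's summary figures in this post, and the inclusive quartile method is likewise an assumption about how the quartiles were computed.

```python
from statistics import quantiles


def boxplot_outliers(ratings):
    """Return ratings lying more than 1.5 * IQR beyond either quartile."""
    lower_q, _median, upper_q = quantiles(ratings, n=4, method="inclusive")
    iqr = upper_q - lower_q
    low_fence = lower_q - 1.5 * iqr
    high_fence = upper_q + 1.5 * iqr
    return [r for r in ratings if r < low_fence or r > high_fence]


# Assumed reconstruction of Liseberg's five scoreable coaster ratings.
liseberg = [1.8, 8.0, 8.9, 9.4, 9.8]
print(boxplot_outliers(liseberg))  # [1.8]
```

Under these assumptions, Liseberg's lowest rated coaster falls below the lower fence (8.0 - 1.5 * 1.4 = 5.9), so it would be drawn as an outlier rather than as the end of a whisker.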

I know that the x-axis is a bit of a jumbled mess, so let me clear up the order in which the parks appear so that you can more clearly see which park's boxplot is which.

The boxplots appear in the following order, from left to right:
  1. Alton Towers
  2. Thorpe Park
  3. Blackpool Pleasure Beach
  4. Phantasialand
  5. Liseberg
  6. Walibi Holland
  7. Energylandia
  8. Plopsaland de Panne
  9. Walibi Belgium
  10. Europa Park
  11. PortAventura
  12. Parque Warner Madrid
  13. Parque de Atracciones de Madrid
  14. Efteling
  15. Bobbejaanland
  16. Toverland
  17. Movie Park Germany
  18. Heide Park
  19. Hansa Park
  20. Flamingo Land
  21. Tripsdrill
  22. Parc Asterix
  23. Gardaland
  24. Mirabilandia
  25. Djurs Sommerland
  26. Farup Sommerland
  27. TusenFryd
  28. Linnanmaki
  29. Bellewaerde
  30. Nigloland
  31. Skyline Park
  32. PowerPark
  33. Grona Lund
  34. Wiener Prater
  35. Walibi Rhone-Alpes
  36. Freizeit-Land Geiselwind
So, I hope you found my first dive into European coaster selection data interesting! I'll certainly be answering more questions about this dataset at some point in the near future; I've got some ideas of my own, but I'm also happy to accept suggestions from any of you of questions you'd like answering.

I apologise for the ridiculously long post, I hope you find this interesting, and if you have any questions or feedback, or if anything isn't clear, then don't be afraid to ask me!
 

Christian

Hyper Poster
I gotta say the dedication here is pretty impressive. I don't personally agree with the final ranking. I am not questioning your sources or method and I quite admire your formula.

But for me I would never see Parque de Atracciones, Efteling, Djurs Sommerland on a list of coaster quality while parks like Energylandia and Portaventura are not there, even when looking at averages.
 

Matt N

CF Legend
Fair enough! I’ll admit that some of those surprised me too, and I guess one could argue that the results show up a weakness of the method I used. That weakness is that I think it punishes parks for being even slightly inconsistent (one really lowly rated ride can often make quite a difference to the score) and punishes them for installing filler, while more consistent parks without any particularly terribly rated rides tend to thrive under this scoring system, even if they didn’t hit the highs of some others lower down. That's why parks like Efteling make it in, as well as the likes of Parque de Atracciones and Djurs; these parks don’t have any especially terribly rated rides, and they have a top rated coaster that’s rated at least quite highly, even if not super highly like some of the bigger heavyweights. Although with that being said, Piraten’s was 9.3, which is certainly quite a substantial average rating; right up there with some of the more popular heavyweights, for sure.

I also noticed that more top-heavy parks tended to benefit under this system; all of the top 3 are what you might consider quite top-heavy in coaster lineup, whereas parks with a greater amount of filler coasters like Europa Park, Energylandia et al get punished by this system, even if the park has some really highly rated rides.

However, I’m not really sure there was a better way I could have done it. Every other possible method I thought of introduced its own weird bias into the system.
 

Matt N

CF Legend
Sorry to double post, but I had a thought while in the shower this morning as to why the results might have been so weird when I applied my own formula.

As much as I tried to make high ratings and low ratings carry equal weight in terms of how a coaster selection is rated, I failed to take into account some real-life bias that exists when evaluating coaster selections by doing that. That real-life bias is that enthusiasts naturally gravitate more towards highly rated rides when evaluating a park’s coaster selection, whereas my formula assumed that highly rated and lowly rated would be equally weighted in the minds of enthusiasts, which isn’t really how it works. For instance, this formula assumes that removing Viking Roller Coaster from Energylandia and removing Zadra from Energylandia would have exactly the same level of impact on the rating of its coaster selection. However, I’d wager that most enthusiasts would see Energylandia’s coaster selection quality as being far more impacted by the removal of Zadra than by the removal of Viking Roller Coaster.

As such, I’ll play around with an altered version of the Matt N Formula when I get some time later today, one that weights the score more towards the higher rated rides, and see what I come out with.
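One way to probe how the original formula reacts to losing a park's weakest or strongest coaster is to recompute the score with each one removed. Energylandia's individual ratings are not given in this thread, so the sketch below uses Liseberg instead, with five ratings reconstructed from its summary figures under the assumption that quartiles were computed with the inclusive method. Treat it as an illustration, not a result from the spreadsheet.

```python
from statistics import quantiles


def matt_n_score(ratings):
    """(highest + upper quartile) * (lowest + lower quartile)."""
    # method="inclusive" is an assumption about the spreadsheet's quartiles.
    lower_q, _median, upper_q = quantiles(ratings, n=4, method="inclusive")
    return (max(ratings) + upper_q) * (min(ratings) + lower_q)


# Assumed reconstruction of Liseberg's ratings, sorted lowest to highest.
liseberg = [1.8, 8.0, 8.9, 9.4, 9.8]

baseline = matt_n_score(liseberg)
without_lowest = matt_n_score(liseberg[1:])    # drop the weakest coaster
without_highest = matt_n_score(liseberg[:-1])  # drop the strongest coaster

print(round(baseline, 1), round(without_lowest, 1), round(without_highest, 1))
```

In this example, dropping the lowest rated coaster raises the score substantially, while dropping the highest rated coaster lowers it, so the low end of a lineup has a large say in the result.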
 

Hixee

Flojector
Staff member
Administrator
Moderator
Social Media Team
I knew this thread was going to be a doozy when I saw the topic title. I've edited it to make it a bit less... well... less.

Matt N said:
I'll admit those aren't the results I expected, and I know they probably look a bit weird to some of you, but that is what the data came out with.
So, in conclusion...
Well, that produced some interesting data! I'll admit that the results weren't quite what I was expecting, but I do think they make sense when you look at the data.
You've said this several times, and I can't help but think... "really?". I mean, you've basically just pulled out a list of the best parks in Europe. What exactly were you thinking would be different? Sure, I might personally not rank them in exactly that order, and I might have swapped out the odd one or two, but I'm hardly surprised by that result.

Matt N said:
As such, I’ll play around with an altered version of the Matt N Formula when I get some time later today, one that weights the score more towards the higher rated rides, and see what I come out with.
I completely agree with this - I think you need to somehow adjust for this effect.

It's a nice analysis, and I'm pleased that you've started to dive into Python. :D
 

Nitefly

Hyper Poster
Thanks for this @Matt N - putting your studies to use I presume 💪

I don’t mind, to a degree, if a park has a number of throwaway bad rides. Energylandia is probably a good example. You go to that park because of its best rides, which you can enjoy without really having any care for the worst. When a ride is bad, I tend to think “oh well” and focus on the best. I don’t allow my impression of Energylandia to be blighted by all of its terrible rides, because I’m principally keen to ride its good rides.

With that in mind, if I were to run a similar analysis, I think I’d be tempted to deliberately place less emphasis on the lower ranking rides. For example, you could limit your analysis to the top, say, 6-7 highest ranking coasters in each park.

You really don’t need to do that by the way, your approach should depend on your own personal assessment of how ‘the best coaster line-up’ should be measured.
 

JoshC.

Strata Poster
Cracking analysis Matt; enjoyed reading.

I think I have one key point though. You say you are trying to answer the question: Which European theme park has the strongest coaster lineup?
I don't think that's quite what your data is showing.

In short, you're taking all of a park's (non-kiddie) coasters, and in some way, "averaging" their score. We'll use the term averaging loosely here, since you're trying different methods, etc.
But what that means is you're looking at how consistently strong the line up is within a park. I think that's a slightly different thing.

I think Efteling is a great example to highlight this. Their coasters are pretty solid, but none of them are generally regarded as remarkable/amazing things. Not one of their coasters is generally regarded in the upper echelons of quality in Europe. So it feels somewhat wrong almost to suggest they're a European park with one of the strongest coaster line ups, as some of the data suggests. But do they have a consistently strong, good quality line up? Definitely.

I'd then argue that Walibi Holland has a strong coaster line up, for example. I'm biased here, but Untamed-Lost Gravity-Goliath is an excellent trio, which are pretty consistently highly rated. And importantly, their strengths aren't taken away by the fact that weak coasters exist at the park.

So, in my opinion, what your data is answering is: Which European theme park has the most consistently strongest-rated coaster line up?
Obviously everything is open to interpretation though, and I could be wrong.


As for how you'd answer your original question: in my opinion, you should take a park's highest rated coasters and consider them.
You're considering parks which have at least 5 scorable coasters.
So maybe take the 3 highest scoring coasters from each park (so you're taking at most 3 of a park's 5 or more scorable coasters), and add their scores together. Then rank the parks by these summed scores.

My reasoning would be:
-You consider a subset of each park's strongest rides
-A park's weaker rides are ignored
-You consider the top rides on their own merit, as well as the sum of how the park goes

There's loads of flaws with this of course, but I personally think that would reflect the "strongest coaster line up" a bit more. When I think about a strong coaster line up, I don't care about the smaller rides, I care only about the park's best line up.
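The top-3 sum suggested above could be sketched roughly as follows. The Liseberg ratings are an assumed reconstruction from the summary figures earlier in the thread, and "Example Park" is entirely made up so that there is something to rank against.

```python
def top_three_sum(ratings):
    """Add up a park's three highest coaster ratings."""
    return sum(sorted(ratings, reverse=True)[:3])


parks = {
    "Liseberg": [1.8, 8.0, 8.9, 9.4, 9.8],      # assumed reconstruction
    "Example Park": [6.0, 6.5, 7.0, 7.5, 8.0],  # made-up ratings
}

# Rank parks from the highest top-3 sum to the lowest.
ranking = sorted(parks, key=lambda name: top_three_sum(parks[name]), reverse=True)
for position, name in enumerate(ranking, start=1):
    print(position, name, round(top_three_sum(parks[name]), 1))
```

Because only the three best ratings are summed, Liseberg's 1.8 has no effect on its total here.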



Also, as someone who doesn't use Captain Coaster, I'm curious how the ranking works. Are the percentage scores calculated by some formula which draws from users' rankings? Or do users score rides out of 10 (or 100 or whatever), and these are then averaged/weighted in some way?
 

Matt N

CF Legend
I played around with altering the Matt N formula.

I tried doing three alterations.

Altered Matt N Formula 1
The first altered Matt N formula I tried was as follows:
Altered Matt N Formula 1: Score = (Highest rating + upper quartile)^2 * (Lowest rating + lower quartile)

I squared the bracket containing highest rating + upper quartile in an attempt to give the higher ranked coasters slightly more weight.

And the results were...
Ranking | Park | Altered Matt N Formula Score | Original Matt N Formula Score | Rank with Original Formula | Change
1 | Phantasialand | 3646.7 | 191.7 | 2 | +1
2 | Liseberg | 3612.7 | 188.2 | 3 | +1
3 | Alton Towers | 3524.5 | 196.9 | 1 | -2
4 | Grona Lund | 3049.4 | 183.2 | 4 | 0
5 | Efteling | 2747.1 | 164.5 | 5 | 0
6 | Europa Park | 2297.3 | 141.4 | 6 | 0
7 | Toverland | 2279.2 | 128.8 | 7 | 0
8 | Tripsdrill | 2029 | 120.2 | 8 | 0
9 | Djurs Sommerland | 1933.9 | 112.8 | 9 | 0
10 | Parque de Atracciones | 1620.7 | 110.3 | 10 | 0
Altered Matt N Formula 2
And the second formula I tried was:
Altered Matt N Formula 2: Score = (Highest rating^2 + upper quartile) * (Lowest rating + lower quartile)

I squared the highest rating to try and make that have more of an impact, and the result was as follows:
Ranking | Park | Altered Matt N Formula Score | Original Matt N Formula Score | Rank with Original Formula | Change
1 | Alton Towers | 1085.2 | 196.9 | 1 | 0
2 | Phantasialand | 1060.5 | 191.7 | 2 | 0
3 | Liseberg | 1033.3 | 188.2 | 3 | 0
4 | Grona Lund | 975.2 | 183.2 | 4 | 0
5 | Efteling | 792.4 | 164.5 | 5 | 0
6 | Europa Park | 767.8 | 141.4 | 6 | 0
7 | Toverland | 677.6 | 128.8 | 7 | 0
8 | Djurs Sommerland | 620.3 | 112.8 | 9 | +1
9 | Tripsdrill | 609.3 | 120.2 | 8 | -1
10 | PortAventura | 577.4 | 94.5 | 11 | +1
As you can see, those first two formulas changed... very little. I then decided to try a final alteration...
Altered Matt N Formula 3
Altered Matt N Formula 3: Score = (Highest rating + Upper quartile)/2

For the final formula, I eliminated the lower ends of the coaster selection entirely, focusing only on the highest rating and the upper quartile. I calculated the mean of these two values so as to gauge an average quality of a park's "top" coasters. The results were as follows:
Ranking | Park | Altered Matt N Formula Score | Original Matt N Formula Score | Rank with Original Formula | Change
1 | Liseberg | 9.6 | 188.2 | 3 | +2
2 | Walibi Holland | 9.5 | 82.3 | 15 | +13
3 | Phantasialand | 9.5 | 191.7 | 2 | -1
4 | Energylandia | 9.3 | 61.1 | 19 | +15
5 | Plopsaland de Panne | 9 | 57.3 | 20 | +15
6 | Alton Towers | 9 | 196.9 | 1 | -5
7 | Toverland | 8.9 | 128.8 | 7 | 0
8 | Hansa Park | 8.9 | 87.6 | 13 | +5
9 | Parque Warner | 8.8 | 49.6 | 26 | +17
10 | Djurs Sommerland | 8.6 | 112.8 | 9 | -1
Interesting to see how things change quite a bit when the lower coasters are removed from the equation... Phantasialand and Liseberg remain in the top 3, but for the first time, Alton Towers has been ousted from the top 3, landing at #6 when only their top coasters are concerned.
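For reference, here is a minimal sketch of the three altered formulas, checked against Liseberg's figures from the original formula table (highest 9.8, upper quartile 9.4, lowest 1.8, lower quartile 8). The function names are placeholders. Note that the tabulated scores for the first two alterations are only reproduced when the two brackets are multiplied together, as in the original formula, so that is what the sketch does.

```python
def altered_1(highest, upper_q, lowest, lower_q):
    """Square the whole upper bracket, then multiply by the lower bracket."""
    return (highest + upper_q) ** 2 * (lowest + lower_q)


def altered_2(highest, upper_q, lowest, lower_q):
    """Square only the highest rating, then multiply by the lower bracket."""
    return (highest ** 2 + upper_q) * (lowest + lower_q)


def altered_3(highest, upper_q):
    """Mean of the highest rating and the upper quartile."""
    return (highest + upper_q) / 2


# Liseberg's summary figures as listed in the original formula table.
liseberg = dict(highest=9.8, upper_q=9.4, lowest=1.8, lower_q=8.0)

print(round(altered_1(**liseberg), 1))  # 3612.7
print(round(altered_2(**liseberg), 1))  # 1033.3
print(round(altered_3(liseberg["highest"], liseberg["upper_q"]), 1))  # 9.6
```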
 

Matt N

CF Legend
Sorry for double posting and for the late replies, but there’s a number of comments in this thread that I realise I never responded to, and I’d like to address individually.
Hixee said:
You've said this several times, and I can't help but think... "really?". I mean, you've basically just pulled out a list of the best parks in Europe. What exactly were you thinking would be different? Sure, I might personally not rank them in exactly that order, and I might have swapped out the odd one or two, but I'm hardly surprised by that result.
I think I was surprised by some of the parks consistently ranking towards the top and by some of the other parks getting left lower down; similarly to what @ChristianPalsson initially mentioned, I wouldn’t personally have pegged some parks in the top 10 as “strongly rated coaster parks”, and I felt that there were a few notable European heavyweights omitted from the original top 10. For instance, there were quite a few smaller parks in the top 10 using my initial formula and original methods that don’t commonly seem to be mooted as strong European coaster parks, such as Efteling and Parque de Atracciones amongst others, while some theme parks that do seem to be mooted as strong European coaster parks, such as Energylandia and Walibi Holland amongst others, were left fairly low. This did begin to align more with people’s typical coaster lineup opinions when I removed the lower values from my formula, but even then, there were still a couple that surprised me.
Nitefly said:
Thanks for this @Matt N - putting your studies to use I presume 💪

I don’t mind, to a degree, if a park has a number of throwaway bad rides. Energylandia is probably a good example. You go to that park because of its best rides, which you can enjoy without really having any care for the worst. When a ride is bad, I tend to think “oh well” and focus on the best. I don’t allow my impression of Energylandia to be blighted by all of its terrible rides, because I’m principally keen to ride its good rides.

With that in mind, if I were to run a similar analysis, I think I’d be tempted to deliberately place less emphasis on the lower ranking rides. For example, you could limit your analysis to the top, say, 6-7 highest ranking coasters in each park.

You really don’t need to do that by the way, your approach should depend on your own personal assessment of how ‘the best coaster line-up’ should be measured.
Indeed I am! To tell you the truth, this little investigation actually popped into my head as an idea when I was doing my Maths for Data Science assignment on Wednesday; I was doing some data visualisation for that, and alarm bells began ringing in my head going “this has potential theme park-y applications!”. I originally intended to use more types of graphs, such as histograms and scatter graphs, but I tested them out and they didn’t really add anything to the investigation, so I just stuck with the one box plot.

In terms of why I went with the whole score-able coaster selection instead of only a select few; I was hoping to test some other questions on this dataset at some stage that would require a park’s entire score-able coaster selection, and I also think consistency of a coaster selection does play a role in determining the strength of it for some. For instance, I’ve heard a number of reviews of Energylandia that dismiss its coaster selection relative to others on the basis that “there’s only a few decent creds and the bulk of it consists of rubbish +1s”, and I’ve also heard similar criticisms levelled at the selections of places like Europa Park and (outside of Europe) Canada’s Wonderland, amongst others.

I guess I could try just honing in on a park’s 3 top-rated coasters as a future alteration, though, as I guess even only factoring in the upper quartile alongside the highest value might see you go more into the “filler” territory of some lineups. Although it did seem to align more with people’s general opinions (for instance, many of the parks mentioned in similar threads on CF only made the top 10 when this method was used) when I did that, so maybe not.
JoshC. said:
Cracking analysis Matt; enjoyed reading.

I think I have one key point though. You say you are trying to answer the question: Which European theme park has the strongest coaster lineup?
I don't think that's quite what your data is showing.

In short, you're taking all of a park's (non-kiddie) coasters, and in some way, "averaging" their score. We'll use the term averaging loosely here, since you're trying different methods, etc.
But what that means is you're looking at how consistently strong the line up is within a park. I think that's a slightly different thing.

I think Efteling is a great example to highlight this. Their coasters are pretty solid, but none of them are generally regarded as remarkable/amazing things. Not one of their coasters is generally regarded in the upper echelons of quality in Europe. So it feels somewhat wrong almost to suggest they're a European park with one of the strongest coaster line ups, as some of the data suggests. But do they have a consistently strong, good quality line up? Definitely.

I'd then argue that Walibi Holland has a strong coaster line up, for example. I'm biased here, but Untamed-Lost Gravity-Goliath is an excellent trio, which are pretty consistently highly rated. And importantly, their strengths aren't taken away by the fact that weak coasters exist at the park.

So, in my opinion, what your data is answering is: Which European theme park has the most consistently strongest-rated coaster line up?
Obviously everything is open to interpretation though, and I could be wrong.


As for how you'd answer your original question: in my opinion, you should take a park's highest rated coasters and consider them.
You're considering parks which have at least 5 scorable coasters.
So maybe take the 3 highest scoring coasters from each park (so you're taking at most 3 of a park's 5 or more scorable coasters), and add their scores together. Then rank the parks by these summed scores.

My reasoning would be:
-You consider a subset of each park's strongest rides
-A park's weaker rides are ignored
-You consider the top rides on their own merit, as well as the sum of how the park goes

There's loads of flaws with this of course, but I personally think that would reflect the "strongest coaster line up" a bit more. When I think about a strong coaster line up, I don't care about the smaller rides, I care only about the park's best line up.



Also, as someone who doesn't use Captain Coaster, I'm curious how the ranking works. Are the percentage scores calculated by some formula which draws from users' rankings? Or do users score rides out of 10 (or 100 or whatever), and these are then averaged/weighted in some way?
I’ll admit I may have phrased the question poorly. And to tell you the truth, I’m not really sure which of those I was initially trying to answer, in hindsight. However, I guess that looking at it, you could split my results into the answers for two separate sub-questions. The initial formula I proposed, that took into account both upper ends and lower ends of coaster selections equally, might work quite well to answer the question of “which European park has the most consistently strong coaster selection?”, whereas to answer “which European park has the strongest upper-end coaster selection?”, the final formula I proposed in my most recent post above might work better, as it hones in exclusively on the higher end of a park’s coaster selection, and only focuses on stuff enthusiasts are interested in, for the most part.

As for Captain Coaster; I’m not entirely sure how its scoring system works, to tell you the truth, but from memory, I seem to remember it being said that it was some combination of average rating and also the rankings a ride gets within lists relative to other rides (kind of similar to Mitch Hawker, where rides have “duels” with each other). Don’t quote me on that, though; I think one of CF’s Captain Coaster mods would be a better person to ask than myself.
 

Matt N

CF Legend
Right; apologies for the triple post, but I decided to have another go at Part 1. But this time, I did what some people suggested and calculated the mean and median using only the park's 3 top-rated coasters. When I did this, the results were as follows (to 1dp):
Mean
Park | Mean Rating of Top 3 (1dp)
Energylandia | 9.6
Phantasialand | 9.5
Liseberg | 9.4
Walibi Holland | 9.3
Alton Towers | 8.8
Europa Park | 8.8
Plopsaland | 8.6
Parque Warner | 8.6
Toverland | 8.5
Heide Park | 8.4

Median
Park | Median Rating of Top 3 (1dp)
Energylandia | 9.8
Phantasialand | 9.6
Mirabilandia | 9.6
Liseberg | 9.4
Walibi Holland | 9.3
Toverland | 8.9
Hansa Park | 8.7
Europa Park | 8.7
Tripsdrill | 8.6
Parque Warner | 8.6
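
As a rough sketch of how a "top 3 only" mean and median could be worked out in Python: the park names and ratings below are made-up placeholders rather than Captain Coaster figures, and `top_n_summary` is a hypothetical helper, not necessarily how the numbers above were produced.

```python
from statistics import mean, median

# Hypothetical ratings out of 10, for illustration only (not Captain Coaster data).
ratings_by_park = {
    "Park A": [9.8, 9.6, 8.1, 5.0, 3.2],
    "Park B": [9.0, 8.9, 8.7, 8.5, 6.0, 2.1],
}

def top_n_summary(ratings, n=3):
    """Return (mean, median) of the n highest ratings, each rounded to 1dp."""
    top = sorted(ratings, reverse=True)[:n]
    return round(mean(top), 1), round(median(top), 1)

top3 = {park: top_n_summary(r) for park, r in ratings_by_park.items()}

# List parks from the highest top-3 mean to the lowest.
for park, (top_mean, top_median) in sorted(
    top3.items(), key=lambda item: item[1][0], reverse=True
):
    print(f"{park}: mean {top_mean}, median {top_median}")
```

Worth noting that with only 3 rides, the median is simply the park's second-highest-rated coaster, which is one reason the two tables can order parks differently.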

I hope you find that interesting! I promise that is the last time I will faff around with part 1... part 2 will be coming soon!

Do you guys have any questions you'd like me to try and answer using this dataset? I've got a couple in mind of my own, but I'm happy to take suggestions!
 

Sandman

Giga Poster
The mean result above looks the best so far to me.

Only difference for me is I'd have Toverland above the likes of Parque Warner and probably even Plopsa (for now).
Also, I do think PA with Shambhala/Khan and Red Force (if we're counting it) or Baco/Stampida beats Warner, Heide, and arguably, Europa's big 'uns.
 

Matt N

CF Legend
PA is the slight elephant in the room, as Red Force isn't actually counted due to it technically being in a separate park. I'd imagine PA might be higher if Red Force was counted, but it's surprisingly low without it.
 

Sandman

Giga Poster
Personal opinion for me I guess - I'd still have PA way above Warner even minus Red Force.
I'd take Shambhala/Khan/Baco over Superman/Stunt Fall/Batman
Stampida over Coaster Express
Tami Tami over Tom and Jerry
And I'll consider Tomahawk as a fun little bonus.

Roadrunner at Warner beats Diablo at PA however.

I know I've not taken the formula into consideration, but that's just my breakdown haha.
 

Matt N

CF Legend
Fair enough! I should strongly emphasise that my algorithm is by no means "the correct answer" to the question, but merely the answer derived from data on what a subset of coaster enthusiasts (Captain Coaster users) thinks about these parks on average.
 

Matt N

CF Legend
Right; sorry to double post, but I think it's about time I did Part 2 of this! And for Part 2, I'll be exploring...
What coaster selections in Europe are the most and least consistent?

Now I should clarify that this is not aiming to determine consistent strength, but merely consistency on its own, which can work both ways (a selection can be consistently good or consistently poor). So, let's dive straight in!

To work this out, I used two different types of range.

The first measure I used was the range between the highest and lowest ratings, which is a very simple measure where you merely subtract the lowest value from the highest value (Range = Highest Rating - Lowest Rating). The top 5 most and least consistent using that method were as follows:
Top 5 Most Consistent (Using Range)
Ranking | Park | Range | Mean Rating (out of 10, to 1dp) | Number of Scoreable Coasters
1 | Freizeit-Land Geiselwind | 2.5 | 1.4 | 5
2 | Efteling | 3.9 | 6.5 | 8
3 | Grona Lund | 4 | 6.9 | 6
4 | Flamingo Land | 4.4 | 3.1 | 5
5 | Skyline Park | 4.7 | 4 | 5

Top 5 Least Consistent (Using Range)
Ranking | Park | Range | Mean Rating (out of 10, to 1dp) | Number of Scoreable Coasters
1 | Energylandia | 10 | 5.7 | 11
2 | Walibi Holland | 9.8 | 6.2 | 6
3 | Walibi Belgium | 9.4 | 4.9 | 9
4 | Mirabilandia | 9 | 4.2 | 8
5 | Plopsaland | 8.9 | 5 | 7
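
For anyone wanting to try the range measure themselves, a minimal Python sketch might look like the following. The ratings are invented for illustration (not Captain Coaster data), and `rating_range` is a hypothetical helper name.

```python
# Hypothetical ratings out of 10, for illustration only.
ratings_by_park = {
    "Park A": [9.8, 9.6, 8.1, 5.0, 3.2],
    "Park B": [9.0, 8.9, 8.7, 8.5, 6.0, 2.1],
    "Park C": [7.0, 6.5, 6.1, 5.8, 5.2],
}

def rating_range(ratings):
    """Range = highest rating - lowest rating, rounded to 1dp."""
    return round(max(ratings) - min(ratings), 1)

ranges = {park: rating_range(r) for park, r in ratings_by_park.items()}

# Smallest range first, i.e. most consistent first.
most_consistent_first = sorted(ranges, key=ranges.get)
print(most_consistent_first)
```

Because the range only looks at the two extreme rides, a single very high or very low rating is enough to move a park a long way up or down this list.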

The other measure I used was the interquartile range (IQR = Upper Quartile - Lower Quartile), which covers the middle 50% of a park's ratings and so should provide a better gauge of the selection's general consistency without being too swayed by one particularly highly or lowly rated ride. The top 5 most and least consistent using IQR were as follows:
Top 5 Most Consistent (Using IQR)
Ranking | Park | Interquartile Range | Mean Rating (out of 10) | Number of Scoreable Coasters
1 | Blackpool | 1 | 4.8 | 10
2 | Freizeit-Land Geiselwind | 1.1 | 1.4 | 5
3 | Alton Towers | 1.3 | 7.3 | 9
4 | Liseberg | 1.4 | 7.6 | 5
5 | Grona Lund | 1.7 | 6.9 | 6

Top 5 Least Consistent (Using IQR)
Ranking | Park | Interquartile Range | Mean Rating (out of 10) | Number of Scoreable Coasters
1 | Walibi Rhone-Alpes | 6.3 | 4.5 | 5
2 | Parque Warner | 5.9 | 5.5 | 6
3 | Plopsaland | 5.8 | 5 | 7
4 | Movie Park Germany | 5.8 | 3.8 | 8
5 | Parc Asterix | 5.7 | 5 | 5
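
To illustrate why the IQR is less swayed by a single outlier than the range, here is a small Python sketch. The ratings are invented, and the quartile convention is an assumption: it uses `statistics.quantiles` with `method="inclusive"`, which may not match whichever tool produced the figures above.

```python
from statistics import quantiles

def iqr(ratings, method="inclusive"):
    """IQR = upper quartile - lower quartile, rounded to 1dp.

    The quartile method is an assumption; other conventions can give
    different answers on lineups this small.
    """
    lower_q, _median, upper_q = quantiles(ratings, n=4, method=method)
    return round(upper_q - lower_q, 1)

def rating_range(ratings):
    """Range = highest rating - lowest rating, rounded to 1dp."""
    return round(max(ratings) - min(ratings), 1)

# Hypothetical lineups: identical except for one very low-rated ride.
steady = [6.0, 6.5, 7.0, 7.5, 8.0]
with_outlier = [1.0, 6.5, 7.0, 7.5, 8.0]

print(rating_range(steady), iqr(steady))
print(rating_range(with_outlier), iqr(with_outlier))
```

In this made-up case, the single low-rated ride stretches the range considerably while the IQR stays the same, which is the behaviour described above. With only 5 or 6 rides per park, though, the choice of quartile method can shift the IQR noticeably, so small differences between parks shouldn't be over-read.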

Finally, let me once again reference the boxplot from Part 1, for a visual aid to show this off:
[Image: Coaster-Lineups-Boxplot.png]

Let me once again remind you of the order the parks are in, from left to right:
  1. Alton Towers
  2. Thorpe Park
  3. Blackpool Pleasure Beach
  4. Phantasialand
  5. Liseberg
  6. Walibi Holland
  7. Energylandia
  8. Plopsaland de Panne
  9. Walibi Belgium
  10. Europa Park
  11. PortAventura
  12. Parque Warner Madrid
  13. Parque de Atracciones de Madrid
  14. Efteling
  15. Bobbejaanland
  16. Toverland
  17. Movie Park Germany
  18. Heide Park
  19. Hansa Park
  20. Flamingo Land
  21. Tripsdrill
  22. Parc Asterix
  23. Gardaland
  24. Mirabilandia
  25. Djurs Sommerland
  26. Farup Sommerland
  27. TusenFryd
  28. Linnanmaki
  29. Bellewaerde
  30. Nigloland
  31. Skyline Park
  32. PowerPark
  33. Grona Lund
  34. Wiener Prater
  35. Walibi Rhone-Alpes
  36. Freizeit-Land Geiselwind
In terms of how you can visualise the ranges; you can see the range as the difference between the extreme ends of the plot, and the IQR can be visualised as the difference between the ends of the coloured rectangle in the middle.

So, what have we learned from this part of the investigation?

Firstly, I think I can declare Freizeit-Land Geiselwind the winner for consistency in Europe; it scored very highly on consistency using both measures! Even if the selection isn't the most highly rated, it's certainly consistent if nothing else!

Secondly, I found it odd how, besides Geiselwind, the results varied drastically depending on the measure applied. A couple of parks did appear again besides Geiselwind (for instance, Grona Lund was in the top 5 most consistent by both measures), but many others only appeared in the top 5 for one or the other.

But overall, I think my data has concluded that Freizeit-Land Geiselwind is the winner for most consistent in Europe. And for least consistent, I think I can conclude that Plopsaland de Panne actually wins that one, as it is the only park to appear in the top 5 least consistent for both measures.
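
The "appears in both top 5s" check behind that conclusion can be sketched in Python as below. The park lists are copied from the range and IQR tables in this post; `in_both` is just a hypothetical helper name.

```python
# Top-5 lists copied from the range and IQR tables in this post.
most_consistent_range = [
    "Freizeit-Land Geiselwind", "Efteling", "Grona Lund",
    "Flamingo Land", "Skyline Park",
]
most_consistent_iqr = [
    "Blackpool", "Freizeit-Land Geiselwind", "Alton Towers",
    "Liseberg", "Grona Lund",
]
least_consistent_range = [
    "Energylandia", "Walibi Holland", "Walibi Belgium",
    "Mirabilandia", "Plopsaland",
]
least_consistent_iqr = [
    "Walibi Rhone-Alpes", "Parque Warner", "Plopsaland",
    "Movie Park Germany", "Parc Asterix",
]

def in_both(first, second):
    """Parks present in both lists, kept in the order of the first list."""
    second_set = set(second)
    return [park for park in first if park in second_set]

most_both = in_both(most_consistent_range, most_consistent_iqr)
least_both = in_both(least_consistent_range, least_consistent_iqr)
print("Most consistent by both measures:", most_both)
print("Least consistent by both measures:", least_both)
```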

I hope you enjoyed discovering which coaster selection is Europe's most consistent (according to the data) in part 2! Part 3 (which I'm thinking may be the final part) will be coming soon...
 

Will

Strata Poster
I don't claim to understand all of that, but that's more to do with my own apathy than the quality of information - this is VERY thorough stuff, makes my spreadsheet look very amateur :)

...the theme park industry are often looking for data analysts, Matt, if that's a path you'd be interested in going down you already look more than qualified to present the results of a satisfaction survey, for instance :D
 

Matt N

CF Legend
You know what, I think that’s something I’d actually really like to do if the opportunity presented itself! And seeing as I’m almost definitely going to be taking a Data Analytics module in Year 2, and there’s a further opportunity to do both Big Data Analytics and an advanced databases module in Year 3, I could be quite well qualified to do such a thing by the end of my degree!
 

Will

Strata Poster
I noticed the job while I was looking around earlier in the year - didn't quite have the qualifications or I'd have loved to have given it a try myself.

Obviously you could probably earn more in a different industry, but I feel like the chance to work in a field you're totally passionate about would outweigh that (I'm the same!)
Don't be afraid to dream :)
 

Matt N

CF Legend
EDIT: Apologies for the slightly weird title... I couldn't think of a more succinct way to sum up the question!
I think it's time I did the 3rd and final part of this... today, I'll be investigating: Which coaster selections emphasise quantity over quality and which coaster selections emphasise quality over quantity?

Now I'll admit that this one is possibly harder to measure statistically, but it was one I was interested to find out, so I still decided to give it a go!

I used 3 different measures to try and work this out.

The first measure I used was the median:mean ratio, as it always appeared to me as though a median above the mean denoted a more consistently strong selection (thus more of a quality focus), while a mean above the median denoted a less consistently strong selection (thus more of a quantity focus). To work this out, I simply did median/mean, so values above 1 lean towards quality and values below 1 lean towards quantity. The results were as follows (to 2sf)...
Top 5 "Quantity over Quality" (Median/Mean)
Ranking | Park | Number of Scoreable Coasters | Median/Mean (2sf)
1 | Movie Park Germany | 8 | 0.73
2 | Freizeit-Land Geiselwind | 5 | 0.74
3 | Walibi Rhone-Alpes | 5 | 0.74
4 | Mirabilandia | 8 | 0.76
5 | Plopsaland | 7 | 0.76

Top 5 "Quality over Quantity" (Median/Mean)
Ranking | Park | Number of Scoreable Coasters | Median/Mean (2sf)
1 | Flamingo Land | 5 | 1.4
2 | Thorpe Park | 7 | 1.2
3 | Parque Warner | 6 | 1.2
4 | Liseberg | 5 | 1.2
5 | Heide Park | 8 | 1.2
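
A quick Python sketch of the median/mean idea, using two invented lineups (not real parks) to show which way the ratio moves. `two_sf` is a hypothetical helper for rounding to 2 significant figures.

```python
from statistics import mean, median

def two_sf(value):
    """Round to 2 significant figures."""
    return float(f"{value:.2g}")

def median_mean_ratio(ratings):
    """Median divided by mean, to 2sf."""
    return two_sf(median(ratings) / mean(ratings))

# Hypothetical lineups, for illustration only.
mostly_strong_one_dud = [8.0, 8.0, 8.0, 8.0, 2.0]   # the dud drags the mean down
mostly_weak_one_star = [9.0, 3.0, 3.0, 3.0, 2.0]    # the star drags the mean up

print(median_mean_ratio(mostly_strong_one_dud))
print(median_mean_ratio(mostly_weak_one_star))
```

The first lineup comes out above 1 and the second below 1, which matches the reasoning above: the ratio is really measuring which way the ratings are skewed rather than how many coasters a park has.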

The second measure I used was the mean:count ratio, on the basis that a park having a high or low mean relative to its coaster count should indicate whether its selection is quality over quantity or quantity over quality. One slight flaw with this method is that ratings cap at 10, so any theme park with more than 10 scoreable roller coasters can never reach a ratio of 1 and automatically gravitates towards "quantity over quality", although one could argue that having a coaster count of more than 10 makes you quantity-focused to a certain extent anyway...

To work this out, I did mean/count, and the results were as follows (to 2sf)...
Top 5 "Quantity over Quality" (Mean/Count)
Ranking | Park | Number of Scoreable Coasters | Mean/Count (2sf)
1 | Freizeit-Land Geiselwind | 5 | 0.27
2 | Wiener Prater | 10 | 0.34
3 | Movie Park Germany | 8 | 0.48
4 | Blackpool | 10 | 0.48
5 | Bobbejaanland | 8 | 0.49

Top 5 "Quality over Quantity" (Mean/Count)
Ranking | Park | Number of Scoreable Coasters | Mean/Count (2sf)
1 | Liseberg | 5 | 1.5
2 | Grona Lund | 6 | 1.2
3 | Parque de Atracciones | 5 | 1.1
4 | Toverland | 6 | 1.0
5 | Walibi Holland | 6 | 1.0
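
The cap mentioned above is easy to see in a short Python sketch. The lineups are invented, and `mean_per_coaster` and `best_possible_ratio` are hypothetical helper names.

```python
from statistics import mean

MAX_RATING = 10.0  # ratings are out of 10

def mean_per_coaster(ratings):
    """Mean rating divided by the number of scoreable coasters."""
    return mean(ratings) / len(ratings)

def best_possible_ratio(count):
    """Highest mean/count a park with `count` coasters could ever reach."""
    return MAX_RATING / count

# Hypothetical 5-coaster park with a strong mean.
small_park = [7.0, 8.0, 7.5, 8.5, 7.0]
# Hypothetical 11-coaster park where every single ride scores a perfect 10.
perfect_big_park = [10.0] * 11

print(round(mean_per_coaster(small_park), 2))
print(round(mean_per_coaster(perfect_big_park), 2))
print(round(best_possible_ratio(11), 2))
```

Even a flawless 11-coaster lineup lands below 1 here, so under this measure it would rank behind a decent 5-coaster park regardless of how good its rides are.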

The final measure I used was to repeat the same process as above, but using the median instead of the mean.

To work this out, I did median/count, and the results were as follows (to 2sf)...
Top 5 "Quantity over Quality" (Median/Count)
Ranking | Park | Number of Scoreable Coasters | Median/Count (2sf)
1 | Freizeit-Land Geiselwind | 5 | 0.20
2 | Wiener Prater | 10 | 0.35
3 | Movie Park Germany | 8 | 0.35
4 | Mirabilandia | 8 | 0.40
5 | Gardaland | 8 | 0.43

Top 5 "Quality over Quantity" (Median/Count)
Ranking | Park | Number of Scoreable Coasters | Median/Count (2sf)
1 | Liseberg | 5 | 1.8
2 | Walibi Holland | 6 | 1.2
3 | Grona Lund | 6 | 1.2
4 | Parque Warner | 6 | 1.1
5 | Parque de Atracciones | 5 | 1.1
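
To show how swapping the mean for the median can change a park's score, here is a small Python sketch with one invented lineup (a single standout ride plus four weak ones). The function names are hypothetical.

```python
from statistics import mean, median

def mean_over_count(ratings):
    """Mean rating divided by coaster count, rounded to 2dp."""
    return round(mean(ratings) / len(ratings), 2)

def median_over_count(ratings):
    """Median rating divided by coaster count, rounded to 2dp."""
    return round(median(ratings) / len(ratings), 2)

# Hypothetical lineup: one headline coaster and four low-rated rides.
one_star_lineup = [9.5, 3.0, 3.0, 2.5, 2.0]

print(mean_over_count(one_star_lineup))
print(median_over_count(one_star_lineup))
```

The standout ride lifts the mean-based score but leaves the median-based one untouched, which may be part of why the two tables above don't list exactly the same parks.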

So, what did we learn from today's analysis?

In terms of which park emphasises quantity over quality most; I think we can conclude that Freizeit-Land Geiselwind is the European winner for this, winning 2 out of 3 measures and coming 2nd in the only one it didn't win. And it won the measures it did win by some distance!

In terms of which park emphasises quality over quantity most; I think we can conclude that Liseberg is the European winner for this, winning 2 out of 3 measures and coming 4th in the only one it didn't win. And as with Geiselwind, it won the measures it did win by some distance!
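
The placings quoted in these conclusions can be double-checked with a short Python sketch. The lists are copied from the three pairs of tables above; `placings` is a hypothetical helper name.

```python
# Top-5 lists copied from the tables above, in ranking order.
quantity_over_quality = {
    "Median/Mean": ["Movie Park Germany", "Freizeit-Land Geiselwind",
                    "Walibi Rhone-Alpes", "Mirabilandia", "Plopsaland"],
    "Mean/Count": ["Freizeit-Land Geiselwind", "Wiener Prater",
                   "Movie Park Germany", "Blackpool", "Bobbejaanland"],
    "Median/Count": ["Freizeit-Land Geiselwind", "Wiener Prater",
                     "Movie Park Germany", "Mirabilandia", "Gardaland"],
}
quality_over_quantity = {
    "Median/Mean": ["Flamingo Land", "Thorpe Park", "Parque Warner",
                    "Liseberg", "Heide Park"],
    "Mean/Count": ["Liseberg", "Grona Lund", "Parque de Atracciones",
                   "Toverland", "Walibi Holland"],
    "Median/Count": ["Liseberg", "Walibi Holland", "Grona Lund",
                     "Parque Warner", "Parque de Atracciones"],
}

def placings(top5_lists, park):
    """Rank (1-5) of a park under each measure, or None if outside the top 5."""
    return {
        measure: (parks.index(park) + 1 if park in parks else None)
        for measure, parks in top5_lists.items()
    }

geiselwind = placings(quantity_over_quality, "Freizeit-Land Geiselwind")
liseberg = placings(quality_over_quantity, "Liseberg")
print(geiselwind)
print(liseberg)
```

Running the same check for Movie Park Germany shows it is the only other park to place in all three "quantity over quality" top 5s (1st, 3rd and 3rd).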

That brings us to the end of our analysis of European coaster selections. I hope you've enjoyed my little look at the continent's coaster selections using data analysis techniques; I know I've certainly found crunching the numbers interesting! Although if you'd like me to answer any more questions about this dataset, then feel free to give me a suggestion and I'll happily do it for you!

This won't be the last time you see me do one of these, though... I'm hoping to dive into North America's major coaster selections next, so keep your eyes peeled for that at some point in the not-too-distant future!
 

Hixee

Flojector
Staff member
Administrator
Moderator
Social Media Team
I think there's something off in this somewhere if Energylandia doesn't feature in the "quantity over quality" list. Yes, Zadra and Hyperion will top out most of the rankings, but there's too much filler crap for them to not register here, surely? As soon as I saw the title, I thought "boom, Energylandia".

The "mean/count" is an interesting one, cos essentially all you're doing there is "sum of ratings/count squared". The sum of ratings tracks linearly with the number of coasters, rather than the squared basis of the denominator. I think this means you probably slightly unfairly weight this towards smaller parks. I think...? Not sure, been a long day for thinking about stats. :p
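
A quick Python check of that point, using an invented lineup: mean/count is the same thing as sum/count squared, so copying a lineup wholesale (same quality, twice the rides) halves the score.

```python
import math
from statistics import mean

def mean_over_count(ratings):
    """Mean rating divided by coaster count."""
    return mean(ratings) / len(ratings)

def sum_over_count_squared(ratings):
    """Sum of ratings divided by the square of the coaster count."""
    return sum(ratings) / len(ratings) ** 2

# Hypothetical ratings, for illustration only.
lineup = [9.0, 8.0, 6.0, 5.0, 2.0]
doubled_lineup = lineup * 2  # same ratings, twice as many coasters

print(mean_over_count(lineup), sum_over_count_squared(lineup))
print(mean_over_count(doubled_lineup))
```

Since the doubled lineup has exactly the same mean rating, the drop in score comes purely from the coaster count, which is the weighting towards smaller parks being described.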

Standard deviation may also help you home in on which parks have the most consistent selection. If you wanted to then look at the quality/quantity argument you could do some relationship between StdDev and Mean/Median to track which park has the highest mean, but lowest StdDev - I guess StdDev/Mean where smaller is better.
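
A minimal Python sketch of the StdDev/Mean idea (often called the coefficient of variation). The lineups are invented, and using the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption, as the suggestion doesn't specify which.

```python
from statistics import mean, pstdev

def stddev_over_mean(ratings):
    """Population standard deviation divided by the mean; smaller is better."""
    return pstdev(ratings) / mean(ratings)

# Hypothetical lineups, for illustration only.
tight_and_strong = [7.0, 8.0, 9.0]
spread_out = [1.0, 5.0, 9.0]

print(round(stddev_over_mean(tight_and_strong), 2))
print(round(stddev_over_mean(spread_out), 2))
```

A park scores well here by combining a high mean with a low spread, so unlike the range or IQR on their own, a uniformly poor lineup wouldn't come out on top quite so easily.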
 