  • 00:04

    [DATA VISUALIZATION CASE STUDY] [Data Visualization in Football]

  • 00:14

    GIAN MARCO CAMPAGNOLO: I'm Gian Marco Campagnolo [Gian Marco Campagnolo, Lecturer in Science, Technology and Innovation Studies, University of Edinburgh], and I'm a Lecturer in Science and Technology Studies at the University of Edinburgh and also a Faculty Fellow at the Alan Turing Institute. Today, I will be talking about data visualization in football.

  • 00:37

    GIAN MARCO CAMPAGNOLO [continued]: The reason for taking football as a field to discuss data visualization is that football makes my point on data visualization more relatable, because it's a popular field. But I want to make a more general point

  • 00:57

    GIAN MARCO CAMPAGNOLO [continued]: on the non-subject-neutral nature of data visualization. So what works for football in terms of visualizing data and representing numerical pictures might not work in other fields

  • 01:20

    GIAN MARCO CAMPAGNOLO [continued]: where decisions are taken more slowly, like in policy making or in data journalism. So visualizations do not travel easily from one field to another, even within the domain of high-velocity

  • 01:43

    GIAN MARCO CAMPAGNOLO [continued]: environments like football, but also finance. There might be differences in what the favorite data visualization is in a particular field. At the same time, there is a canon, a favorite data

  • 02:05

    GIAN MARCO CAMPAGNOLO [continued]: visualization. Whatever the industry you are addressing, there will be a visualization that people are more familiar with. So today we are trying to ask the big question: what

  • 02:25

    GIAN MARCO CAMPAGNOLO [continued]: is the favorite, the most popular data visualization style or canon in football and in sports data visualization? In order to talk about data visualization in football, I want to start from a dashboard that

  • 02:48

    GIAN MARCO CAMPAGNOLO [continued]: represents a system to evaluate a player that integrates internal club data, economic data, and performance data. In this dashboard, you can see four different widgets. The one at the top left

  • 03:10

    GIAN MARCO CAMPAGNOLO [continued]: has player demographic information. Then there is financial data at the bottom left, showing how much the player costs the club; this is an Academy Player. And the most interesting bit is the bar chart at the top right.

  • 03:32

    GIAN MARCO CAMPAGNOLO [continued]: There you can see the correlation between minutes played by the player in the senior team and the annual valuation prediction. Finally, there is a traffic light chart

  • 03:53

    GIAN MARCO CAMPAGNOLO [continued]: at the bottom right that represents the different aspects of player performance, from size and pace to more psychological aspects of the player, including versatility and the social environment,

  • 04:13

    GIAN MARCO CAMPAGNOLO [continued]: including family. Green, yellow, and red lights represent the amount of confidence in those areas. This visualization is a nice example to start with, because it makes the point that in order

  • 04:33

    GIAN MARCO CAMPAGNOLO [continued]: to develop a successful visualization, you have to picture as closely and as exactly as possible what the audience, the recipients of the visualization, are going to do with it. Understanding data visualization can hardly

  • 04:57

    GIAN MARCO CAMPAGNOLO [continued]: be done in abstract terms. You have to be there on site, where the visualization is used and discussions are held around the numerical picture. Failing that, which is often the case,

  • 05:20

    GIAN MARCO CAMPAGNOLO [continued]: especially in an environment as secretive as professional football, where you can hardly get access to the locker room or the back room of the coaching team where decisions are taken on data

  • 05:41

    GIAN MARCO CAMPAGNOLO [continued]: relating to player performance, you can ask the relevant informants, the coaches and the players, what they do with the visualization. This is often what I do in my research,

  • 06:03

    GIAN MARCO CAMPAGNOLO [continued]: and I did it previously in different high-velocity environments in the business sector, and currently I'm doing it in the football environment. But when you ask how a visualization is conducive to a certain decision,

  • 06:25

    GIAN MARCO CAMPAGNOLO [continued]: you had better have the numerical picture in front of you. So I developed a little technique which I called graph elicitation. This technique derives from a similar approach developed in the fields of sociology

  • 06:48

    GIAN MARCO CAMPAGNOLO [continued]: and anthropology, where, in order to elicit memories and information, the researcher goes to the field bringing pictures that are considered to be important for the informant. So when they ask about family relations,

  • 07:08

    GIAN MARCO CAMPAGNOLO [continued]: they ask the question with a picture of a member of the family in front of them. Similarly, when I conduct my investigation on how visualization is used, I bring with me an example of a visualization and discuss it with the informant.

  • 07:29

    GIAN MARCO CAMPAGNOLO [continued]: So they can point to the part of the graph that they refer to most often. And by doing that, you get the most amazing insight into the complex and unpredictable expertise

  • 07:52

    GIAN MARCO CAMPAGNOLO [continued]: and proficiency that professionals bring to reading numerical data. The key, as we said, is picturing exactly who the football operators are and what decisions they have to make, even if you are in the position of having to develop

  • 08:12

    GIAN MARCO CAMPAGNOLO [continued]: a visualization for them. So let's look at the example of the Academy Player. In the dashboard representing the Academy Player, the operator is an Academy Officer

  • 08:33

    GIAN MARCO CAMPAGNOLO [continued]: who wants to keep track of the work and the investment put into developing a particular player. The decision this dashboard may support seems to concern the cost-effectiveness of retaining the player.
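The traffic light widget described above turns a confidence level for each performance area into one of three colors. The case does not say how confidence is scored. The sketch below therefore assumes a score between 0 and 1 and uses hypothetical cut-offs of 0.7 (green) and 0.4 (yellow). The area names come from the widget description; the scores are invented.

```python
# Hypothetical cut-offs: the case does not specify how confidence is scored.
GREEN_MIN = 0.7
YELLOW_MIN = 0.4


def traffic_light(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a traffic light color."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= GREEN_MIN:
        return "green"
    if confidence >= YELLOW_MIN:
        return "yellow"
    return "red"


# Areas named in the widget description; the scores are invented.
assessment = {
    "size": 0.8,
    "pace": 0.75,
    "versatility": 0.5,
    "social environment": 0.3,
}

for area, score in assessment.items():
    print(f"{area:<20} {traffic_light(score)}")
```

Collapsing a score into three bands is what lets the widget be read at a glance. The cut-offs themselves would need to be agreed with the people who use the dashboard.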

  • 08:53

    GIAN MARCO CAMPAGNOLO [continued]: As the bar chart widget at the top right of the dashboard shows, playing time affects player value. So keeping an Academy Player in the first team without giving him playing time will dramatically decrease his value.
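The top-right bar chart relates senior-team minutes to the predicted annual valuation. A Pearson correlation coefficient is one simple way to quantify that relationship. This is an illustrative choice, not necessarily what the dashboard computes, and the season figures below are invented.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical seasons for one Academy Player: senior-team minutes and the
# predicted annual valuation in arbitrary units. Not real data.
minutes = [0, 250, 600, 1100, 1800]
valuation = [1.0, 1.4, 2.1, 3.0, 4.2]

print(f"r = {pearson(minutes, valuation):.2f}")
```

A strong positive r supports the reading that playing time and value move together. It does not, by itself, show that minutes cause the valuation to rise.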

  • 09:15

    GIAN MARCO CAMPAGNOLO [continued]: So this dashboard seems to be directed at the operator's need to decide between retaining the player or letting the player go on loan to another team. So we are saying football is a high-velocity environment,

  • 09:38

    GIAN MARCO CAMPAGNOLO [continued]: but this is not true only for the player, who obviously has to take decisions on the pitch in split seconds. It is also true for the coach and the coaching team, who have to prepare for the next game in three or five

  • 10:01

    GIAN MARCO CAMPAGNOLO [continued]: days. So the role of data visualization in football, and of the football analyst who conveys the dashboard, the widget, the visualization to the coaching team, is to provide a quick picture of the vast amount of data related to team performance

  • 10:30

    GIAN MARCO CAMPAGNOLO [continued]: or player performance. There are more than 2,000 ball events recorded in every game, and in one season each team might play 50 games, so this data can quickly get out of control

  • 10:56

    GIAN MARCO CAMPAGNOLO [continued]: and the amount is impressive and hardly manageable by the coaching team without a data analyst, or without someone who works on representing the data using numerical pictures.
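Taking the figures quoted in this segment at face value, the number of ball events a team accumulates per season is at least:

```latex
N_{\text{season}} \;\geq\; 2000\,\frac{\text{events}}{\text{game}} \times 50\,\frac{\text{games}}{\text{season}} \;=\; 100\,000\ \text{events per team per season}
```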

  • 11:16

    GIAN MARCO CAMPAGNOLO [continued]: So talking about data visualization in football and elsewhere is always related to the data and how data is managed in a certain field. So we want to connect with the data providers. Where is football data coming from?

  • 11:39

    GIAN MARCO CAMPAGNOLO [continued]: There are two main data providers; the most popular, the most important, the most widespread ones are Opta and Wyscout. The main difference, if you ask them, is that Opta's value proposition is to be able to provide this data in real time,

  • 12:01

    GIAN MARCO CAMPAGNOLO [continued]: mainly for the media, while Wyscout claims to gather data more accurately, and this data is then provided more slowly, that is, the next day, to clubs to assess performance and, as we

  • 12:26

    GIAN MARCO CAMPAGNOLO [continued]: will see, also to scouts and sporting directors to make decisions about the transfer market. I now want to show you one example of what data looks like in the world of football. I refer to a popular event: the important header

  • 12:54

    GIAN MARCO CAMPAGNOLO [continued]: with which Robin van Persie scored an important goal against Spain in the 2014 World Cup. If you look at the picture, we obviously have data about when the goal was scored;

  • 13:14

    GIAN MARCO CAMPAGNOLO [continued]: we have data about the player ID, which is always the same in the database; we have the coordinates of the event, that is, where on the pitch the event happened; and we have a very articulated hierarchy of information about the type of event.

  • 13:36

    GIAN MARCO CAMPAGNOLO [continued]: This is a header, which happened in the center of the box during regular play. We have information about the previous event, which was an assist, and the related event ID: the assist came from the center of the pitch.

  • 13:57

    GIAN MARCO CAMPAGNOLO [continued]: If you remember, this was a direct attack, with the left back crossing the ball and hitting it into the area from the halfway line. And then we have the destination of the header.

  • 14:17

    GIAN MARCO CAMPAGNOLO [continued]: It went on goal, at these particular coordinates, in this particular position of the ball, and so on and so forth. So you can see from this image the complexity of the data we have when it comes to football.
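The event record described above can be modelled as a small data structure, with the assist reached through the related event ID. The field names, the coordinate convention (percent of pitch length and width), and all values below are hypothetical placeholders. Opta and Wyscout each define their own schemas, and those schemas are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BallEvent:
    """One ball event with the kinds of fields described in the case.

    Field names are hypothetical, not a provider's actual schema.
    """
    event_id: int
    match_second: int                 # time of the event within the match
    player_id: int                    # stable identifier across the database
    x: float                          # location, % of pitch length
    y: float                          # location, % of pitch width
    type_path: list[str]              # event type, general to specific
    related_event_id: Optional[int] = None  # e.g. the assist behind a shot
    end_x: Optional[float] = None     # where the ball ended up, if recorded
    end_y: Optional[float] = None


def related_event(events: dict[int, BallEvent], event: BallEvent) -> Optional[BallEvent]:
    """Follow the related event ID back to the preceding event, if any."""
    if event.related_event_id is None:
        return None
    return events.get(event.related_event_id)


# Placeholder values only; none of these numbers come from real match data.
assist = BallEvent(event_id=1, match_second=600, player_id=200,
                   x=50.0, y=20.0, type_path=["pass", "cross", "assist"])
header_goal = BallEvent(event_id=2, match_second=602, player_id=100,
                        x=88.0, y=50.0,
                        type_path=["shot", "header", "regular play", "goal"],
                        related_event_id=1, end_x=100.0, end_y=48.0)

events = {e.event_id: e for e in (assist, header_goal)}
previous = related_event(events, header_goal)
print(previous.type_path if previous else "no related event")
```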

  • 14:41

    GIAN MARCO CAMPAGNOLO [continued]: As a social scientist, to make my way into the world of football data and understand how visual numbers and numerical pictures are used in the field, I had to acquire coaching badges,

  • 15:04

    GIAN MARCO CAMPAGNOLO [continued]: and a match analysis qualification. And from attending these aspects of fieldwork, I learned about the coaches' preference for the eye-o-meter over numbers, which means that coaches rely on their own direct, embodied

  • 15:33

    GIAN MARCO CAMPAGNOLO [continued]: experience of what they see on the pitch, and on the past experience of having seen it so many times, over numbers. When it comes to developing visualization reports for them, it comes as no surprise

  • 15:57

    GIAN MARCO CAMPAGNOLO [continued]: that they have a preference for the one-pager. This approach complies with their credo about how information should be conveyed in sport, in football, to other members of the coaching

  • 16:21

    GIAN MARCO CAMPAGNOLO [continued]: team and to the players. So the four key words are concise, precise, simple, and short. Here are a couple of examples of the one-pager that the football data analyst can develop. There are two main approaches. The first approach

  • 16:45

    GIAN MARCO CAMPAGNOLO [continued]: is to take all the relevant figures or the aspects of play that are traditionally seen as important and represent them on the same graph. Let's look at one example that refers to the player Kevin De

  • 17:12

    GIAN MARCO CAMPAGNOLO [continued]: Bruyne. This is very much along the lines of the idea of a cosmic data visualization, where you put everything you know about the player on the chart and you think this can somehow help in taking decisions about how to play him or her.

  • 17:34

    GIAN MARCO CAMPAGNOLO [continued]: So this graph is a horizontal bar chart that represents the overall involvement index: how much the player, in this case Kevin De Bruyne playing for the Belgium national team, is involved in the game.
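An overall involvement index of this kind could be built as a weighted sum of per-game averages for the action types listed in this case (dribbles, key passes, assists, goals) and drawn as one stacked bar. The case does not give the formula. The equal weights here are an assumption, and the averages are invented placeholders, not De Bruyne's figures.

```python
# Per-game averages for the action types named in the case.
# Invented placeholder numbers, not real player statistics.
averages = {"dribbles": 2.0, "key passes": 3.0, "assists": 0.5, "goals": 0.5}

# Hypothetical weights: the case does not say how components are combined.
WEIGHTS = {"dribbles": 1.0, "key passes": 1.0, "assists": 1.0, "goals": 1.0}


def involvement_segments(avgs: dict[str, float], weights: dict[str, float]) -> dict[str, float]:
    """Each component's weighted contribution, i.e. one bar segment each."""
    return {name: avgs[name] * weights[name] for name in avgs}


def involvement_index(avgs: dict[str, float], weights: dict[str, float]) -> float:
    """Overall index: the sum of the weighted segments (the full bar length)."""
    return sum(involvement_segments(avgs, weights).values())


def render_bar(segments: dict[str, float], scale: int = 4) -> str:
    """Draw a horizontal stacked bar, `scale` characters per unit of index."""
    symbols = "#=+*"  # one symbol per component, in insertion order
    return "".join(sym * round(value * scale)
                   for sym, value in zip(symbols, segments.values()))


segments = involvement_segments(averages, WEIGHTS)
print(render_bar(segments), involvement_index(averages, WEIGHTS))
```

Changing the weights changes how players rank against each other. This is one concrete form of the limitation the speaker raises about this approach: someone has to decide which variables count, and by how much.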

  • 17:59

    GIAN MARCO CAMPAGNOLO [continued]: In the different color codes in the bar chart, you have averages of the main types of relevant actions and events that typically characterize the player. So you have the average dribbles, the key passes,

  • 18:20

    GIAN MARCO CAMPAGNOLO [continued]: which are defined as passes that cut out one line of play, so passes from defense through to midfield or from midfield through to attack; then you have the assists and the average goals

  • 18:41

    GIAN MARCO CAMPAGNOLO [continued]: scored by the player. So this is one approach: put everything you think may be relevant in the same graph and come up with an overall index. Obviously, there are limitations in this approach: who

  • 19:07

    GIAN MARCO CAMPAGNOLO [continued]: said that these are the relevant variables that determine the importance of a player? And it's more difficult to compare across players if you are putting all these variables together. One alternative approach is the one

  • 19:29

    GIAN MARCO CAMPAGNOLO [continued]: I tried together with colleagues at the Alan Turing Institute while analyzing data from the 2014 World Cup and the 2014 World Cup qualifiers.

  • 19:49

    GIAN MARCO CAMPAGNOLO [continued]: With access to some domain expertise, we conjectured that what is relevant to understanding a particular playing style is the network of passes, especially around

  • 20:10

    GIAN MARCO CAMPAGNOLO [continued]: the era when Spain was dominating the world with a distinctive possession style. And so we came up with two types of passes that can be easily captured

  • 20:33

    GIAN MARCO CAMPAGNOLO [continued]: by network science variables. We wanted to use closeness centrality, which is a popular measure in network analysis and network science, to capture the type of A-B passes.

  • 20:53

    GIAN MARCO CAMPAGNOLO [continued]: So, the quick, short, and return passes that were typical of teams like Spain and Germany at that time. In machine learning and data science,

  • 21:14

    GIAN MARCO CAMPAGNOLO [continued]: which is also finding its way into the world of sports data, there are techniques that make it possible to identify the importance of certain variables for performance, for results.
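The speaker later names principal component analysis as the technique used to judge which variable matters most. The actual pipeline is not described. The sketch below therefore shows only the core computation, under stated assumptions: standardize each variable, form the correlation matrix, and extract the first principal component by power iteration. Variables with large absolute loadings dominate that component. The variable names and team rows are invented. Note that PCA alone describes shared variance; tying a variable to results needs a further step, such as relating component scores to performance.

```python
import math


def correlation_matrix(rows: list[list[float]]) -> list[list[float]]:
    """Correlation matrix of the columns of `rows` (population statistics)."""
    n, k = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    if any(s == 0 for s in stds):
        raise ValueError("every variable needs non-zero variance")
    # Standardize each value, then average the pairwise products.
    z = [[(row[j] - means[j]) / stds[j] for j in range(k)] for row in rows]
    return [[sum(r[i] * r[j] for r in z) / n for j in range(k)] for i in range(k)]


def first_principal_component(matrix: list[list[float]], iterations: int = 500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    k = len(matrix)
    v = [1.0] * k  # assumes the start vector is not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the eigenvalue for the converged vector.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k))
    return eigenvalue, v


# Invented per-team rows: [closeness, passes_per_game, dribbles_per_game].
names = ["closeness", "passes_per_game", "dribbles_per_game"]
teams = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 1.0],
    [4.0, 8.0, 3.0],
]

eigenvalue, loadings = first_principal_component(correlation_matrix(teams))
for name, loading in zip(names, loadings):
    print(f"{name:<18} {loading:+.3f}")
```

In this toy data, closeness and passes move together and share the first component, while dribbles load near zero.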

  • 21:36

    GIAN MARCO CAMPAGNOLO [continued]: And this is what we did in our case with closeness centrality. We found a strong correlation between performance and closeness centrality, and we came up with the idea of visualizing all teams taking part in the World

  • 21:59

    GIAN MARCO CAMPAGNOLO [continued]: Cup qualifiers according to this particular measure. The visualization, the heat map I show you now, is an example of how this looks. On the y-axis, this heat map

  • 22:21

    GIAN MARCO CAMPAGNOLO [continued]: represents all the teams that took part in the 2014 World Cup qualifiers; the x-axis represents the 11 roles in a team; and the value represented and color-coded in the boxes is closeness centrality.
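Each cell of the heat map is a closeness centrality value for one role in one team's pass network. The sketch below makes several assumptions. The graph is unweighted and directed: an edge means role A passed to role B at least once. Distances are counted along outgoing pass chains. Scores use the Wasserman and Faust normalization, which scales them down for players who cannot reach everyone. The pass list is invented. The case does not say whether passes were weighted by frequency or which direction was used. Results can also differ from library defaults; networkx's `closeness_centrality`, for example, uses incoming distances on directed graphs by default.

```python
from collections import deque


def build_pass_graph(passes: list[tuple]) -> dict:
    """Directed adjacency sets from (passer, receiver) pairs."""
    graph: dict = {}
    for passer, receiver in passes:
        graph.setdefault(passer, set()).add(receiver)
        graph.setdefault(receiver, set())
    return graph


def bfs_distances(graph: dict, source) -> dict:
    """Length of the shortest pass chain from source to each reachable player."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness_centrality(graph: dict) -> dict:
    """(r / sum of distances) * (r / (n - 1)), with r = other players reachable."""
    n = len(graph)
    result = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        reachable = len(dist) - 1
        if reachable == 0:
            result[node] = 0.0
        else:
            total = sum(dist.values())
            result[node] = (reachable / total) * (reachable / (n - 1))
    return result


# Invented passes between shirt-number roles 1-11 for one team.
passes = [(1, 4), (4, 1), (2, 4), (4, 2), (3, 8), (8, 3), (4, 8), (8, 4),
          (8, 10), (10, 9), (9, 8), (6, 4), (5, 8), (7, 8), (8, 7), (11, 8)]
scores = closeness_centrality(build_pass_graph(passes))
# One heat map row: closeness for roles 1..11, in order.
row = [round(scores.get(role, 0.0), 2) for role in range(1, 12)]
print(row)
```

Stacking one such row per team gives the team-by-role grid that the heat map color-codes.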

  • 22:43

    GIAN MARCO CAMPAGNOLO [continued]: You can immediately see that teams like Germany, halfway down the graph, have very strong red colors, meaning a high centrality for the number 4 and number 8 positions, which are typically

  • 23:04

    GIAN MARCO CAMPAGNOLO [continued]: the positions of the midfield players who can play 360 degrees, so they usually touch more balls. You see that the comparison runs for 50 or 60 national teams, and in one picture you have them all

  • 23:26

    GIAN MARCO CAMPAGNOLO [continued]: compared on one measure, which we know, from having run a principal component analysis (one of the available techniques in machine learning for understanding the importance of a certain variable), to be crucial in defining

  • 23:48

    GIAN MARCO CAMPAGNOLO [continued]: and distinguishing the playing style of a particular team. In this graph, further down from Germany, you can also see Spain, E-S-P, with closeness centrality distributed across many roles,

  • 24:11

    GIAN MARCO CAMPAGNOLO [continued]: so it is not centralized only in the midfield positions but spread across number 2, number 3, number 4, number 6 (a center back who might have better distribution), number 7, and number 8: all positions have centrality. And so the ability to play those short

  • 24:34

    GIAN MARCO CAMPAGNOLO [continued]: and return passes is immediately captured in this picture as a feature of Spain, and it is immediately related to the performance of so many other teams. So the second approach here, which we took in this particular analysis,

  • 24:56

    GIAN MARCO CAMPAGNOLO [continued]: has been to focus on the single most important variable, which gives you the advantage of being able to compare so many teams. If you go back to the De Bruyne analysis, the involvement index puts so many aspects of the player's

  • 25:23

    GIAN MARCO CAMPAGNOLO [continued]: performance together, which makes it hard to compare across other players, let alone 80 or more, as we did for the national teams. So these two examples are examples of the one-pager

  • 25:46

    GIAN MARCO CAMPAGNOLO [continued]: approach that you want to take when it comes to returning data to the coaching team. In one case, the De Bruyne case, you will be providing an overall measure of player involvement. In our case, with the closeness centrality of the playing

  • 26:10

    GIAN MARCO CAMPAGNOLO [continued]: styles of so many teams, you may know what the key roles in one team are and which ones you should focus on when your game plan is to stop the flow of passes in a certain team. Another point about audiences for data visualization

  • 26:33

    GIAN MARCO CAMPAGNOLO [continued]: in football concerns scouting and the transfer market. At the moment, this is surely the area of football where data science and visualization have had the most penetration. This is due to the nature of the activity of the sporting

  • 26:58

    GIAN MARCO CAMPAGNOLO [continued]: director, who looks after the transfer market for clubs, and of the player agent, who looks at it from the player's perspective. First, they have more time, if you remember what we were talking about earlier regarding the high-pressure, high-velocity environment.

  • 27:21

    GIAN MARCO CAMPAGNOLO [continued]: They have more time than the player and the coach for planning ahead and making decisions on the basis of data. And second, there is the size of the problem they have to deal with. We are talking about tens of thousands of players,

  • 27:44

    GIAN MARCO CAMPAGNOLO [continued]: for thousands of teams, hundreds of leagues, over so many years. This is why data visualization appears to be more readily applicable to transfer market problems. To exemplify this, I want to refer to a project

  • 28:08

    GIAN MARCO CAMPAGNOLO [continued]: I did with my data visualization students at the University of Edinburgh. This is a little video of the radiograph that represents a team, Cardiff City in this particular case. In the polar chart, the slices represent the various teams

  • 28:32

    GIAN MARCO CAMPAGNOLO [continued]: that the players at Cardiff City played for before joining Cardiff City. We were looking in particular for examples of players having played for the same team before, and this is how you can see that Bristol, for example,

  • 28:55

    GIAN MARCO CAMPAGNOLO [continued]: has been a team where a number of current Cardiff City players played earlier on. The idea was to capture the elements of diversity and cohesion in a particular team.
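Spotting a feeder club such as Bristol City comes down to counting, for each previous club, how many players in the current squad passed through it. A minimal sketch with hypothetical players and clubs (not Cardiff City's actual squad):

```python
from collections import Counter

# Hypothetical careers: player -> clubs played for before the current club.
# Names and clubs are placeholders, not real data.
careers = {
    "player_a": ["Club X", "Club Y"],
    "player_b": ["Club Y"],
    "player_c": ["Club Z", "Club Y", "Club X"],
    "player_d": [],
}


def feeder_counts(careers: dict[str, list[str]]) -> Counter:
    """For each previous club, how many current players passed through it."""
    counts: Counter = Counter()
    for clubs in careers.values():
        counts.update(set(clubs))  # count each player at most once per club
    return counts


def players_from(careers: dict[str, list[str]], club: str) -> list[str]:
    """Current players who share a history at `club` (a cohesion signal)."""
    return sorted(p for p, clubs in careers.items() if club in clubs)


counts = feeder_counts(careers)
print(counts.most_common(2), players_from(careers, "Club Y"))
```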

  • 29:15

    GIAN MARCO CAMPAGNOLO [continued]: So if you look at the radiograph, you see different slices, each one being one particular player and his career over 10 years. There are 16 slices, one for each player of Cardiff City.
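The radial layout just described can be sketched as simple geometry: 16 equal angular slices, one per player, each split into concentric rings, one per season across a ten-year window. Several details are assumptions for illustration: which season sits innermost, the inner and outer radii, and the tie-breaking in the selection helper. The helper picks the top 16 players by minutes, as the speaker describes next. In the actual chart, each cell would be colored by the club the player was at that season.

```python
import math

N_PLAYERS = 16   # one slice per player, as in the case
N_SEASONS = 10   # ten-year career window, one ring per season (assumed)


def select_players(minutes: dict[str, int], k: int = N_PLAYERS) -> list[str]:
    """The k players with the most minutes (ties broken by name)."""
    return sorted(minutes, key=lambda p: (-minutes[p], p))[:k]


def slice_angles(index: int, n_players: int = N_PLAYERS) -> tuple[float, float]:
    """Start and end angle, in degrees, of one player's slice."""
    width = 360.0 / n_players
    return index * width, (index + 1) * width


def ring_radii(season: int, n_seasons: int = N_SEASONS,
               inner: float = 0.2, outer: float = 1.0) -> tuple[float, float]:
    """Inner and outer radius of one season's ring (season 0 innermost, assumed)."""
    step = (outer - inner) / n_seasons
    return inner + season * step, inner + (season + 1) * step


def cell_center(index: int, season: int) -> tuple[float, float]:
    """Cartesian center of the cell where a player's slice meets a season ring."""
    start, end = slice_angles(index)
    r0, r1 = ring_radii(season)
    theta = math.radians((start + end) / 2)
    r = (r0 + r1) / 2
    return r * math.cos(theta), r * math.sin(theta)


# Hypothetical minutes for three players; keep the top two.
squad = select_players({"a": 900, "b": 2700, "c": 1800}, k=2)
print(squad, slice_angles(0), cell_center(0, 9))
```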

  • 29:35

    GIAN MARCO CAMPAGNOLO [continued]: We took the 16 players who played the most minutes for Cardiff City in the last season. The color in each slice represents the teams the player played for before joining Cardiff City. Cardiff City is in blue, so you see the dominance of blue

  • 29:59

    GIAN MARCO CAMPAGNOLO [continued]: in previous years, representing players who have played for Cardiff City before. For example, light green represents Bristol City, and you can see that Bristol has been a feeder team for Cardiff City, because some of the players

  • 30:20

    GIAN MARCO CAMPAGNOLO [continued]: played there before. The idea was to look at the importance of membership in a particular team, the assumption being that if more players have played for longer for a particular team, there

  • 30:42

    GIAN MARCO CAMPAGNOLO [continued]: is an element of cohesion that might be related to performance. At the same time, we were counting levels of cohesion and diversity at the level of the national team the player played for, because, as well as playing for a club, players also

  • 31:03

    GIAN MARCO CAMPAGNOLO [continued]: play for their national team. So you can see that the outer ring of the radial diagram represents the national teams the players have played for. At the bottom, you see that seven of these players have played for England, so there is some cohesion there.
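Cohesion and diversity at national-team level can be summarized in two numbers. The first is the size of the largest group sharing a national team (seven England players in the Cardiff City example). The second is a diversity score. The case does not name its diversity measure. The sketch uses the Blau index, 1 minus the sum of squared group shares, as one common choice, and the squad is hypothetical.

```python
from collections import Counter

# Hypothetical squad: player -> national team represented (placeholders).
national_team = {"p1": "England", "p2": "England", "p3": "Wales", "p4": "Scotland"}


def largest_national_group(squad: dict[str, str]) -> tuple[str, int]:
    """Cohesion proxy: the most shared national team and its player count."""
    return Counter(squad.values()).most_common(1)[0]


def blau_index(squad: dict[str, str]) -> float:
    """Diversity: 1 - sum of squared shares (0 = everyone from one group)."""
    counts = Counter(squad.values())
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


print(largest_national_group(national_team), round(blau_index(national_team), 3))
```

The two numbers pull in opposite directions, which is why the case treats the right mix, rather than either extreme, as the thing to look for.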

  • 31:25

    GIAN MARCO CAMPAGNOLO [continued]: And there is also diversity related to how many other nationalities are represented in the team. The last element is age: you can see the white lines converging to the center, and the length of those white lines

  • 31:47

    GIAN MARCO CAMPAGNOLO [continued]: is the player's age. So the third element of diversity and cohesion was age. Looking at age, membership at the level of nationality, and membership at the level of the current club and past clubs, we were looking at the degree of diversity and cohesion

  • 32:10

    GIAN MARCO CAMPAGNOLO [continued]: that may make a team successful. This other graph represents connections between our variables and the team; the sphere at the center of the graph represents the degree of diversity and cohesion

  • 32:30

    GIAN MARCO CAMPAGNOLO [continued]: that occurs at a particular team level. So once it is found that a certain mix of diversity or cohesion makes a team successful, meaning improving its position in the league or doing better than its most

  • 32:54

    GIAN MARCO CAMPAGNOLO [continued]: direct opponent, the sporting director and the player agent may look for the best fit for the next acquisition. So if it turns out that a certain degree of national diversity

  • 33:15

    GIAN MARCO CAMPAGNOLO [continued]: contributes to improving team performance, then the transfer market agent will look at buying a player of a different nationality. If it emerged that having played for certain previous teams

  • 33:37

    GIAN MARCO CAMPAGNOLO [continued]: together in the past increases player performance, in a particular mix that also takes into account some diversity variables, then transfer market agents will look at other players who played with the existing players

  • 34:00

    GIAN MARCO CAMPAGNOLO [continued]: for the same team in the past, so as to strengthen the cohesion value of their own team. [Summary] Today, we discussed data visualization in football

  • 34:24

    GIAN MARCO CAMPAGNOLO [continued]: and the non-subject-neutral nature of visualization. What I said today applies only to a particular high-velocity environment, which is football.

  • 34:46

    GIAN MARCO CAMPAGNOLO [continued]: And I also said that, in order to understand and develop successful visualizations, the researcher has to understand, as closely as possible, how visualization is used in a particular domain, and we discussed graph elicitation as a technique to do that.

  • 35:07

    GIAN MARCO CAMPAGNOLO [continued]: We also drilled down into the details of the different audiences in this particular field, and we talked about coaches and players needing the one-pager because of the high pressure under which they have to perform during the season, and the transfer

  • 35:30

    GIAN MARCO CAMPAGNOLO [continued]: market agents, the sporting director and the player agent, having a little more time and enjoying somewhat richer data visualization, given the time that they have and given the nature of the problem they have to address.

Video Info

Publisher: SAGE Publications, Ltd.

Publication Year: 2021

Video Type: Case Study

Methods: Data visualization, Observational research

Keywords: communication aids; context effects; data visualisation; football; graphical presentation of data; observation (research); soccer

Segment Info

Segment Num.: 1

Persons Discussed:

Events Discussed:



Gian Marco Campagnolo, Lecturer in Science, Technology, and Innovation Studies at the University of Edinburgh, discusses the role of and several approaches to data visualization in football.
