  • 00:00

    [MUSIC PLAYING] [Studying Online Video Popularity With Stochastic Computational Models]

  • 00:09

    MARIAN-ANDREI RIZOIU: My name is Marian-Andrei Rizoiu. [Marian Andrei Rizoiu, Research Fellow, Australian National University] I am currently a research fellow at the College of Engineering and Computer Science at the Australian National University. I work mainly on developing stochastic models for modeling online behavior of users with applications for online popularity. [How did you become interested in online behavior

  • 00:33

    MARIAN-ANDREI RIZOIU [continued]: and computational modeling?] I think my journey is rather nonstandard for people working in my field. I am trained as a computer engineer. So that literally means from transistors and programming in assembly languages and operating system network,

  • 00:55

    MARIAN-ANDREI RIZOIU [continued]: really hardcore computer science. And then I got interested in data science at the end of my master's, and I did my PhD in applied statistics and data science. And then it's when I moved to my current position in Australia where the team was already working on modeling human behavior online that I actually got interested.

  • 01:15

    MARIAN-ANDREI RIZOIU [continued]: I've always been interested in more sociology- and social-science-oriented problems, but this is when I actually had the chance to marry the two, the computer science and the social science part in some meaningful way and solve problems that were meaningful. [What did you learn through changing your research focus

  • 01:39

    MARIAN-ANDREI RIZOIU [continued]: to computational social science?] I learned a lot of things which were unexpected. That's part of doing research. You start with that hypothesis, and sometimes you find what you expect, but sometimes the data tells you the other way.

  • 01:59

    MARIAN-ANDREI RIZOIU [continued]: We did have confirmation for some of the things that we expected, like people tend to flock around more popular users. We all knew that from high school and later on. But sometimes they do behave--some of them behave differently, or sometimes they behave differently from what we expected.

  • 02:19

    MARIAN-ANDREI RIZOIU [continued]: The thing that I constantly learn is that the world is a lot more complex than our assumptions. So that's why we always build from simple to complex. That's one of the ways that I build my research. We have a simple hypothesis that explains a big chunk of the variance. Once that is confirmed, we augment it to make it more complex, and then we

  • 02:40

    MARIAN-ANDREI RIZOIU [continued]: explain more of the unknown, and then we continue building upwards. [What are you currently researching?] Recently I've been working on modeling the popularity

  • 03:00

    MARIAN-ANDREI RIZOIU [continued]: of online videos. Before I went into this work, there was a general perception that popularity is completely unpredictable, more of, if you like, being the right person at the right time, we tend to say. The same thing with online videos.

  • 03:20

    MARIAN-ANDREI RIZOIU [continued]: Two identical videos made by the same people featuring the same cute dog, one might make it very popular. The other one just goes unknown. So there was this perception that popularity is completely unpredictable. When I started working on this, we

  • 03:41

    MARIAN-ANDREI RIZOIU [continued]: wanted to understand why it is like this, and is it really unpredictable or might there be multiple reasons for this? So my hypothesis at the beginning of the project was that there are, in fact, two things at play.

  • 04:02

    MARIAN-ANDREI RIZOIU [continued]: There is random, yes, luck, an effect of luck, but there is also an inherent quality to content which is more like a potential of becoming popular given the right person. It's almost like having two kids playing soccer in front of their house, but only one

  • 04:24

    MARIAN-ANDREI RIZOIU [continued]: is seen by a talent recruiter, and he makes it to be a professional player. The other guy still plays soccer in front of the house, but they're equally good. That was kind of the hypothesis behind it. So the project was mainly about proposing theoretically based models, mathematical models to separate and disentangle

  • 04:46

    MARIAN-ANDREI RIZOIU [continued]: this luck versus quality. That was my project, and we managed to do it. So we managed to construct these models, and now it's the state of the art in popularity modeling and prediction. [How did you collect the data for your study?]

  • 05:06

    MARIAN-ANDREI RIZOIU [continued]: Gathering the data for this project was probably one of the most difficult parts of it. When I arrived in my current position about four years ago, we wanted to have a mechanism where we get in contact with videos as unbiased as possible, so

  • 05:27

    MARIAN-ANDREI RIZOIU [continued]: without going towards just one type of video. So if you're just looking at, say, what is featured on the first page of YouTube, you only find the very popular ones or the ones that YouTube wants to promote, but it's not representative of popularity at large. So we wanted a wider and more unbiased method to figure it out.

  • 05:47

    MARIAN-ANDREI RIZOIU [continued]: So what we ended up doing was continuously running for now more than 4 and 1/2 years, continuously running a Twitter crawler. So we listen to Twitter for tweets that contain links to YouTube videos. So if you tweeted any video in the past 4 and 1/2 years,

  • 06:09

    MARIAN-ANDREI RIZOIU [continued]: it's more than likely that we have it in our data set. So from there, we don't necessarily look into what people say about the videos. We just use that as a way to grab the videos. So from there we extract the YouTube videos from the Twitter feed, and that's where we go to YouTube

  • 06:30

    MARIAN-ANDREI RIZOIU [continued]: and we extract the other data that we want to have. So this is the largest data set about YouTube videos and Twitter tweets about them outside of YouTube and Twitter themselves. We're featuring billions of tweets, hundreds of millions of videos and users.

  • 06:53

    MARIAN-ANDREI RIZOIU [continued]: That's the kind of data that we use. So it's very large data. [How do you get access to the data you need?] To get data, we've set the rule that we are only going to use open data since we are researchers

  • 07:14

    MARIAN-ANDREI RIZOIU [continued]: in a public institution but also defenders of access and free access. So we only use public APIs. Yes, Twitter is free. They have an API, and so does YouTube. It's a bit less used in our community simply because it doesn't really expose the interactions of users, so it's

  • 07:36

    MARIAN-ANDREI RIZOIU [continued]: a bit less useful when it comes to modeling computational-social-science problems, but they do have an API. So we use only the publicly available data. [How do you identify which tweets are relevant?] Whenever working with real data, there's a lot of noise.

  • 07:58

    MARIAN-ANDREI RIZOIU [continued]: There is a lot of noise. So that's the first rule that anybody working with real, human-generated online data learns. Yes, there is a lot that is irrelevant, but we do have the advantage of not necessarily having to look in the text of the tweets.

  • 08:19

    MARIAN-ANDREI RIZOIU [continued]: But the query that we use to interrogate Twitter is linked to the presence of YouTube videos in it. So from that point of view, we have it a bit easy because we talk to a machine, the Twitter API, asking for something which is very well defined.

  • 08:40

    MARIAN-ANDREI RIZOIU [continued]: The rate of errors is quite small. However, we do have a lot of problems. For example, before Twitter relaxed the 140-character limit, people would not post entire URLs of YouTube. They would shorten it down with shortening services. And what's more, they would shorten

  • 09:03

    MARIAN-ANDREI RIZOIU [continued]: shortened versions of links. So you would have to unshorten them a couple of times. These are all problems that we have seen and we have encountered. We mitigated those where it was--most problems can be solved, but the question is also can we solve them at scale because we have billions of tweets.

  • 09:24

    MARIAN-ANDREI RIZOIU [continued]: So sure, unshortening double-, triple-, quadruple-shortened URLs, it's doable. It also takes a lot of time. So what we finally decided to do, we studied how much of these do we have? We figured out it's less than 0.2% of our entire data.
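
A minimal sketch of the unshortening problem described here, not the team's actual crawler code. It assumes a hypothetical `resolve_once` callable that returns the next URL in a redirect chain (or `None`), and it caps the number of hops so that the rare multiply shortened links can be dropped rather than chased indefinitely. The function names and the hop limit are illustrative assumptions.

```python
from typing import Callable, Optional
from urllib.parse import urlparse, parse_qs

# Hostnames treated as YouTube watch pages (an illustrative, non-exhaustive set).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_id(url: str) -> Optional[str]:
    """Return the video id if `url` points at a YouTube video, else None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host == "youtu.be":
        return parts.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parts.path == "/watch":
        ids = parse_qs(parts.query).get("v")
        return ids[0] if ids else None
    return None


def resolve_video_id(
    url: str,
    resolve_once: Callable[[str], Optional[str]],  # hypothetical one-hop resolver
    max_hops: int = 4,
) -> Optional[str]:
    """Follow at most `max_hops` redirects; return None if no video is reached."""
    for hop in range(max_hops + 1):
        video = youtube_id(url)
        if video is not None:
            return video
        if hop == max_hops:
            break  # give up: too deeply shortened to be worth the cost
        nxt = resolve_once(url)
        if nxt is None:
            return None  # dead link or not a redirect
        url = nxt
    return None
```

In practice `resolve_once` would issue a network request; passing it in as a parameter keeps the hop-counting logic testable without network access and makes it easy to measure what fraction of links exceeds the cap.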

  • 09:47

    MARIAN-ANDREI RIZOIU [continued]: And given the amount of effort and cost required to treat this 0.2% of data, we simply decided to let it go because the impact is going to be minimal. And so where we can, we address. Where we cannot, we study the impact.

  • 10:09

    MARIAN-ANDREI RIZOIU [continued]: We evaluate what we lose. We estimate how much error we introduce. And if it's acceptable and assumed, then we just go ahead and remove it. [What do you do once you have collected the data?] Once you have the YouTube videos, once you have the YouTube and the tweet,

  • 10:31

    MARIAN-ANDREI RIZOIU [continued]: the first thing to do is actually store it in a meaningful way. And again when working with data at this size, the first thing we learn, the first thing everybody working with data decides is that most of the freely available and classical tools are unusable,

  • 10:53

    MARIAN-ANDREI RIZOIU [continued]: and even some of the more advanced tools dedicated to deal with large volumes of data break down after certain volumes. Just to get slightly more technical, if you want to store data in relational databases, it is impossible at this level, but this is not new.

  • 11:17

    MARIAN-ANDREI RIZOIU [continued]: So there are solutions like NoSQL databases like Mongo or Apache. And these work quite well as long as, say, for a Mongo database, as long as the index fits in memory, which is quite a lot of data. When we did, actually the first infrastructure

  • 11:38

    MARIAN-ANDREI RIZOIU [continued]: to store the data was in a Mongo database, and it worked beautifully for the first six months. And I foolishly assumed it will continue to work, but after six months the index of the database was large, larger than the memory, and then we found out the limitations of Mongo. We have not yet found a solution, a long-term solution.

  • 12:01

    MARIAN-ANDREI RIZOIU [continued]: So we are still storing the data in raw format, and we simply go back to the data when we need to redo stuff. So it's complicated. Just going for the data once--if I want to search and see if I have one video in my data set, if I need to go sequentially through the data set, it might take us more than one day to just do one operation.

  • 12:23

    MARIAN-ANDREI RIZOIU [continued]: So planning ahead is very important. But let's assume that the data is stored. The next step is not related to data. The next step is to devise the models and the theoretical background to deal with the data. [How do you determine which computational models to use?]

  • 12:49

    MARIAN-ANDREI RIZOIU [continued]: When choosing models to deal with data, it is usually a process based on a number of factors--first of all, literature, reading what people have done before, what works, what doesn't work; gut feeling a lot.

  • 13:12

    MARIAN-ANDREI RIZOIU [continued]: Literature also has clues what could be used, what are the limitations, what are the power? So there are already hints in there, but at the end of the day research is a bit of trial and error, so I think finding the models is also the same way. For us, I was reading at that time

  • 13:36

    MARIAN-ANDREI RIZOIU [continued]: one of the seminal papers--it was still quite new at the time--about these models that were coming from either applied statistics, economics, and then this particular flavor that we were using was actually coming from seismology. You say seismology and online media? What's the connection? I'm going to tell you the connection.

  • 13:57

    MARIAN-ANDREI RIZOIU [continued]: It's actually very simple and straightforward in hindsight. When the same model was applied to look at how earthquakes occur--and there is this principle of self-excitement, meaning that you see an event, and this event is likely to cause more events in the future. And for earthquakes, it's obvious, right?

  • 14:23

    MARIAN-ANDREI RIZOIU [continued]: You have a big earthquake, say, in Japan, and then we all know that there will be a series of aftershocks related to this earthquake because of the initial earthquake. So we can model with the stochastic tools. We can model exactly this sequence. So then we told ourselves, what if the events are not earthquakes? What if the events are people watching YouTube videos?

  • 14:46

    MARIAN-ANDREI RIZOIU [continued]: Say I watch a YouTube video. I like it. I'm going to share it on the social media with my friends. So I'm going to cause another viewing event by my friends. And they're going to watch it, and they're going to like it, and they're going to share it with their friends. So the same underlying mechanism would propagate a YouTube video

  • 15:07

    MARIAN-ANDREI RIZOIU [continued]: the same way as earthquakes occur. So we took models that were designed for earthquakes, and we adapted them to the reality and the data that we had, and we applied them. [What is the process of applying those models?]

  • 15:31

    MARIAN-ANDREI RIZOIU [continued]: Because we are a computer-science department and because my background is from pure programming and engineering, I usually feel more comfortable being in control of all the little variables of my modeling, which means I didn't use anything. I developed the models from scratch.

  • 15:52

    MARIAN-ANDREI RIZOIU [continued]: There are libraries, but usually when you're working so close to the edge of knowledge and what is being done, most of the tools tend to be very rough prototypes developed by other researchers like me to do their own work. So they tend to be very adapted to their particular needs, usually not scalable, full of edge cases that

  • 16:17

    MARIAN-ANDREI RIZOIU [continued]: didn't matter necessarily in their research but might be very important in my research, and in most cases unscalable to the volumes of data that I had. So what I ended up doing, I went and I implemented the generative processes from scratch. The math is complicated, but the implementation finally

  • 16:37

    MARIAN-ANDREI RIZOIU [continued]: is not that complicated because it's all about temporal series. So using very simple structures like vectors and for loops, very basic computer science, it's possible to get decent implementations. The math is complicated. The implementation itself is not. So I decided to redo the whole thing.
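
To illustrate how a self-exciting process can be generated with nothing more than lists and loops, here is a minimal sketch of a Hawkes process simulated by thinning. It is not the speaker's implementation: the exponential decay kernel and the parameter names `mu` (background rate), `alpha` (jump per event) and `beta` (decay rate) are simplifying assumptions chosen for brevity. The intensity at time t is taken to be mu plus the sum of alpha * exp(-beta * (t - t_i)) over all earlier events t_i.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """Event rate at time t: background rate plus decaying kicks from past events."""
    total = mu
    for t_i in events:
        if t_i <= t:
            total += alpha * math.exp(-beta * (t - t_i))
    return total


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning (assumes mu > 0).

    Between events the intensity only decays, so its current value is a
    valid upper bound for proposing the next candidate time.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        upper = intensity(t, events, mu, alpha, beta)
        t += rng.expovariate(upper)  # candidate waiting time under the bound
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events
```

For example, `simulate_hawkes(mu=0.5, alpha=0.8, beta=1.0, horizon=50.0)` returns a sorted list of event times in which events tend to arrive in bursts. With this kernel each event triggers on average `alpha / beta` further events, so keeping that ratio below 1 keeps the process from exploding.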

  • 16:59

    MARIAN-ANDREI RIZOIU [continued]: [How did you "train" or organize the data to prepare it for the models?] The data comes in a raw format, which means it's most often unusable with the models. It always needs to be cleaned, extracted.

  • 17:20

    MARIAN-ANDREI RIZOIU [continued]: The first step is usually selecting what part of the data we need. This goes back to the drawing board. When I said that we took models from, say, seismology, the process to do it is sitting down and thinking what drives human sharing of content--

  • 17:44

    MARIAN-ANDREI RIZOIU [continued]: finally, a very mechanistic approach. So then we went again. We looked in the literature, and we figured out that there is something called preferential attachment where users tend to retweet or share more of the content shared by locally influential users. So we went like, how can we measure that?

  • 18:05

    MARIAN-ANDREI RIZOIU [continued]: And then we figured out that we need the number of followers. So we went in the data and we grabbed that particular field. What else do you need? We know that users tend to reshare fresh content. Nobody shares that old cat video from three years ago. No, everybody is sharing the current one. Therefore you need the timestamps, and we went and we took that information. So we constructed this minimum set in a bottom-up approach.
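
The bottom-up field selection described here can be sketched as follows. This is an illustration only: it assumes raw tweets shaped like the Twitter v1.1 JSON objects, with a `created_at` string and a `user.followers_count` integer, and the function name `minimal_record` and the output keys are hypothetical.

```python
from datetime import datetime

# Timestamp layout assumed for `created_at`, e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def minimal_record(tweet: dict) -> dict:
    """Keep only the two fields the model needs: when, and by how influential a user."""
    posted = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
    return {
        "timestamp": posted.timestamp(),  # seconds since the Unix epoch
        "followers": tweet["user"]["followers_count"],
    }
```

Applied to every raw tweet about a video, this yields a compact list of (timestamp, followers) pairs, which is the kind of table-like, machine-readable input a point-process model consumes.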

  • 18:30

    MARIAN-ANDREI RIZOIU [continued]: And then we extract from the raw data. We extract only the fields that we required. And we put them in the machine-readable format, which is usually tables, lists, things that don't easily make a lot of sense for the regular user. But then even the tweets--you see it in the Twitter interface, and I see it in the raw format.

  • 18:52

    MARIAN-ANDREI RIZOIU [continued]: It's not at all the same format. So we didn't lose much. So we pretreated this data, and then that's what we used afterwards. [Were there any adjustments you needed to make to your research?]

  • 19:12

    MARIAN-ANDREI RIZOIU [continued]: Sure, there's always iterations. It never works from the first time. It's an incremental process. It's a trial and error, and that's the whole thing of research. It's trial and error. Not all of the things that I told you about earlier were actually in the initial version.

  • 19:33

    MARIAN-ANDREI RIZOIU [continued]: We didn't know about, for example, preferential attachment at the beginning. We added it later on. And I told you that we first build based on our intuitions. We build the modeling, and then we grab the data and we try it out. Sometimes it's just not enough, or some fields don't work.

  • 19:56

    MARIAN-ANDREI RIZOIU [continued]: I don't have an exact example right now, but I do remember that initially we pulled in fields that were based on my intuition of how people would diffuse, and it turned out not to work at all. So usually this process is repeated a couple of times, and a couple is not just two or three. It might be up to 10 times going back to the drawing board,

  • 20:19

    MARIAN-ANDREI RIZOIU [continued]: reintroducing new features or removing old features, and then trying it again. And the whole system, the whole progress is done in a very rigorous fashion. So it's not like intuition. I try it on one video and it doesn't work. That's not a scientifically sound way of doing it.

  • 20:41

    MARIAN-ANDREI RIZOIU [continued]: Whenever we are doing incremental work, we need to show that what we're doing actually helps. So let's say I want to introduce a new feature. Let's say that I have a new idea or that another user-related feature, say for how long he's been active,

  • 21:03

    MARIAN-ANDREI RIZOIU [continued]: is important in some way in the diffusion. So I'm pulling this feature in. I'm reconstructing my data. I'm testing on a large sample usually. What would be a large sample? Maybe a couple of thousands of videos would be enough. So now I have the model trained on the same data set with the feature and without the feature.

  • 21:25

    MARIAN-ANDREI RIZOIU [continued]: My hypothesis is the feature helps. And now I model both with and without, and let's say I have a quality measure. I always have a quality measure that I optimize for. This quality measure can be either how well do I explain the data I see, which in this case was a simple loss function, or I could try and predict

  • 21:50

    MARIAN-ANDREI RIZOIU [continued]: future values. And in this case I train my model and I make future predictions, and then I compare my prediction with my observed data. Based on this for, say, the thousands of videos, I can compute error measures, which basically tell me how wrong am I both in explaining or in predicting.
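
One way to make the with-feature versus without-feature comparison concrete is a paired comparison of per-video errors. The sketch below is an assumption about how such a check could look, not the procedure used in the study: the function name `paired_improvement` is hypothetical, and a paired t statistic is just one reasonable choice of test.

```python
import math
import statistics


def paired_improvement(err_without, err_with):
    """Compare per-video errors of two models fitted on the same videos.

    Returns the mean error reduction (positive means the feature helps)
    and a paired t statistic, i.e. that mean divided by its standard error.
    """
    diffs = [a - b for a, b in zip(err_without, err_with)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        # Identical improvement on every video: no variability to test against.
        return mean_diff, math.inf if mean_diff > 0 else 0.0
    return mean_diff, mean_diff / std_err
```

With a few thousand videos, a t statistic well above roughly 2 suggests the error reduction is systematic rather than noise, while a value near 0 suggests the feature could be dropped.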

  • 22:13

    MARIAN-ANDREI RIZOIU [continued]: And then going back, I can say how wrong I am with the feature and I can say how wrong I am without the feature. Now you can see how this is the statistically sound way because it's such a large sample of videos that if I systematically decrease my error

  • 22:34

    MARIAN-ANDREI RIZOIU [continued]: measure with the feature, it means that the feature helps and I'm keeping it. If I see nothing statistically significant, then I might drop it. [What was the process of running the models like?] Again, this is an iterative process.

  • 22:59

    MARIAN-ANDREI RIZOIU [continued]: The stopping condition of the iteration is not clear, especially if we keep on improving things. You never finish exploring all the possibilities, which means that every single work of a researcher is still a work in progress. Even when we publish the papers at the end

  • 23:20

    MARIAN-ANDREI RIZOIU [continued]: in the big conferences and we present them, we always have, on the last portion of our presentations, what are the limitations? What are the ways to go forward? And that basically shows that it's never done. This is one of the very important points that most juniors in research need to learn the hard way.

  • 23:41

    MARIAN-ANDREI RIZOIU [continued]: It's never done. However, when the results tend to be significant enough--this is, again, there is no clear measure of what significant enough means. It's, again, gut feelings and experience in publication process. Then what we usually do, we run it at scale. I didn't run on only a thousand videos.

  • 24:03

    MARIAN-ANDREI RIZOIU [continued]: I have hundreds of millions of videos. So then once you have a version that is stable enough, you just run it on the entire collection to try to draw conclusions. In this case, we had to use a supercomputer. So we had access to the Australian National Computational Infrastructure, which is Australia's supercomputer, one

  • 24:24

    MARIAN-ANDREI RIZOIU [continued]: of these massive, massive computers, and we just launched a lot of jobs in there. Just as a funny note, if I would have run on my computer all the experiments that I ran on the supercomputer, it would have taken me 37 and 1/2 years to run the experiments.

  • 24:47

    MARIAN-ANDREI RIZOIU [continued]: But because it was on a supercomputer, it only took a matter of maybe a week, week and a bit. So that basically means training the models over the entire data set or a significant sample of the data set and then getting fitted models, which means explanatory models.

  • 25:08

    MARIAN-ANDREI RIZOIU [continued]: Now what can you do with this? As I said, you can either explain what you observe or you can try to do predictions in the future, and that's exactly what we did. We started doing predictions about the future. And then we can look at things of how much of the future popularity can we explain?

  • 25:29

    MARIAN-ANDREI RIZOIU [continued]: But not only that, remember that my objective was to disentangle luck from quality. My end goal was getting the quality. So the same models, the advantage of them being theoretical models is that we can derive quantities based on them.

  • 25:51

    MARIAN-ANDREI RIZOIU [continued]: And one of these quantities was roughly saying, was roughly quantifying the number of views I get for one initial view. So let's say I want to compare two videos. One is a cat video and one is a dog video, and you want to know which one has the better potential of becoming viral.

  • 26:13

    MARIAN-ANDREI RIZOIU [continued]: One way of quantifying this would be if I start one share of the cat video and one share of the dog video in an otherwise identical setup, which one would gather more views at the end? So the way you do it, being a generative model, you put the first event, and then

  • 26:33

    MARIAN-ANDREI RIZOIU [continued]: it will continue generating new events within the video. And at the end you count, and that's your notion of quality. So with this we actually generated a 2D visualization. You have a little map, and on the horizontal axis you put this quality. On the vertical axis you put some notion of luck.
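
The "put one event in and count what it generates" idea can be sketched with a simple branching simulation. This is a deliberately simplified stand-in for the fitted models discussed in the interview: it assumes each view independently triggers a Poisson-distributed number of further views with mean `n_star` (a hypothetical parameter name), and it only terminates reliably when `n_star` is below 1.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson-distributed integer using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def cascade_size(n_star, rng):
    """Total number of views generated by one initial view (itself included)."""
    total = 0
    frontier = 1  # views in the current generation
    while frontier > 0:
        total += frontier
        frontier = sum(poisson(rng, n_star) for _ in range(frontier))
    return total


def estimated_quality(n_star, runs=20000, seed=0):
    """Average cascade size over many simulated cascades (assumes n_star < 1)."""
    rng = random.Random(seed)
    return sum(cascade_size(n_star, rng) for _ in range(runs)) / runs
```

Under this assumption the expected cascade size has the closed form 1 / (1 - n_star), so a video with `n_star = 0.5` gathers about 2 views per initial view and one with `n_star = 0.9` about 10. Comparing two videos in an otherwise identical setup then reduces to comparing these averages.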

  • 26:57

    MARIAN-ANDREI RIZOIU [continued]: And this map actually tells you which are the videos which are more likely to be viral. And if you're looking in the corner that is high quality and you give enough luck, those are the videos you want to look at. And from there, actually the analysis can spawn quite a lot because you might ask questions

  • 27:18

    MARIAN-ANDREI RIZOIU [continued]: like are music videos more likely to have higher quality than cat videos? Is this particular type of music better than the other one? Is the content produced by this singer better than the content or more likely to propagate than the content of the other singer? And you can imagine from here the ramifications are quite

  • 27:42

    MARIAN-ANDREI RIZOIU [continued]: large, and we have explored quite a number of these in our subsequent work. [What is next for this research project?] What brought the project all the way here, these theoretical models and this explainability

  • 28:03

    MARIAN-ANDREI RIZOIU [continued]: and being able to derive new measures is what's currently holding it back. Why? The generative models are good at explaining. They're usually not that great at predicting. Why? Because as I build my assumptions into the model,

  • 28:25

    MARIAN-ANDREI RIZOIU [continued]: I will never be complete enough to take into account all the phenomena that are happening. There will always be limitations. So I can explain a big portion of it, but never good enough to actually go and predict that well in the future. However, we do have another set of tools which are black-box tools. And maybe many of the audience might know about neural

  • 28:48

    MARIAN-ANDREI RIZOIU [continued]: networks, recurrent neural networks, and all these new machine-learning algorithms that are having amazing prediction properties, but they cannot explain. They work as black boxes. So we are left with two choices. I have something that explains but doesn't predict well. I have something that predicts amazingly but cannot explain.

  • 29:12

    MARIAN-ANDREI RIZOIU [continued]: The current work is to mix the two approaches so that we have something that explains and predicts well in the future. That's where it's going right now. [What advice would you give someone interested in doing this kind of research?]

  • 29:32

    MARIAN-ANDREI RIZOIU [continued]: There are many things that researchers that want to get in this field should be aware of. But I think that if I have to distill it and put one in front, it would be: we don't do research from data.

  • 29:54

    MARIAN-ANDREI RIZOIU [continued]: We use data to prove our hypothesis and our ideas. The point of research is discovering things that are beyond the data. You always start research with an idea. In my case, the idea was that we can disentangle luck

  • 30:16

    MARIAN-ANDREI RIZOIU [continued]: from quality. Using YouTube data is secondary. It means that that's the best kind of data or the most easily available data out there for proving the research that I want to do. But when doing social-science-related data using computational means--

  • 30:39

    MARIAN-ANDREI RIZOIU [continued]: and I think it applies to all research problems--you don't start from data. You start from the research question. So I guess the recommendation will be--well, more like a question, not a recommendation. The question that I ask my students all the time,

  • 31:00

    MARIAN-ANDREI RIZOIU [continued]: what do you want to solve, explicitly? Put it very simply, no jargon, no hand waving. What is the problem you are solving, and what is your hypothesis? Once you can do that, then we go and we crawl the data, and we have fun. [Recommended Reading-- Rizoiu, M., Xie, L., Sanner, S.,

  • 31:22

    MARIAN-ANDREI RIZOIU [continued]: Cebrian, M., Yu, H., & Van Hentenryck, P. (2017). Expecting to be HIP: Hawkes intensity processes for social media popularity. Proceedings of the International Conference on World Wide Web. Rizoiu, M. & Xie, L. (2017). Online popularity under promotion: Viral potential, forecasting, and the economics of time. Proceedings of the 11th International AAAI Conference on Web and Social Media.]


Marian-Andrei Rizoiu, PhD, research fellow at the College of Engineering and Computer Science at the Australian National University, discusses using stochastic computational models to study the popularity of online videos, including a focus on computational social science, current research, data collection and access, identification of relevant data, how data is prepared and processed, determining the computational models to use and applying them, training or organizing the data, adjustments to the research, running the models, next for this research, and advice for others interested in this type of research.

Video Info

Publication Info

SAGE Publications Ltd
Publication Year:
SAGE Research Methods Video: Data Science, Big Data Analytics, and Digital Methods
Publication Place:
London, United Kingdom
SAGE Original Production Type:
SAGE Case Studies
Copyright Statement:
(c) SAGE Publications Ltd., 2019


Marian-Andrei Rizoiu


Methods Map

Computational modeling

Using computer simulated models to study behaviour through adjusting particular variables and observing how these changes affect the outcomes.