  • 00:01

    MATT DENNY: Hi, everyone. This is your instructor, Matt Denny. And welcome to this lecture on Processing Data with Loops. [What This Lecture Covers] So, as I alluded to in the last lecture, we're going to be reading in data and then actually doing the stuff of managing it--actually getting in there and altering it or setting it up

  • 00:22

    MATT DENNY [continued]: so that we can use it for analysis. And we're going to be trying to do this in a process where we read in each dataset using a loop, and then we automatically process those datasets. So the goal here is to set things up so that whatever code you write--the R code that you write--

  • 00:43

    MATT DENNY [continued]: can sort of adapt to any differences across different datasets in a way that it can actually automatically do that process of, whether it's cleaning your data or reformatting your data, or in the case of the data that I've been working in the example, actually turning it into something different. And again, we're going to be doing this all using loops and conditional statements.

  • 01:04

    MATT DENNY [continued]: Now, I wish I could tell you that there was one set of functions or one particular approach that was going to always and exactly do whatever it was that you need it to do every single time that you need to read in multiple datasets, but there really isn't. This is where the creativity, and in some sense, the fun of managing multiple datasets and data

  • 01:26

    MATT DENNY [continued]: management in general comes in. It's in this sort of process of trying to figure out, OK, for the particular collection of datasets or just this particular single dataset that I'm working with, how do I actually transform it in such a way, clean it up in such a way, do whatever it is I need to do to it in such a way that I can actually then use it for my downstream goals.

  • 01:47

    MATT DENNY [continued]: So let's get into what we're going to do with the co-sponsorship network data that I've been working with in this example. Let's actually go into how I would actually manage these data. OK, and so I'm going to head over to my desktop. And we're going to open up RStudio. OK, so we're over here on my desktop.

  • 02:08

    MATT DENNY [continued]: Again, we're working with the managing_multiple_datasets.R script and this Data folder. And I'm going to pick up where we left off. So we're now down around line 64. And we're going to actually talk about processing these multiple datasets that we stored in our co-sponsorship_data list object, which has 11 entries, each of which is itself a list with one entry.

  • 02:30

    MATT DENNY [continued]: And that's called raw_data. OK, so again, it's going to be useful to step inside of this loop. So right now, my little i value is equal to 11 because up above here, I just finished reading in 11 datasets. I looped through the file names. So I want to reset i equal to 1 just for this example.

  • 02:51

    MATT DENNY [continued]: i equals 1. And I want to talk you through what's going to go on inside of this loop. So again, I'm going to be looping over our filenames, so that'll make sure that even if I were to change the number of files that I was working with, I could have just said here for i in 1 to 11, because I know that there are 11 different files that I was working with--11 different datasets.

  • 03:12

    MATT DENNY [continued]: But I'm going to choose to always use this one for i in 1 to length of filenames, so that if I were to change the length of filenames, if I wanted to ignore some files, that this for loop would automatically adapt to it. It's just sort of a good housekeeping practice.
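
The loop header described here can be sketched as follows. This is a minimal illustration, not the lecture's actual script: the file names are made up, and only the pattern of looping from 1 to the length of `filenames` comes from the lecture.

```r
# Hypothetical file names standing in for the lecture's `filenames` vector.
filenames <- c("session_100.csv", "session_101.csv", "session_102.csv")

# Loop from 1 to however many file names there are, instead of a
# hard-coded number, so the loop adapts if files are added or removed.
for (i in 1:length(filenames)) {
  cat("Currently working on dataset:", i, "\n")
}

# After the loop, i holds the last index visited.
i
```

One design note: if `filenames` could ever be empty, `seq_along(filenames)` is a safer header than `1:length(filenames)`, because `1:0` counts down through 1 and 0 instead of skipping the loop.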

  • 03:33

    MATT DENNY [continued]: OK, so we're going to use our cat statement here. Let's start off inside of this for loop. We've set i equal to 1. And we are going to remind ourselves that we're currently working on dataset number 1. OK, now I'm going to sort of do things in reverse. Whereas up above I created this temp file, and then I assigned it into the raw_data slot

  • 03:53

    MATT DENNY [continued]: in co-sponsorship_data, down here I'm actually going to extract out that raw_data slot in the sub-list of co-sponsorship_data. So we're going to take the i-th entry in co-sponsorship_data, the first entry, and get out the raw_data slot from that sub-list. And we're going to assign it back to temp.
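
As a rough sketch of that extraction step: the object name `cosponsorship_data` is an assumption (the transcript writes it with a hyphen, which is not a valid R name), and the two small matrices are toy stand-ins for the real datasets.

```r
# Toy stand-in for the lecture's list: each entry is itself a list
# with a single slot called raw_data.
cosponsorship_data <- list(
  list(raw_data = matrix(c(1, 0, 2, 0, 1, 2), nrow = 3)),
  list(raw_data = matrix(c(0, 1, 2, 2), nrow = 2))
)

i <- 1

# [[i]] pulls out the i-th sub-list; $raw_data pulls the slot out of it.
temp <- cosponsorship_data[[i]]$raw_data

dim(temp)  # rows and columns of the extracted matrix
```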

  • 04:13

    MATT DENNY [continued]: So right now, what temp is, is it's the stuff that was stored here in this first entry in our co-sponsorship_data list. OK, now once we have our temp, one thing we're going to do, is we're going to need to figure out the number of senators. So this will actually change from dataset to dataset. Sometimes there will be 100. Sometimes there will be 101, if, for example, one

  • 04:34

    MATT DENNY [continued]: retired or unfortunately passed away midway through, or even 102 or 103. So there can be different numbers of senators in each session of Congress, depending on what happened to them or if they were replaced. So, to figure out the number of senators, we're going to take the number of rows of temp, which in this case is going to be equal to 101, right, because we see that there are 101

  • 04:57

    MATT DENNY [continued]: observations of 100 variables. So only one senator had to leave office early for some reason. And then someone else was brought in to replace them. And so they get their own name in this dataset. I'm now going to create something called a sociomatrix. And I'm going to call it temp_sociomatrix. And what that is, is it's just a matrix full of 0s that has an equal number of rows and columns,

  • 05:19

    MATT DENNY [continued]: where each of the rows and columns is equal to the number of senators. And so when you're dealing with social network data, what a sociomatrix is essentially--let's click on it here. So for example here, if I had the first row, second column,

  • 05:40

    MATT DENNY [continued]: in the co-sponsorship case, whatever gets recorded in here will tell me how many times senator 1 co-sponsored a piece of legislation introduced by senator 2. So ties in this network representation, if we think about it as a matrix, are sent from the row to the column. So if we see a lot of large numbers in row 1,

  • 06:06

    MATT DENNY [continued]: then that would mean that senator 1, whoever that is, tends to be offering lots of support to lots of different other senators. And in general, we don't expect to see any entries in the diagonal, so 1-1 or 2-2 or 3-3. You don't usually offer support to yourself. There are some instances where we see self ties. But in this case we're going to be ignoring them

  • 06:28

    MATT DENNY [continued]: because you can't co-sponsor your own piece of legislation. So this is the thing that we want to fill in. So what we want to do is, essentially we want to go and visit each bill. And then we want to figure out who the sponsor was and who any of the co-sponsors were. And then we want to increment the counts of the number of times that those co-sponsors co-sponsored a bill by the sponsor. Now I just used a whole lot of spaghetti soup of words,

  • 06:51

    MATT DENNY [continued]: but let's actually talk through this. Let's talk through this sort of in practice, more concretely. So first off, now we're going to use a j for loop. So what we're going to do is we're going to loop through the number of columns. So we're going to loop through 1 to the number of columns

  • 07:11

    MATT DENNY [continued]: in temp. That means for every bill, because if we remember in our dataset, each column is a bill. And so we want to treat each bill separately. So that's going to be our next loop. And remember, since we already used i, we can't use i again in this loop. Because otherwise that'll do some weird stuff resetting it. And we don't want to go there. So what we want to do, is we want to say, OK, for j in number of columns of temp,

  • 07:33

    MATT DENNY [continued]: where temp is this thing that we extracted--it's the i-th entry in co-sponsorship_data, the raw data that we're working with. So let's set j equal to 1. So again, you can use either equal-to sign or the assignment operator. I should be proper and use the assignment operator. You can also use the equal operator if you're being sloppy. So now we're going to loop through each bill.
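
The setup described so far might look like the sketch below. The 4-by-3 matrix is invented for illustration (the lecture's data is 101 senators by 100 bills), and `num_senators` is a name chosen here; `temp` and `temp_sociomatrix` are the names used in the lecture.

```r
# Toy raw data: rows are senators, columns are bills.
# 1 = sponsor, 2 = co-sponsor, 0 = neither.
temp <- matrix(c(0, 1, 2,
                 2, 0, 1,
                 1, 2, 0,
                 0, 2, 2), nrow = 4, byrow = TRUE)

# The number of senators is the number of rows, so it adapts per dataset.
num_senators <- nrow(temp)

# A square matrix of zeros: entry [row, column] will count how many times
# the row senator co-sponsored a bill introduced by the column senator.
temp_sociomatrix <- matrix(0, nrow = num_senators, ncol = num_senators)

# Visit each bill (column). j is used because i is already taken
# by the outer loop over datasets.
for (j in 1:ncol(temp)) {
  cat("Visiting bill", j, "\n")
}
```

Both `j <- 1` and `j = 1` assign at the top level in R; the arrow form is the conventional choice.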

  • 07:56

    MATT DENNY [continued]: So for example, right now, we're going to only be considering this first column here. And it doesn't look like there are any co-sponsors there. So now what we want to do, is we're going to use another loop for k in 1 to number of rows. So we want to loop over each senator. And if temp[k, j], so if the k-th row and the j-th column--

  • 08:19

    MATT DENNY [continued]: so for example, right now we're going to be bopping down one senator at a time. That's our k. And then we're going to be sticking with the same j-th column, so the same bill. If that entry equals 1, then we want to create this variable called sponsor and set that equal to k. And so let's run this sort of loop. And what's going to happen--so if we run this little line of code--is we're

  • 08:40

    MATT DENNY [continued]: going to see that k went all the way through to 101, but the sponsor stopped at 14. And if we go into our temp dataset, we're going to be able to see that the 14th row, there is a 1 there. And so that's what happened, was that our if statement worked because it said, hey, OK, I

  • 09:01

    MATT DENNY [continued]: found the 14th entry in this column, the 14th row associated with this column. And it was equal to 1. So I'm going to stop there. And I'm going to set my little sponsor variable equal to k. And then I'm going to keep going, and none of the other ones are going to equal 1, because there can be only one sponsor per bill. That's one of the nice things about these data.
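
A small sketch of the sponsor search, using an invented 4-by-3 matrix in place of the real data. The loop has no `break`, so `k` runs to the last row while `sponsor` keeps the one row where the entry was 1, which matches what the lecture observes (k at 101, sponsor at 14).

```r
# Toy raw data: rows are senators, columns are bills.
# 1 = sponsor, 2 = co-sponsor, 0 = neither; one sponsor per bill.
temp <- matrix(c(0, 1, 2,
                 2, 0, 1,
                 1, 2, 0,
                 0, 2, 2), nrow = 4, byrow = TRUE)

j <- 1  # look at the first bill only

# Walk down the senators one at a time, staying in column j.
for (k in 1:nrow(temp)) {
  if (temp[k, j] == 1) {
    sponsor <- k
  }
}

sponsor  # 3 in this toy data: row 3 holds the 1 in column 1
k        # 4: the loop kept going to the last row
```

A vectorized alternative for the same lookup is `which(temp[, j] == 1)`; the explicit loop is kept here because it is what the lecture walks through.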

  • 09:22

    MATT DENNY [continued]: So now we figured out who the sponsor is. Now we want to go through. And so now we're going to loop through again. We're going to go through again for each senator. But now we're going to look for ones where the k-th row in the j-th column is equal to 2. And that would indicate a co-sponsor. Now, for our first bill, there aren't going to be any co-sponsors. But later down the line, we'll see that there are co-sponsors.

  • 09:44

    MATT DENNY [continued]: And what we're going to want to do if we find one of those, is we're going to want to send a tie. We're going to want to increment the tie count, or the number of times senator k co-sponsored a piece of legislation sponsored by sponsor.

  • 10:04

    MATT DENNY [continued]: So for example, in our case of our temp, let's see, so we're dealing with the first column here. So it's the 14th senator is the sponsor. So that would be the 14th column. And then, if there are any co-sponsors, they would be the k-th row.

  • 10:25

    MATT DENNY [continued]: And we want to increment that count. So we're going to want to assign into this slot in our little temp sociomatrix, which again, is this currently blank 101-by-101 matrix. We're going to want to take it and assign back to it its current value plus 1. So let's run this little line of code.
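
The increment step can be sketched like this, again on invented toy data. Unlike the lecture's first bill, the bill chosen here (column 2) does have co-sponsors, so the counts visibly change.

```r
# Toy raw data: rows are senators, columns are bills.
# 1 = sponsor, 2 = co-sponsor, 0 = neither.
temp <- matrix(c(0, 1, 2,
                 2, 0, 1,
                 1, 2, 0,
                 0, 2, 2), nrow = 4, byrow = TRUE)
temp_sociomatrix <- matrix(0, nrow = nrow(temp), ncol = nrow(temp))

j <- 2  # a bill that has co-sponsors in this toy data

# First pass: find the sponsor of bill j.
for (k in 1:nrow(temp)) {
  if (temp[k, j] == 1) {
    sponsor <- k
  }
}

# Second pass: each co-sponsor k sends a tie to the sponsor, so the
# entry in row k, column sponsor gets its current value plus 1.
for (k in 1:nrow(temp)) {
  if (temp[k, j] == 2) {
    temp_sociomatrix[k, sponsor] <- temp_sociomatrix[k, sponsor] + 1
  }
}

temp_sociomatrix
```

In this toy case senator 1 sponsors bill 2 and senators 3 and 4 co-sponsor it, so rows 3 and 4 of column 1 each become 1.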

  • 10:46

    MATT DENNY [continued]: And we're not actually going to get any hits here. So if we were to go into temp_sociomatrix, we could scroll down the 14th column. And we would see that there were no entries there because there were no co-sponsors for this bill.

  • 11:06

    MATT DENNY [continued]: So this little loop here, this j loop, it's going to loop over all the columns. So it's going to go over each bill. And we know by looking at temp that there are going to be some bills where bills were sponsored and then they had co-sponsors. So there's actually information that we're going to be storing here. And we're going to assign to another slot. And remember, we can use the dollar sign operator

  • 11:27

    MATT DENNY [continued]: to just add stuff into a list. We're going to assign to that temp_sociomatrix. So let's run this big loop here. Actually, let's run this j loop first. So let's run this whole thing. So I'm going to run this. And it runs pretty quickly. And let's look at temp_sociomatrix.

  • 11:48

    MATT DENNY [continued]: And what we're going to see here, is now we start to see these 1s and 2s and 3s and so on popping up. And these are recording the number of times that, for example, we see here, senator 17 co-sponsored a bill introduced by senator 24 three times. And in fact, if we look at senator 24, it looks like they've got a lot of people co-sponsoring their legislation. So maybe they were popular or important or somehow

  • 12:11

    MATT DENNY [continued]: like one of the key figures in their party. OK, so the last thing left to do--excuse me, let's exit out of here--is to run our entire loop. And now we're going to be doing this for each of these 11 datasets. So let's run this here. So I'm going to run this loop. And we're going to see that this is moving along pretty quickly.
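
Putting the pieces together, the whole nested loop might look roughly like the sketch below. It is a reconstruction from the spoken description, not the lecture's script: the object name `cosponsorship_data`, the slot name `sociomatrix`, and the two tiny datasets are all assumptions made for illustration.

```r
# Two toy datasets (3 senators x 3 bills, and 2 senators x 2 bills).
# 1 = sponsor, 2 = co-sponsor, 0 = neither; one sponsor per bill.
cosponsorship_data <- list(
  list(raw_data = matrix(c(1, 2, 0,
                           2, 1, 2,
                           0, 2, 1), nrow = 3, byrow = TRUE)),
  list(raw_data = matrix(c(1, 2,
                           2, 1), nrow = 2, byrow = TRUE))
)

for (i in 1:length(cosponsorship_data)) {
  cat("Currently working on dataset:", i, "\n")

  # Extract the raw data and size the sociomatrix to this dataset.
  temp <- cosponsorship_data[[i]]$raw_data
  num_senators <- nrow(temp)
  temp_sociomatrix <- matrix(0, nrow = num_senators, ncol = num_senators)

  for (j in 1:ncol(temp)) {          # each bill
    for (k in 1:nrow(temp)) {        # find the sponsor
      if (temp[k, j] == 1) {
        sponsor <- k
      }
    }
    for (k in 1:nrow(temp)) {        # count each co-sponsor's tie
      if (temp[k, j] == 2) {
        temp_sociomatrix[k, sponsor] <- temp_sociomatrix[k, sponsor] + 1
      }
    }
  }

  # The dollar sign operator adds a new slot to the i-th sub-list.
  cosponsorship_data[[i]]$sociomatrix <- temp_sociomatrix
}

cosponsorship_data[[1]]$sociomatrix
```

Because the matrix size comes from `nrow(temp)` inside the loop, the same code handles datasets with different numbers of senators without modification.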

  • 12:31

    MATT DENNY [continued]: If we hadn't subset our data down to only 100 bills, it would take a lot longer here because it would have to go through each of the 7 or 8 or 10 thousand bills introduced in each session of Congress. OK, so what we've done, is we've essentially created a whole bunch of these temporary sociomatrices. And we've now created these datasets

  • 12:54

    MATT DENNY [continued]: that were something different than the data that we read. And we sort of transformed them. And we've done that using a loop. So if you want to actually understand what happened here, like, what did we actually do? Well, I've given you a little bit of code to actually visualize this. So, to do that, we're going to need the statnet package. So what this is, is this is a package

  • 13:14

    MATT DENNY [continued]: for visualizing and analyzing network data. And so I've already downloaded this. So I'm just going to load in this library. And as you can see, it's going to print out just an ungodly amount of information. Ooh, it says, hey, you should get a new version. Well, I'm not going to get a new version currently.

  • 13:34

    MATT DENNY [continued]: But it's going to give us information about all the different packages that got stuck in with this package. This package is sort of the one package to rule them, right. So it's a huge package. I keep doing Lord of the Rings references, I'm sorry. But it's going to be this one big package that's going to give us a whole bunch of functionality. And so I'm not going to get into too much here.

  • 13:55

    MATT DENNY [continued]: But we're going to create this function called net.plot. And what that's going to do, is it's going to transform our data list, in other words, co-sponsorship_data. It's going to transform the sociomatrix that we stick out of there into something called a network object using the network package, which is part of statnet. And then it's going to plot that network,

  • 14:17

    MATT DENNY [continued]: and it's going to change the color. So you can give R colors using actual names like red and blue, but you can also give R colors using numbers. And then it's going to have this funny little Sys.sleep function, which says, don't do anything for--in this case--one second. We could change this from one second to two seconds or whatever. So again, another very handy function.
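
The transcript does not show the body of net.plot, so the following is only a guess at its general shape. The function name `net_plot`, its arguments, the `sociomatrix` slot name, and the toy data are all hypothetical; `network::as.network()`, `plot()` on a network object, and `Sys.sleep()` are existing functions, and the network package is assumed to be installed.

```r
# Hypothetical helper in the spirit of the lecture's net.plot; the real
# function body is not shown in the transcript.
net_plot <- function(i, data_list, pause = 1) {
  # Turn the i-th sociomatrix into a directed network object.
  # Any non-zero entry becomes a tie (counts are not kept as edge weights).
  net <- network::as.network(data_list[[i]]$sociomatrix,
                             matrix.type = "adjacency",
                             directed = TRUE)
  # A number used as a color indexes into R's current palette.
  plot(net, vertex.col = i)
  # Pause so each plot stays on screen before the next one is drawn.
  Sys.sleep(pause)
}

# Toy data: one 3-senator sociomatrix per "dataset".
toy <- list(
  list(sociomatrix = matrix(c(0, 1, 0,
                              1, 0, 1,
                              0, 1, 0), nrow = 3, byrow = TRUE)),
  list(sociomatrix = matrix(c(0, 2, 0,
                              0, 0, 1,
                              3, 0, 0), nrow = 3, byrow = TRUE))
)

# Only attempt the plots if the network package is available.
if (requireNamespace("network", quietly = TRUE)) {
  for (i in 1:length(toy)) {
    net_plot(i, toy, pause = 0)
  }
}
```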

  • 14:38

    MATT DENNY [continued]: I'm not going to spend a lot of time on what this function does, but I'm going to define it now. So we can see it's going to pop up down here. And what this will do, is we're going to apply it. So we're going to use net.plot. We're going to apply it to each of the 11 datasets that we're working with. And it's going to create this colorful--they're called spaghetti plots sometimes, or network plots--

  • 15:04

    MATT DENNY [continued]: of this data that we actually read in and transformed. And I want to just show you what it's actually going to look like. So I'm going to run this little loop. And what we're going to see is that we're going to get these sort of different network plots out each time. And what these are, these are the co-sponsorship networks that were created out of that plain data. And we've plotted something.

  • 15:24

    MATT DENNY [continued]: It's fun. We used the plot function. We might touch on this some more later in this course. And certainly, there's a ton to learn about plotting. But what we can see if we zoom in on this plot for example, is that we see these little arrows. And what they're doing, is they're recording whether or not there was--in this case, it's just these data had been binarized--so

  • 15:46

    MATT DENNY [continued]: just, was there a tie or not? We could think about the arrows here having widths that talk about how many times there was a co-sponsorship event. So we can see that this senator over here co-sponsored a piece of legislation introduced by this senator. And we can see sort of these patterns of some senators had a lot more legislation co-sponsored than others. So I actually used this code--this code is ripped.

  • 16:10

    MATT DENNY [continued]: So the code up here is ripped right out of my actual research, where I've actually worked with these co-sponsorship network data. And this was the code I actually used to read it in. It made it a whole lot easier for me. I could go through just massive amounts of information from these bills, automatically read it in, and sort of combine it and make it useful. Now, I did more than just plot these data,

  • 16:32

    MATT DENNY [continued]: but I just sort of wanted to show you what we could do with for loops and if statements and list objects. So that's all for this lecture. Now we're going to cover many other aspects of managing data. But I wanted to start us out with an interesting example, creating network data and doing so using for loops to manage these multiple different datasets.

  • 16:53

    MATT DENNY [continued]: So, thanks for watching, and I will see you in the next lecture.

Video Info

Series Name: Practical Data Management with R

Episode: 32

Publisher: SAGE Publications Ltd

Publication Year: 2017

Video Type: Tutorial

Methods: Data management, R statistical package, Social network analysis

Keywords: computer programming; data analysis; data management; data processing; data visualisation; looping; matrices; programming and scripting languages; Social network analysis

Segment Info

Segment Num.: 1

Abstract

Using a sociomatrix with real-world social network data, Matt Denny explains how to use loops to read in multiple data sets, automatically process those data sets, and plot the results.

Data Management: Processing Data with Loops
