- 00:09
ANDY FIELD: All right. This podcast is going to have a look at SPSS's automatic linear modeling thing-- which I can't remember what it's called. I don't really go into this in the book, because essentially I think it's a really bad idea. But so I thought I'd do a webcast instead, because, you might disagree with me

- 00:30
ANDY FIELD [continued]: and might think it's a really good idea. And then, you know, I get loads of emails complaining about what I haven't talked about in the book. And so-- webcast. That's the way forward. So what I'm going to do is have a look at the automatic linear whatever by using the example from chapter 8, which is a regression example looking at predicting album

- 00:53
ANDY FIELD [continued]: sales from things like amount of money spent on advertising, the number of times songs from the album are played on radio, and the attractiveness of the band. So, by following through the example, you can compare what you get from the automatic linear gadget-- see what you get if you kind of do it manually

- 01:15
ANDY FIELD [continued]: like it tells you in the book. OK, so, first things first, just to remind those of you that have the book, data look like this. So each row in the data editor represents an album, essentially. So we've got information about how much money was spent advertising it. And that's in thousands of pounds, because I'm British.

- 01:37
ANDY FIELD [continued]: So I like to work in pounds. We've got a number of album sales. That's in thousands. So the first album sold 330,000 units-- downloads, CDs, whatever. Airplay is just the number of times it's played on radio during a week-- a one-week period. And attractiveness of the band is just a rating

- 02:00
ANDY FIELD [continued]: scale out of 10. So, if you want to use the automatic gadget thing, the first thing is, when you set up your variables-- so we've gone to the Variable view now. You have a look in this final column. So the Variable view, we've set up loads of properties of the variables-- just given them names and stuff like that.

- 02:21
ANDY FIELD [continued]: And at the end there's this thing called Role. And what this does is allows you to set up what type of variable you're using. And essentially one of the main purposes of this Role tag is, when you use some of the automated procedures in SPSS, it uses this information about the role of the variable

- 02:43
ANDY FIELD [continued]: to kind of second-guess what you might want to do. So, for example, our advertising budget variable, I've set that as an input. So it gets this little blue arrow. So I've set it as an input. And what that means is basically I'm setting it as a predictor. But SPSS calls it an "input," rather than a "predictor." So it's kind of an input into the model.

- 03:07
ANDY FIELD [continued]: Whereas you have things called "targets." So this is something that, for example, you're trying to predict. So it's an outcome. But SPSS calls it a "target." You can set variables to be both. You know, maybe in some analyses you want it to be a predictor, in others an outcome. And you can set it to be none, as well. So, neither of those.

- 03:27
ANDY FIELD [continued]: But that's probably not going to be very helpful to you. So we've got advertising set as an input, because that's a predictor variable. Now the thing we're trying to get is sales. So you'll notice, in our list of variables I have set sales to be a target, because that's the thing we're trying to predict. Now if the automatic linear model to-- uh--

- 03:48
ANDY FIELD [continued]: "linear model"-- I can't even remember what it's called, that's how little I use it-- to actually work, you need to set up these roles, and you need to set them up correctly. And if you set them up wrongly, then SPSS will try to guess what you're doing, and it will get it wrong because it's using the wrong information to guess. So there you go. That's how you set up your data.

- 04:10
ANDY FIELD [continued]: Now if you then want to basically hand over your life and research career to SPSS, rather than using the very able and capable brain that's in your head, then you go to the Analyze menu and into the Regression menu. And here we've got this automatic linear modeling.

- 04:32
ANDY FIELD [continued]: Now, by clicking on that, you'll open up a dialog box that looks a bit like this. And, first of all, it should start up in this screen. So there's a little tab at the top, here, for Fields. And automatically-- that doesn't mean you get lots of me, by the way, which will be a relief to all concerned,

- 04:53
ANDY FIELD [continued]: I'm sure. By default, it will use predefined roles. So that means it uses that information in the Roles column in the Variable view. It's trying to guess what you want to do. So it's had a guess, and it's guessed that we want to predict album sales from our three inputs-- our adverts, airplay, and attractiveness. Now that is kind of what we want it to do, actually. So, by setting our roles up, we've

- 05:15
ANDY FIELD [continued]: helped SPSS to basically, by some kind of weird telepathy, work out what model we want to fit. However, we've got a very simple data set here. We've got only the variables in the data set that we actually want to use, so it's quite straightforward for SPSS to kind of guess what we might want to do. You might find that, if you've got bigger data sets, lots

- 05:35
ANDY FIELD [continued]: of variables, and you're doing slightly more complicated things, then basically SPSS will guess incorrectly. So, if that's the case, you can also tick on this "Use custom field assignments," which allows you to select variables. So you can say you want the target to be sales and drag it over here and you want your predictors or inputs to be

- 05:58
ANDY FIELD [continued]: these three variables, as well. So you can do it all manually, if you want. But by default SPSS-- you know, it's kind of wanting to show off that it's very good at reading your mind. So it'll have a guess. So that's it. I mean, basically, if you set up your roles properly, and you don't have loads of other spurious variables in Data Editor, you could literally just, like,

- 06:20
ANDY FIELD [continued]: click on Run, and SPSS will run the model that it thinks you want to run. However, there are some other options, so, in the Build Options basically by default it will just fit a standard regression model. And that's probably fine for most people. But you can do slightly more sophisticated things. So for example, you can use this thing called "bagging."

- 06:41
ANDY FIELD [continued]: And that basically kind of selects a model by using bootstrapping. So it kind of bootstraps a sample, fits a model, then bootstraps another sample, fits a model, then bootstraps another sample, fits a model. And then it sort of amalgamates these-- I think it uses 10 models-- into your final model. And that can be good for getting parameters or beta values

- 07:03
ANDY FIELD [continued]: and things like that that are quite stable. But, to stick with the book chapter, we're just going to run our standard model. Other things you can do is, by default-- I would generally turn this off, to be honest, but anyway-- by default, you can ask SPSS to automatically

- 07:24
ANDY FIELD [continued]: prepare your data for you. Now two things particularly important that it'll do. The first is it will decide how to deal with missing values. And that can be a dangerous thing, because it will do things like either use-- I think on categorical variables it will use the mode, and on continuous variables it uses the median. So it just replaces missing values.
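
To make the replacement rule just described concrete, here is a minimal sketch in Python. It assumes the behaviour is as the speaker recalls it (mode for categorical variables, median for continuous ones); the function name `impute` and the `kind` labels are hypothetical and are not SPSS's own API.

```python
from statistics import median, mode

def impute(values, kind):
    """Replace missing entries (None) with a single fill value.

    Assumption: median for "continuous" variables, mode for anything else,
    mirroring the rule as recalled in the talk.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed) if kind == "continuous" else mode(observed)
    return [fill if v is None else v for v in values]

# Made-up example values, not taken from the album sales data.
print(impute([1.0, None, 3.0, 10.0], "continuous"))
print(impute(["a", None, "b", "a"], "categorical"))
```

Every missing case gets the same value, which is part of why this is a decision worth making by hand rather than by default.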

- 07:46
ANDY FIELD [continued]: And there's lots of reasons why you might not want to replace missing values with anything, frankly. It could be worse-- it could use the mean. But um-- not necessarily. You know, this is the sort of thing that I kind of think a human needs to be doing, not a computer. Like the decision-making a human should be doing. It also deals with outliers.

- 08:06
ANDY FIELD [continued]: And the way it deals with them is it will, if there are any outliers, kind of more-- I think it's more than three standard deviations from the mean-- it just sort of chops them off. It trims them-- replaces them with the highest value. And again, that is kind of fine. It's not that that's an intrinsically daft thing to do. But I kind of think you should be exploring your outliers

- 08:28
ANDY FIELD [continued]: and trying to work out why there are outliers and making informed decisions about the best way to deal with that. So, you know, is it-- do you want to trim it at the maximum or trim it at three standard deviations above the mean, or do you want to do something else with it. Do you want to sort of transform your data or whatever? So, personally, I would switch this off.
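
As an illustration of one of the trimming choices mentioned above, the following Python sketch pulls any value lying more than three standard deviations from the mean back to that cutoff. This is an assumption about what "trimming" means here, not a description of SPSS's exact rule; `trim_outliers` and `n_sd` are hypothetical names.

```python
from statistics import mean, stdev

def trim_outliers(values, n_sd=3.0):
    """Clip values lying more than n_sd sample standard deviations from the mean.

    Assumption: outliers are replaced by the cutoff value itself
    (mean +/- n_sd * SD), which is only one of several possible rules.
    """
    m, s = mean(values), stdev(values)
    low, high = m - n_sd * s, m + n_sd * s
    return [min(max(v, low), high) for v in values]

# Made-up scores: twenty ordinary values and one extreme one.
scores = [10] * 20 + [100]
trimmed = trim_outliers(scores)
print(trimmed[-1])  # the extreme score is pulled back to the upper cutoff
```

Trimming at the largest non-outlying value, or transforming the variable, would need different code, which is the point: it is a choice.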

- 08:50
ANDY FIELD [continued]: But I guess if you're going down the route of automatic linear modeling, you probably don't necessarily want to think about what you're doing too much. Model selection. And we use stepwise methods. I'll go into a massive, pathetic rant about why stepwise methods are a bad idea. And yet that is what the automatic model will use by default. You can change this. You can use forced entry by asking

- 09:11
ANDY FIELD [continued]: it to include all predictors. But, you know, this is what you get by default. So if you want to kind of brainlessly analyze your data, there you go. Another bad decision, there. This all relates to the more complex models that use, like, the boosting thing I was talking about with bootstrapping.
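
The bootstrap-and-amalgamate idea described earlier for bagging can be sketched in a few lines of Python. This is a toy version for a single predictor, assuming the models are combined by averaging their coefficients; SPSS's own combining rule may differ, and `fit_line`, `bagged_line`, `n_models` and `seed` are hypothetical names. Because the resampling is random, the sketch fixes the random number generator's seed so that a run can be repeated exactly.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def bagged_line(xs, ys, n_models=10, seed=42):
    """Fit a line to n_models bootstrap samples and average the coefficients."""
    rng = random.Random(seed)  # fixed seed -> reproducible resampling
    n = len(xs)
    fits = []
    while len(fits) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases with replacement
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # skip a degenerate sample with no spread in x
            continue
        fits.append(fit_line(bx, [ys[i] for i in idx]))
    intercept = sum(f[0] for f in fits) / n_models
    slope = sum(f[1] for f in fits) / n_models
    return intercept, slope

# Made-up data, not the album sales file.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.9, 5.2, 6.8, 9.1, 11.2, 12.7, 15.3, 16.9]
print(bagged_line(x, y))
```

Changing `seed` changes which cases each bootstrap sample draws, so the averaged coefficients shift slightly from run to run unless the seed is held constant.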

- 09:31
ANDY FIELD [continued]: And this is just-- you would only use this if you want to-- because some of the methods rely on resampling. So it will kind of use a random number generator to do that. So if you want to replicate the results you've got, you would need to make sure that the seed of your random number generator is kind of always the same. Again, it's an option you're probably not likely to use,

- 09:51
ANDY FIELD [continued]: unless you're doing something a bit more complicated. In the model options up here, you can save predicted values to the data set. You can export the model into an XML file. Again, not necessarily things that you'll want to do if you just want to run a simple analysis. So that's all there is to it, really. If you kind of leave everything as it is-- apart from, as I said, I've stopped it from automatically

- 10:13
ANDY FIELD [continued]: trying to correct my data for me-- you just click on Run. And then what you should get is a model appearing in your output window. Now you'll see something a bit like this. So you've got this model summary, and you kind of look about and think, well, I'm not really sure what that means. And that's because you have to double-click on it. And when you double-click on it, that

- 10:36
ANDY FIELD [continued]: will open up another window called the Model Viewer, which is what we can see here. There's a few things here worth-- well, again, it's very visual. I guess if you like visual stuff, it can be quite good, possibly. I mean, probably, if you're working for a record company and you're trying to give your board of directors information,

- 10:58
ANDY FIELD [continued]: this is exactly the kind of pretty stuff with no substance that you'd want to show them. So, first of all, you get this thing that tells you what the target was. Tells us that we switched off automatic data preparation. Tells us that we left the default forward stepwise method.

- 11:20
ANDY FIELD [continued]: And it shows here a graph, you can see, of model accuracy. I'm not sure that's the most appropriate title for what it's displaying. You can see basically Worse, Better. It's 100% scale, so you can sort of get a very visual idea of, uh-- well, like I say, I'm not sure it is accuracy, really. But all this is displaying is your adjusted R square.

- 11:42
ANDY FIELD [continued]: So if you actually hover over it, you can see adjusted R square is 0.66. The bar on the chart is showing you basically 66% or 0.66. So it's showing you the population estimate of how much variance the model explains, but in a very simplistic-- you know, worse, better-- kind of way. Personally I just think you'd be better off quoting the number.
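
For readers who want the number behind the bar, this is the usual adjusted R square formula as a small Python function. The inputs in the example (an R square of 0.665, 200 cases, 3 predictors) are illustrative assumptions, not values read from this output; only the adjusted value of about 0.66 is quoted in the talk.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    r2: unadjusted R square, n: number of cases, k: number of predictors.
    """
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Illustrative inputs only (assumed, not taken from the SPSS output).
value = adjusted_r_squared(0.665, 200, 3)
print(round(value, 2))  # rounds to 0.66
```

The adjustment shrinks R square more when there are many predictors relative to the number of cases.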

- 12:05
ANDY FIELD [continued]: It takes up less space than this graph. And all this graph probably encourages you to do is to jump to all sorts of conclusions about your model that may or may not be true. But still, it is what it is. If you use this pane down here-- so then navigate through the output. So if we'd had automatic data preparation on, we'd get an output here, which we're not

- 12:25
ANDY FIELD [continued]: getting at the moment, which tells us what it had done. So what it's done is correct the data-- so, trimming outliers-- stuff like that. Because I switched it off, we get nothing. We then get another graphic which shows us the predictor importance. And, you know, I suppose in a way it's a good way to visualize the relative importance of [INAUDIBLE]--

- 12:45
ANDY FIELD [continued]: --advertising budget and number of plays on radio have similar sort of standardized betas, which you might deem as being signs of importance. And attractiveness of the band is much, much smaller. You then get some kind of information about actual and predicted values. So these correlate reasonably well, which is a good sign.

- 13:09
ANDY FIELD [continued]: You can see you get this nice kind of oblong, sausagey type thing which means that the predicted values from the model were very similar or they correlate reasonably well with the actual, observed values. But, again, you can kind of get that information from the R square, really. It's just the visual display of it. We get a histogram of the studentized residuals.

- 13:31
ANDY FIELD [continued]: So this is to check for normality of residuals. Again, this is something you can-- if you run the regression manually, as it shows in the book chapter, you'd get something similar to this. But, you know, that looks reassuringly fairly normal, which is good. We then get some information about outliers. So we basically get a summary of cases with the largest Cook's

- 13:54
ANDY FIELD [continued]: distance. And, again, if you look at the book, Cook's distance is a measure of influence on the model. So this is a useful way just to check what the Cook's distances are. As a general rule, Cook's distances close to 1 are almost certainly problematic. And these are all relatively small compared to 1, so probably nothing to worry about here.
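
To show what sits behind those numbers, here is a minimal Python sketch of Cook's distance for a regression with one predictor, using the standard formula based on each case's residual and leverage. The data are made up to include one influential case, and the function name `cooks_distances` is hypothetical; SPSS computes the same quantity for the full three-predictor model.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each case in a one-predictor least-squares fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    p = 2  # parameters estimated: intercept and slope
    mse = sum(e * e for e in residuals) / (n - p)
    leverage = [1 / n + (x - mx) ** 2 / sxx for x in xs]
    # D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2
    return [(e * e / (p * mse)) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverage)]

# Made-up data: the last case is far from the others and off the trend.
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]
for case, d in enumerate(cooks_distances(x, y), start=1):
    print(case, round(d, 3), "check this case" if d > 1 else "")
```

A case needs both a sizeable residual and high leverage to produce a large distance, which is why the rule of thumb about values near 1 flags cases that are dragging the fitted line toward themselves.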

- 14:14
ANDY FIELD [continued]: But this does at least tell us the cases that have the 10 biggest values of Cook's. OK, so we then get this little graphical summary of the model. So what you can see here is we have our target. We've seen this little targetty thing, which was album sales, and we've got three predictors here which are kind of going into it.

- 14:36
ANDY FIELD [continued]: Now you can mess around with these things down at the bottom. I'm not sure how helpful they necessarily are. But, for example, you can change whether it displays effects with certain significance values. So, by default, it will just show all of them-- all effects, regardless of whether they're significant. And if you hover above, you can see here-- hover above one of them-- so, for example, the attractiveness-- it will tell you the significance

- 14:59
ANDY FIELD [continued]: and the importance-- If you don't like that, you can switch from the Diagram view to the Table view, which gives you basically the overall summary of the model. It's basically the overall summary. So this corresponds to output 8.6 in the book. Basically gives you the same sort of values,

- 15:23
ANDY FIELD [continued]: but for the model overall, with all these three predictors in. Now the next-- you can also-- so that's, like, the model overall. You can also have a look at this, which is basically the same thing, but it's got the-- tells you about the coefficients. So now it's got the intercept in, as well.

- 15:46
ANDY FIELD [continued]: Tells you here that negative coefficients have orange lines and positive ones have blue lines. So we can see, at a glance, adverts, airplay, and attractiveness all have positive betas, because they've got blue lines. Again, if you hover on them, we could see, for example,

- 16:06
ANDY FIELD [continued]: the coefficients for adverts are 0.05. That's significant. And, in terms of its importance, we get this value of 0.473, which allows us to-- that was basically the thing that was displayed on the graph earlier on. Same with airplay. We get the beta coefficient, its significance,

- 16:28
ANDY FIELD [continued]: and its measure of importance. And same for attractiveness. Again, we can switch this to a Table view. The unstandardized beta coefficients, their significance values, and these measures of importance. And if you click on that, we can-- sorry, double-click on

- 16:48
ANDY FIELD [continued]: that-- we can sort of also get the confidence interval, for those, as well. So that just sort of expands the table a bit. And, you know, confidence intervals are pretty important. Finally, you get a model building summary, which just basically tells you that, step 1 we stuck in airplay, step 2 we had airplay and then adverts

- 17:11
ANDY FIELD [continued]: went in, and then step 3 we had airplay, adverts, and attractiveness of the band. So there you go. That's automatic linear modeling for you.
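
The step-by-step entry shown in that model building summary can be imitated with a short Python sketch of forward selection. It is a toy under stated assumptions: predictors enter one at a time for as long as adjusted R square improves, whereas SPSS's own entry criterion may differ; the data are made up, and `ols_sse`, `adj_r2` and `forward_stepwise` are hypothetical names.

```python
def ols_sse(columns, y):
    """Residual sum of squares for y regressed on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    coef = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(c * x for c, x in zip(coef, X[i]))) ** 2
               for i in range(n))

def adj_r2(columns, y):
    """Adjusted R square of the model containing the given predictor columns."""
    n, k = len(y), len(columns)
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)
    r2 = 1 - ols_sse(columns, y) / sst
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def forward_stepwise(predictors, y):
    """Add the predictor that most improves adjusted R square; stop when none does."""
    chosen, best = [], 0.0
    remaining = dict(predictors)
    while remaining:
        scores = {name: adj_r2([predictors[c] for c in chosen] + [col], y)
                  for name, col in remaining.items()}
        name = max(scores, key=scores.get)
        if scores[name] <= best:
            break
        chosen.append(name)
        best = scores[name]
        del remaining[name]
    return chosen

# Made-up data: y depends strongly on x1, weakly on x2, and not on x3.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, 1, -1, 1, -1, 1, -1],
    "x3": [1, 1, -1, -1, 1, 1, -1, -1],
}
noise = [0.05, -0.02, 0.03, -0.04, 0.01, 0.02, -0.03, -0.02]
y = [3 * a + 2 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]
print(forward_stepwise(data, y))
```

The order of entry is decided entirely by the numbers in the sample, which is the heart of the objection to stepwise methods: forced entry, with all predictors included, reflects a decision made by the researcher instead.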

### Video Info

**Series Name:** Discovering Statistics Using SPSS

**Episode:** 1

**Publisher:** SAGE Publications Ltd

**Publication Year:** 2013

**Video Type:** Tutorial

**Methods:** Statistical modelling, SPSS

**Keywords:** mathematical computing; mathematical concepts; Software

### Segment Info

**Segment Num.:** 1


## Abstract

Professor Andy Field reluctantly provides a guide to automatic linear modeling using SPSS software. He offers step-by-step pointers with screenshots and explanations to demonstrate the basics of how the program works.