Over at Matlab Geeks Headquarters, we’ve been busy analyzing the AP and Coaches’ Polls to see how the data look, and interestingly enough, some of my preconceived notions about the Oregon Ducks were confirmed. The final results showed that the top 5 under-rated programs in the country were: Boise State, Utah, TCU, Iowa and Oregon. The top 5 over-rated schools? Florida State, Tennessee, Michigan, Florida and Clemson. Don’t just take my word for it; let’s see how we got these results…

While we had previously posted a spreadsheet with just the rankings, I decided to gather the total votes among all AP voters from the 2000-2009 college football seasons. The spreadsheet with this data can be found here and is courtesy of cnnsi.com and espn.com. Once again, load the data as we did in the first tutorial of this series and follow along to see how we performed the analysis using Matlab. My variable names are the same in Matlab as in the Excel spreadsheet.

First, we want to generate a preseason table that lists each team’s vote count by year. This can be done with the following code:

```matlab
AP_pre_table = zeros(AP_numteams, 10);    % one row per team, one column per season
for i = 1:AP_numteams
    rows = strcmp(AP_pre, AP_teams(i));   % preseason entries for team i
    temp_year = year(rows);
    temp_year = temp_year - 1999;         % map 2000-2009 to columns 1-10
    temp_vote = AP_pre_vote(rows);
    for j = 1:length(temp_year)
        AP_pre_table(i, temp_year(j)) = temp_vote(j);
    end
end
```

The *zeros* command initializes the matrix to all zeros. This is done for two reasons. First, it is computationally faster for Matlab to have the matrix size pre-defined (preallocated) than to grow it inside the loop. Second, it gives each team a value of 0 for any year in which it received no votes. The rest of the code loops through the teams alphabetically and finds each team’s vote count for every year between 2000 and 2009 in which it received votes. The votes are stored in the appropriate year column, 1-10 for 2000-2009, hence the subtraction of 1999. A similar procedure can be used for the postseason AP voting results.
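
The same table can also be built without explicit loops. The sketch below is an alternative we are suggesting, not the approach used in the analysis above: it uses *ismember* to map each poll entry to its team’s row and *accumarray* to scatter the vote counts into a teams-by-seasons matrix. The variable names mirror the spreadsheet columns, but the three teams and their vote counts are hypothetical toy data.

```matlab
% Hypothetical toy data standing in for the spreadsheet columns:
% one entry per (team, season) appearance in the preseason poll.
AP_teams    = {'Boise State'; 'Oregon'; 'Utah'};   % sorted unique team names
AP_pre      = {'Oregon'; 'Utah'; 'Oregon'; 'Boise State'};
year        = [2000; 2000; 2001; 2002];
AP_pre_vote = [850; 53; 1200; 400];

AP_numteams = numel(AP_teams);

% Row of each poll entry within AP_teams (found is false if the team is missing)
[found, team_idx] = ismember(AP_pre, AP_teams);

% Column of each entry: 2000 -> 1, ..., 2009 -> 10
year_idx = year - 1999;

% Scatter the votes into a teams-by-10 matrix; cells with no entry stay 0.
% accumarray sums duplicate subscripts, so this assumes each team appears
% at most once per season.
AP_pre_table = accumarray([team_idx(found), year_idx(found)], ...
                          AP_pre_vote(found), [AP_numteams, 10]);
```

Here Oregon’s 2000 and 2001 votes land in row 2, columns 1 and 2, and every season with no entry remains 0, just as with the *zeros*-and-loop version.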

Now how do you calculate over/under-rating?

```matlab
AP_diff = AP_post_table - AP_pre_table;
AP_diffsum = sum(AP_diff, 2);
[AP_sorted, index_sorted] = sort(AP_diffsum);
overratedAP = AP_teams(index_sorted);
```

The *AP_post_table* and *AP_pre_table* matrices should be the same size, and simply subtracting the preseason votes from the postseason votes gives us the year-by-year discrepancies in votes. Summing each team’s yearly differences, where a negative total indicates over-rating and a positive total indicates under-rating, shows how teams are perceived in the preseason versus how they are ranked at the end of the season. Here we use the *sum* function with *2* as the second input, which tells Matlab to sum across each row instead of down each column. Finally, we sort the totals in ascending order, so the most over-rated teams come first. Again, the whole voting process is subjective, but it gives us a glimpse into how the voters perceive teams.
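
To make the row-sum and sort step concrete, here is a small worked example with three hypothetical teams over three seasons; the vote counts are made up for illustration. Team A loses votes every season, so it ends up first (most over-rated) after the ascending sort.

```matlab
% Hypothetical vote totals: rows are teams, columns are seasons
AP_teams      = {'Team A'; 'Team B'; 'Team C'};
AP_pre_table  = [1200  900    0;
                  300  400  500;
                    0    0  100];
AP_post_table = [ 600  700    0;
                  800  900  700;
                    0  250  300];

AP_diff    = AP_post_table - AP_pre_table;  % per-season gain (+) or loss (-)
AP_diffsum = sum(AP_diff, 2);               % one total per team (sum across each row)

% Ascending sort: most negative (most over-rated) first
[AP_sorted, index_sorted] = sort(AP_diffsum);
overratedAP = AP_teams(index_sorted);

disp([overratedAP, num2cell(AP_sorted)])
```

The row totals are -800, 1200 and 450, so the sorted order is Team A, Team C, Team B. Calling `sum(AP_diff, 1)` instead would give one total per season, which is not what we want here.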

The results are shown in the figure above. To generate this plot we ran the following commands:

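```matlab
plot(AP_sorted, '.')
set(gca, 'XTickLabelMode', 'Manual')
set(gca, 'XTick', [])                  % remove x-axis ticks and tick labels
ylabel('Over-rated Under-rated')
for i = 1:numel(AP_sorted)             % one label per team (97 in our data)
    if mod(i, 2) == 0
        % even-indexed teams: rotated label starting just above the point
        text(i, AP_sorted(i) + 100, overratedAP(i), 'Rotation', 90)
    else
        % odd-indexed teams: rotated, right-aligned label ending just below the point
        text(i, AP_sorted(i) - 100, overratedAP(i), 'Rotation', 90, ...
             'HorizontalAlignment', 'Right')
    end
end
```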

The *set* commands remove the x-axis ticks and tick labels. (Many other plot properties can be set here as well; see our tutorial on plotting for more.) We then used the *text* function with the *Rotation* and *HorizontalAlignment* properties, along with the *mod* function, to label the teams alternately above and below their points (we also rotated the entire figure 90 degrees clockwise for ease of reading). These results show just how far off the extreme teams can be: Boise State and Florida State each saw cumulative swings of almost 5000 points between the preseason and postseason polls over the last 10 years. To be fair, a drop within the ranks of 1-10 produces a larger change in points than a drop between ranks 20 and 30, but there is still a lot of consistency in the way “knowledgeable” voters vote. To investigate the overall relationship between preseason and postseason votes, we can compute a correlation:

```matlab
[R, p] = corrcoef(AP_pre_table, AP_post_table)
```

The results indicate a significant, fairly large correlation in the voting, with R = .628 (p < .001). By and large, the teams favored in the preseason remain in favor with the voters throughout the year. So it’s possible that teams such as Utah in 2008, which began the season in 29th place with only 53 votes, simply have too much ground to make up among the voters to ever have a chance at the championship.
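
Because both tables are matrices, `corrcoef(A, B)` treats each one as a single column of values (it is equivalent to `corrcoef(A(:), B(:))`) and returns 2-by-2 matrices, so the coefficient and its p-value sit in the off-diagonal entries. A quick check with small, made-up tables:

```matlab
% Hypothetical preseason/postseason vote tables (teams x seasons)
pre  = [1200  900;  300  400;    0  100];
post = [ 600  700;  800  900;    0  250];

[R, P] = corrcoef(pre, post);        % matrix inputs are flattened column-wise
R2     = corrcoef(pre(:), post(:));  % explicit flattening gives the same R

r_value = R(1,2);   % correlation between preseason and postseason votes
p_value = P(1,2);   % p-value for the hypothesis of no correlation
```

The diagonal entries of `R` are always 1 (each variable correlated with itself), which is why the value reported above comes from `R(1,2)`.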

In fact, looking at all the data, what is the highest finish achieved by a team that received zero votes in the preseason?

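```matlab
% Highest postseason total among team-seasons with zero preseason votes
high_vote = max(AP_post_table(AP_pre_table == 0));
% Row = team, column = season; also require zero preseason votes so that a
% team reaching the same total after receiving preseason votes is excluded
[high_team, high_year] = find(AP_post_table == high_vote & AP_pre_table == 0);
AP_teams(high_team)
high_year + 1999
```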

Iowa in 2002. Here the *find* function returns two outputs: the row (the team) and the column (the year) of the maximum entry. The 2002 Iowa team finished the season ranked 8th with 1334 votes, quite an accomplishment considering their preseason perception, and this rise factors significantly into their 2000-2009 under-rating. As for the other under-rated teams, most fans probably know about the consistent rise of Boise State, TCU and Utah in the rankings, yet each year these teams begin in the middle or at the bottom of the pack.
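
The two-output form of *find* returns row and column subscripts rather than a single linear index. Matching on `high_vote` alone can also return a cell where the team did receive preseason votes, so it is safer to combine both conditions. The toy data below is hypothetical and built to show exactly that case: Team A reaches 1334 votes in 2001 despite 500 preseason votes, while Team C reaches 1334 from zero.

```matlab
% Hypothetical data: rows are teams, columns are seasons 2000, 2001, 2002
AP_teams      = {'Team A'; 'Team B'; 'Team C'};
AP_pre_table  = [  0  500    0;
                 900    0  200;
                   0    0    0];
AP_post_table = [300 1334    0;
                 700    0  100;
                 150    0 1334];

no_pre = (AP_pre_table == 0);              % team-seasons with no preseason votes
high_vote = max(AP_post_table(no_pre));    % best postseason total among them

% Restrict the search to zero-preseason cells; without "& no_pre",
% Team A's 2001 entry would also match
[high_team, high_col] = find(AP_post_table == high_vote & no_pre);

AP_teams(high_team)     % team name
high_col + 1999         % season, with column 1 = 2000
```

This returns Team C in 2002; `find(AP_post_table == high_vote)` on its own would return two matches.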

Next week we’ll do a similar analysis for conferences and try to settle the Pac-10/SEC feud, but for now we leave you with this xkcd comic that nicely summarizes our findings 🙂

Good stuff. I think this is a good way of determining the degree to which a team is overrated or underrated by the media voters.

But since a large portion of the perception of the quality of each team is contributed by your average viewer (the type of person who consistently uses arguments like “All you need is a TV to see that such and such team/conference is better than some other one”), I would think another interesting way of determining overrated/underratedness would be from the perspective of the average viewer. The best way to quantify this perspective, of course, is by using point spreads.

So it would be interesting to go back the last ten years and figure out how each team performed on average against the spread. For example, if Louisville is playing Southern Miss, and the spread is Louisville -3.5, and the final score is Louisville over SoMiss by 31-28, then Louisville would get a score of -0.5 and SoMiss would get a score of +0.5. Then you’d figure this difference for all games played by that team in the last ten years (that had a spread) and divide by the total number of games.

Since spreads change from opening line to game time, I would think the most uniform way to determine performance against the spread would be to use the game time spread.

I would do this myself, but I don’t have the ability to program it or automate the parsing of the data. Also, I don’t know if there’s a record of gametime spreads anywhere on the intertubes, or how to even find it.

So mull it over, and if it’s a project you want to take on, be sure to let Ted Miller (the ESPN Pac-10 blogger) know what’s up.

Thank you! This is very interesting. It covers years of Mike Bellotti saying “we don’t get any respect.” Now it’s Chip Kelly’s “Win the Day!” Maybe Oregon will be in the driver’s seat now that USC is just another team. Go Ducks!

Thanks, this gives me some data to share with my blowhard SEC friends!

thanks