Ean Maloney

Home Run Analytics - Launch Post

I haven't really asked around, but it seems like finding free, publicly available datasets about things you're interested in might be the hardest part of doing data analytics as a hobby. If you want to do a project on census data or weather patterns, you'll probably find the resources you need, but for a lot of topics, data is money, and those who have it don't want to share it for free in a usable format.


So for someone like me, the options might seem to be either doing a project on a rather banal theme like public transit usage in Chicago (though I'm sure some people like that sort of thing) or getting data through web-scraping methods of questionable legality. (Google "Is web scraping legal?" and read a couple of articles if you want to be confused by the history of web-scraping litigation and get some very unclear answers to your question.) I'm taking for granted that it is infeasible for a solo data analyst to gather large amounts of data by hand for a personal project.


Luckily, there are lots of people who really like sports and have devoted a lot of time to making large amounts of sports data available for free in analyst-friendly formats. Specifically, the folks at retrosheet.org have compiled and published game reports for almost every Major League Baseball game in the modern era.


So the question, then, is what to do with all this free data...

The project I came up with was to look at home-run trends over the MLB season. It's common knowledge that we're in an era of baseball where more home runs are being hit than in the past, but I'm not interested in home-run trends between seasons. Rather, I want to analyze home-run trends within individual seasons.


Here are some questions connected with this exploration:


How does the rate of home-run hitting change over the course of the season? The regular season usually runs from April through early October. It seems prima facie unlikely that the pace of home-run hitting would be constant over this period, because several factors that might influence it plausibly vary over the year. Specifically:


Does it take time for MLB players to get into form at the beginning of the season?

Does fatigue or injury accumulation over the course of the season change the rate of home-run hitting?


Does the weather (especially temperature) have an effect? I take it to be a "common knowledge" assumption that the ball flies farther when it’s hot out.


These factors (along with many others) could seriously affect how hitters or pitchers (or, likely, both) perform, such that hitters hit better (or at least hit it over the wall more often) or pitchers pitch better (i.e., give up fewer home runs) at different parts of the season. It would be difficult to isolate these factors and determine their relative importance to home-run hitting, but we can see the aggregate effect of all of them by looking at how many home runs are hit during different periods of the season.
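The "count home runs by period" idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's actual code: each game is reduced to a hypothetical (date, total home runs by both teams) pair, the periods are calendar months, and the sample games are invented purely to show the output shape. One detail worth building in from the start is dividing by games played, since months contain different numbers of games (the opening and closing stretches of a season are only partial months), so raw monthly totals would partly reflect the schedule rather than the hitting.

```python
from collections import defaultdict
from datetime import date


def home_runs_per_game_by_month(games):
    """Return {month_number: average home runs per game}, sorted by month.

    `games` is an iterable of (game_date, total_home_runs) pairs, one per game,
    where total_home_runs counts both teams. This record shape is a hypothetical
    stand-in for whatever the parsed Retrosheet files end up looking like.
    """
    hr_totals = defaultdict(int)
    game_counts = defaultdict(int)
    for game_date, home_runs in games:
        hr_totals[game_date.month] += home_runs
        game_counts[game_date.month] += 1
    # Normalizing by games played keeps partial months (e.g., early October)
    # comparable with full months like June.
    return {
        month: hr_totals[month] / game_counts[month]
        for month in sorted(game_counts)
    }


# Made-up sample games, used only to illustrate the calculation.
sample_games = [
    (date(2019, 4, 1), 2),
    (date(2019, 4, 2), 4),
    (date(2019, 7, 10), 3),
    (date(2019, 7, 11), 5),
    (date(2019, 7, 12), 4),
]
print(home_runs_per_game_by_month(sample_games))  # {4: 3.0, 7: 4.0}
```

Calendar months are just the simplest bucket; the same function could group by week of the season or by average game-time temperature if those turn out to be more revealing.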


Posts in this series will be a sort of project journal that gives readers insight into my thought process, problem solving, reasoning, and conclusions.



