dataandoutdoors

Dan Shaffer's blog posts about statistics, data science, outdoor recreation, and rural Michigan.

Category: Data

  • Tree-Based Models

    My post discusses the limitations and capabilities of decision trees and more advanced tree-based methods. While much of this may seem critical, decision trees can often be useful and surprisingly effective at approximating complex non-linearities and discarding irrelevant predictors with limited analyst input, all while being fairly robust to missing and nonsensical values. I enjoy… Read more

  • The Skills Shortage in Data Related Occupations

    You won’t spend long in any professional circle, social media feed, or talent network without hearing about the extreme talent shortfalls in data analytics, data science, data engineering, and data analysis, along with related skill sets in math, statistics, computer science, and quantitative business and social science. This leads one to wonder to what… Read more

  • Regularized Regression: Good for Goldilocks or the Worst of Both Worlds?

    Is regularized regression a happy medium, or does it simply combine the disadvantages of traditional regression and machine learning approaches? The answer is that there is a bit of truth to both outlooks. (Note: Moving forward, I use "regression" to mean the more traditional linear/semi-linear prediction models for either continuous or classification tasks. This is as… Read more

  • How I developed my idea for the Northern Michigan search interest project.

    How we come up with a data science, analytics, or data-based research project is the subject of both common lore and debate, or at least debate as defined by dissenting behavior. Common lore dictates that we start with the business question that we are most concerned with, regardless of available data and methodologies and… Read more