Orioles Hackathon


DrungoHazewood

What I Learned and What is Next

Kevin and I are software engineers; we build things. Our point of view is system design and architecture.

Phil comes from statistics and data science. His mindset is to explore the data, then build a model, then analyze.

Putting these two skill sets together allows us to take on these next high-level steps:

  1. The analysis builds the model, defines the tests, and sets up a pattern and comparison. This is what we did with Chris Tillman's fastball.
  2. Then we design a system to visualize the analysis so it's useful to a large swath of the population.
  3. Then we apply this pattern to a larger set of players so we can compare players to one another and to the data set at large.

Goal:

  • A web app showing a pitcher's change in pitch movement correlated to pitch count vs the league average change in pitch movement.
  • Be able to display multiple pitchers at the same time for comparison.
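As a rough illustration of the comparison this goal describes, here is a minimal Python sketch. It assumes each pitch is a (pitcher, pitch count, movement in inches) record and that "change in movement" means the mean movement in a pitch-count bucket minus the same pitcher's mean in his first bucket. The function names, the bucket size of 15, and the sample numbers are all made up for illustration; this is not the model the team built.

```python
from collections import defaultdict
from statistics import mean


def movement_change_by_bucket(pitches, bucket_size=15):
    """Return {pitcher: {bucket: mean movement minus first-bucket mean}}.

    pitches is an iterable of (pitcher, pitch_count, movement_inches),
    with pitch_count starting at 1. bucket_size is an assumed parameter.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for pitcher, pitch_count, movement in pitches:
        bucket = (pitch_count - 1) // bucket_size
        grouped[pitcher][bucket].append(movement)

    changes = {}
    for pitcher, buckets in grouped.items():
        # Baseline is the pitcher's earliest bucket that has any pitches.
        baseline = mean(buckets[min(buckets)])
        changes[pitcher] = {
            b: mean(values) - baseline for b, values in sorted(buckets.items())
        }
    return changes


def league_average_change(changes):
    """Average the per-pitcher changes bucket by bucket."""
    per_bucket = defaultdict(list)
    for buckets in changes.values():
        for bucket, delta in buckets.items():
            per_bucket[bucket].append(delta)
    return {b: mean(values) for b, values in sorted(per_bucket.items())}


# Made-up sample data: two hypothetical pitchers, two buckets each.
sample = [
    ("A", 1, 10.0), ("A", 2, 10.0), ("A", 16, 9.0), ("A", 17, 8.0),
    ("B", 1, 8.0), ("B", 16, 8.0),
]
per_pitcher = movement_change_by_bucket(sample)
league = league_average_change(per_pitcher)
```

Displaying several pitchers at once then amounts to plotting each entry of `per_pitcher` next to `league` on the same axes.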

We would like to finish our project. Wives, kids, and jobs will make this a long-term project.

This is a really fascinating project, and I'm sorry to say that the announcement for this event passed totally over my head. I never heard of it until nearly a week after the event itself was over. I tend not to follow the O's very closely during the off-season aside from interest in their trades and signings.

Next year I will definitely be ready for something like this to pop up in the winter and will try to hop on a team! I've been a software engineer for most of my life, but the extent of my data analytics experience comes from a few college classes using GNU Octave (a MATLAB clone) and crunching team performance metrics for the boss.

Regarding the CSV, not sure if you guys tried this, but you could have imported your data from CSV to Excel or Access (assuming it was well-formed CSV with well-defined escape characters and delimiters) and from there sucked it into the database of your choice. Or just leave it in Access and query it with SQL over ODBC; modern Access can handle ~1 GB of data just fine.
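For what it's worth, the same "get the CSV into something you can query with SQL" idea can be sketched without Excel or Access. The example below swaps in SQLite through Python's standard library, which is not what the post suggests, just a stand-in for "the database of your choice". The `load_csv` helper, the table name, and the column names are hypothetical, and it assumes a well-formed CSV with a header row and the same number of fields on every line.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create a table from the CSV header and bulk-insert the rows.

    Columns are left untyped, so values are stored as text; cast them
    in queries. Header names are quoted but not otherwise sanitized,
    so only use this on files you trust.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Made-up rows standing in for a real pitch-data file.
raw = io.StringIO(
    "pitcher,pitch_count,movement\n"
    "Tillman,1,9.5\n"
    "Tillman,2,9.1\n"
)
conn = sqlite3.connect(":memory:")
load_csv(conn, "pitches", raw)
row_count = conn.execute('SELECT COUNT(*) FROM "pitches"').fetchone()[0]
avg_movement = conn.execute(
    'SELECT AVG(CAST(movement AS REAL)) FROM "pitches"'
).fetchone()[0]
```

For a real file, pass `open(path, newline="")` instead of the `StringIO` object. Because the reader streams rows, the whole file never has to fit in memory at once.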

-------------

Regarding this point in particular:

A web app showing a pitcher's change in pitch movement correlated to pitch count vs the league average change in pitch movement.

A good article to read and consider: the "flaw" of averages can be a problem sometimes. It may be that simply taking the mean of the data is not very meaningful if there are vanishingly few players whose actual data corresponds well to the average. But I'm sure your data scientist guy knew that and you're just oversimplifying your goals for the non-technical readers of this forum ;)
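To make the concern concrete, here is a small Python sketch with made-up numbers: if half the pitchers lose movement as pitch count rises and the other half gain it, the league mean sits near zero even though nobody's actual change is anywhere near zero. The `share_near_mean` helper and the tolerance are hypothetical choices for illustration only.

```python
from statistics import mean


def share_near_mean(values, tolerance):
    """Return (mean, fraction of values within tolerance of the mean)."""
    avg = mean(values)
    near = [v for v in values if abs(v - avg) <= tolerance]
    return avg, len(near) / len(values)


# Made-up per-pitcher changes in movement (inches) late in a game.
changes = [-2.0, -1.5, -2.5, 2.0, 1.5, 2.5]
avg, share = share_near_mean(changes, tolerance=0.5)
```

Here `avg` comes out to 0.0 and `share` to 0.0, which is the situation described above: the average describes no actual pitcher. Checking the spread or plotting the distribution alongside the mean would expose that.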

Anyway, sounds like you guys had a lot of fun doing something really cool! I hope they do another one next year; I'll try my luck at participating then (but of course I might get unlucky and squeezed out like two of the other OH teams that posted here...)

So who won in the end? When you were watching the presentations at the end, did you think the winner clearly had the best work or was it a close call?

Actually I do not remember who won. I think the team name is Educators.

My team name by the way is Terps2004.

The Milt Papas team (the guy who wrote the blog weams referenced) came in 2nd. They definitely had the best UI, which is what the judges were after, I think.

If the Orioles do post the results publicly (they have a GitHub, but nothing up there yet), I'll post the link here.

Thanks for everything. I hope you got lots of rep on this.

There are so many things we could have done, but without knowing beforehand that the data was CSV flat files, it's hard to be ready to go. None of us had Access installed on our dev machines, and only one machine had Excel, and it didn't have enough memory to deal with 600+ MB of data at one time in Excel. Honestly, I was happy to just code it on the fly; that's what I do all day and am most comfortable with.

We did have Postgres installed (because we are engineers ;)), and that was our plan, but defining an import schema from the CSV was just too time-consuming.

RStudio has an import from CSV feature. But R is terrible (brain-exploding bad) at data manipulation. So we used Python to filter and perform calculations and dump cleaner CSV files, then imported those into RStudio to do the stats stuff and produce charts. This was a good approach for building a model around 1 player, which is what we did, and this is where a data science expert shines.
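The Python filtering step might have looked roughly like the sketch below. This is a guess at the shape of it, not our actual script: the `filter_pitches` helper, the column names (`pitcher`, `pitch_type`, `pitch_count`, `movement`), and the sample rows are all hypothetical.

```python
import csv
import io


def filter_pitches(src, dst, pitcher, pitch_type, keep):
    """Copy rows for one pitcher and pitch type, keeping only some columns.

    src and dst are open text files (or file-like objects). keep is the
    list of column names to write. Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    written = 0
    for row in reader:
        if row["pitcher"] == pitcher and row["pitch_type"] == pitch_type:
            writer.writerow(row)  # columns not listed in keep are dropped
            written += 1
    return written


# Made-up input standing in for the raw flat file.
raw = io.StringIO(
    "pitcher,pitch_type,pitch_count,movement,umpire\n"
    "Tillman,FF,1,9.5,X\n"
    "Tillman,CU,2,-6.0,X\n"
    "Other,FF,1,8.0,Y\n"
)
clean = io.StringIO()
rows_written = filter_pitches(
    raw, clean, "Tillman", "FF", ["pitch_count", "movement"]
)
```

The smaller file in `clean` is the kind of thing that could then be loaded with RStudio's CSV import. Because rows are streamed one at a time, a 600+ MB input would not need to fit in memory.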

It's a terrible approach to building a system that applies our model to many players and exposes an app for users. This is where software engineers shine.

Archived

This topic is now archived and is closed to further replies.

