The four data science skills I didn't learn in grad school (and how to learn them!)

When I was getting my graduate degree in statistics and machine learning, I thought I was learning everything I needed to be a successful data scientist. I took courses in advanced statistical methods, learned to code machine learning algorithms from scratch, and spent countless hours in the lab running experiments and analyzing data.

And while my graduate training gave me an excellent foundation, when I transitioned into my first industry data science role, I quickly realized there were some critical skills I was missing: skills that are essential for success as a professional data scientist, but that aren't typically taught in grad school.

In this post, I'll share the four key data science skills I didn't learn in grad school, along with practical tips for how you can develop them yourself. Whether you're a recent grad about to start your data science career, or a current student looking to set yourself up for success, read on to learn the skills you'll need to thrive in the "real world" of data science.

1. Deploying Models Into Production

In grad school, the vast majority of the machine learning projects I worked on were purely academic. I'd spend weeks or months perfecting a predictive model, tuning the hyperparameters to eke out every last bit of accuracy. Then I'd write up my results in a paper, submit it to a conference, and move on to the next research problem.

But in industry, the model itself is just the beginning. For a machine learning model to actually deliver business value, it needs to be deployed into a production environment, where it can serve predictions in real time as new data comes in. And deploying models at scale turns out to be a huge challenge.

As data scientists, we need to worry about issues like:

  • Containerizing our model and its dependencies so it can be reliably deployed
  • Setting up pipelines to retrain the model on fresh data
  • Monitoring the model's performance over time and setting up alerts if it degrades
  • Integrating the model with other production systems and APIs

In short, we need to develop some serious engineering chops to get our models out of the prototype stage and into the hands of users.
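As one illustration of the monitoring item above, here is a minimal sketch of a rolling accuracy check. The class name, the window size, and the alert threshold are all assumptions made up for this example. A real setup would more likely feed a metrics or alerting service than keep state in memory.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent `window` labeled predictions.

    Hypothetical helper for illustration; the window size and threshold
    are arbitrary assumptions, not recommended values.
    """

    def __init__(self, window=100, threshold=0.8):
        # Each entry is True if the prediction matched the true label.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether one prediction turned out to be correct."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the current window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        """Alert only once the window is full and accuracy is too low."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.threshold
```

In use, you would call `record` whenever a true label arrives for an earlier prediction, and check `should_alert` on a schedule. Waiting for a full window is a deliberate choice here, since it avoids noisy alarms on the first handful of predictions.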

So how can you learn these critical deployment skills? Here are a few suggestions:

  • Seek out end-to-end projects that involve not just building models, but deploying them as well. Don't just stop once you have a working prototype – make deploying it into production a requirement.

  • Familiarize yourself with common deployment technologies and platforms like Docker, Kubernetes, AWS, GCP, etc. There are plenty of tutorials and courses available.

  • Learn software engineering best practices around things like versioning, testing, logging, etc. Consider the production-readiness of your code, not just the statistical performance.

  • Collaborate with software engineers and DevOps folks whenever possible, and learn from how they approach deployment challenges.

The more you practice deploying real models into production, the more comfortable and proficient you'll become. Don't neglect this critical skill.
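To give a flavor of what "production-ready" code can mean in practice, here is a rough sketch of a prediction wrapper that validates its input, logs each call, and tags its output with a model version. Everything named here (`MODEL_VERSION`, `EXPECTED_FEATURES`, `predict`, the feature names) is hypothetical and chosen only for illustration. The model is assumed to be any callable that takes a list of floats and returns a score.

```python
import logging

logger = logging.getLogger("model_service")  # hypothetical logger name

MODEL_VERSION = "example-v1"                  # hypothetical version tag
EXPECTED_FEATURES = ("age", "tenure_months")  # hypothetical feature names


def validate(features):
    """Raise ValueError if a required feature is missing or non-numeric."""
    missing = [name for name in EXPECTED_FEATURES if name not in features]
    if missing:
        raise ValueError(f"missing features: {missing}")
    for name in EXPECTED_FEATURES:
        value = features[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"feature {name!r} must be numeric")


def predict(model, features):
    """Validate input, score it, and log what happened.

    `model` is assumed to be a callable taking a list of floats.
    """
    validate(features)
    row = [float(features[name]) for name in EXPECTED_FEATURES]
    score = model(row)
    logger.info("model_version=%s row=%s score=%s", MODEL_VERSION, row, score)
    return {"score": score, "model_version": MODEL_VERSION}
```

Returning the version alongside the score is one simple way to trace a surprising prediction back to the model that produced it.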

2. Collaborating Cross-Functionally

In my grad school days, most of my time was spent working solo or collaborating with other students and researchers with similar technical backgrounds to my own. But as an industry data scientist, I've had to learn to work effectively with colleagues from all across the business – many of whom don't know the first thing about machine learning.

To drive real impact, data scientists need to partner with software engineers, product managers, designers, salespeople, executives and all manner of non-technical domain experts. We need to be able to communicate complex technical concepts in plain English, understand business goals and context, and adapt our work to the needs of our stakeholders.

A machine learning model may be an impressive technical achievement, but if it doesn't solve a real problem for the business, it's not going to see the light of day. I've seen many a data science project falter due to misalignment with product priorities, failure to get buy-in from leadership, or lack of collaboration with the end users meant to benefit from it.

Developing these cross-functional collaboration skills wasn't something that came naturally to me, but here are some of the ways I've worked on them over the years:

  • Proactively seek out data science projects with cross-functional components. The more you work with non-technical colleagues, the better you'll understand their needs and communication styles.

  • Practice explaining technical concepts in simple terms that a layperson could understand. Write blog posts, give presentations, or just informally discuss your work with people from other backgrounds.

  • Take courses or read up on business skills like product management, design thinking, marketing, etc. Empathy for and understanding of your colleagues' domains are invaluable.

  • Shadow other teams and sit in on their meetings to absorb context. Invite them to your meetings as well so there's mutual understanding.

  • Solicit feedback from cross-functional stakeholders early and often, so they feel invested in the work.

It can be intimidating to step outside the familiarity of the tech world, but building bridges to other parts of the business is essential to becoming an effective data scientist. Embrace the challenge!

3. Focusing on Business Impact

In academia, the main currency is statistical significance and incremental improvements to the state of the art. Got your new algorithm to perform 0.5% better than the previous benchmark? Congrats, that's publishable material!

But in industry, impact is measured very differently. Businesses care about metrics like revenue, user growth, profitability, etc. A model that improves a key metric by 10% can be a huge success, even if it's using a relatively simple algorithm under the hood.

As data scientists, we need to retrain ourselves to focus on moving the needle on business goals, not just optimizing statistical performance. A good data scientist doesn't just care about model accuracy – they care about model accuracy within the constraints of engineering feasibility, budget, ethics, and many other real-world factors.

Some ways to hone your business acumen:

  • Frame every data science project in terms of its concrete value to the business. How will this model improve key metrics or KPIs? Make this part of your project planning from the start.

  • Always define clear success criteria upfront, ideally in terms of business impact. Then rigorously measure your work against those criteria.

  • Get in the habit of doing cost-benefit analysis on your projects. Is the juice worth the squeeze? Is there a simpler, lower-cost approach that would get you 80% of the benefit?

  • Learn to separate statistical significance from practical significance. Just because a result is statistically significant doesn't mean it matters to the business.

  • Prioritize interpretability over raw predictive power in your models. A simple, explainable model that the business trusts can be better than a black-box model with superior accuracy.
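To make the gap between statistical and practical significance concrete, here is a small sketch using made-up A/B test numbers. The function name, the traffic counts, and the minimum worthwhile lift of 0.5 percentage points are all assumptions for illustration. It uses a standard pooled two-proportion z-test with a normal approximation, which is reasonable only for large samples like these.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (absolute lift of B over A, two-sided p-value).

    Uses the pooled two-proportion z-test with a normal approximation.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, p_value


# Made-up numbers: 10.0% vs 10.1% conversion, two million users per arm.
lift, p_value = two_proportion_z_test(200_000, 2_000_000, 202_000, 2_000_000)

# Hypothetical business rule: a launch is only worth the engineering
# cost if conversion rises by at least 0.5 percentage points.
MIN_WORTHWHILE_LIFT = 0.005

statistically_significant = p_value < 0.05
practically_significant = lift >= MIN_WORTHWHILE_LIFT

print(f"lift={lift:.4f}, p-value={p_value:.5f}")
print(f"statistically significant: {statistically_significant}")
print(f"practically significant:   {practically_significant}")
```

With samples this large, a 0.1 percentage point difference clears the usual 0.05 threshold comfortably, yet it falls well short of the assumed bar for being worth shipping.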

The best data scientists operate like mini-CEOs. They combine technical chops with business savvy, and they always keep the big picture in mind.

4. Rapid Prototyping & Iterating

The final key skill I've had to develop as a data scientist is a willingness to move quickly, fail fast, and iterate constantly. The breakneck pace of industry is a far cry from the measured, deliberate work of academia.

In grad school, I'd often spend months on a single research project, running experiment after experiment until I had incontrovertible evidence to support my hypothesis. I was shooting for perfection, for total certainty.

But in the business world, perfectionism is the enemy of progress. Most of the time, a decent model deployed quickly is more valuable than a stellar model deployed after months of fine-tuning. As data scientists, we need to cultivate a rapid prototyping mindset – focusing on getting a minimum viable product out the door so we can start gathering feedback and iterating.

This definitely took some getting used to for me. I had to learn to satisfice instead of optimize, to treat my analyses as living documents rather than fixed final reports. Over time, I've come to embrace the rapid experimentation ethos, and I've found it immensely gratifying to see my work start having real-world impact so much faster.

Some tips for speeding up your data science process:

  • Timebox your projects and set aggressive deadlines for yourself. Create forcing functions to prevent perfectionist tendencies from taking over.

  • Get comfortable showing incomplete work to colleagues and stakeholders. Their feedback is fuel for your next iteration.

  • Adopt an agile, sprint-based workflow. Work in short bursts, demo results frequently, and adapt your plan as you go.

  • Don't be afraid to start simple. Can you prototype something with basic heuristics before trying machine learning? Can you start with a subset of the data? Boil the problem down to its essence.

  • Measure the ROI on your time. Where are the diminishing returns in your process? Cut scope ruthlessly.
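As a sketch of the "start simple" tip, here is what a first pass at a churn predictor might look like before any machine learning is involved. The data, the 30-day cutoff, and the function names are invented for illustration. The point is only that a one-line rule and a majority-class baseline give you numbers to beat within minutes.

```python
def majority_baseline(labels):
    """Predict the most common label for every example."""
    most_common = max(set(labels), key=labels.count)
    return [most_common] * len(labels)


def rule_based_churn(days_inactive, cutoff=30):
    """Hypothetical heuristic: flag users inactive longer than `cutoff` days."""
    return [days > cutoff for days in days_inactive]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


# Tiny made-up sample: days since last login, and whether the user churned.
days_inactive = [2, 45, 10, 60, 5, 33, 1, 90]
churned = [False, True, False, True, False, False, False, True]

baseline_acc = accuracy(majority_baseline(churned), churned)
heuristic_acc = accuracy(rule_based_churn(days_inactive), churned)

print(f"majority baseline accuracy: {baseline_acc:.3f}")
print(f"30-day rule accuracy:       {heuristic_acc:.3f}")
```

Any model you build afterwards then has to justify its extra complexity by clearly beating these two numbers on held-out data.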

Like any skill, rapid prototyping takes practice to master. But learning to move fast and break things is essential for data scientists who want to keep pace with the business world.

So there you have it – the four key skills I wish I'd learned in grad school, and my advice for developing them yourself. To recap:

  1. Learn to deploy your models, not just prototype them
  2. Collaborate cross-functionally to understand the business context
  3. Focus on concrete business impact, not just statistical purity
  4. Prototype rapidly and iterate relentlessly

Of course, this isn't an exhaustive list – every data scientist's journey is different. But in my experience, these are the skills that separate the good data scientists from the great ones: the ones who don't just produce brilliant analyses, but who drive real value for their organizations.

If you're an aspiring data scientist, I encourage you to seek out opportunities to practice these skills, even while you're still in school. Doing so will give you a huge leg up when you land your first industry role.

And if you're already working as a data scientist, I challenge you to honestly assess your strengths and weaknesses across these four domains. Where could you be focusing your professional development?

Becoming a well-rounded data scientist is a lifelong journey. But equipped with these skills, you'll be well on your way to doing work that makes a real difference in the world. So get out there and start learning!
