Disruption Insights: Data Scientists Should Be Storytellers

Updated Jan 18, 2024 • 11 min read

Meet Chad Sanderson, Head of Product, Data Platform, at Convoy. Chad is active in so many fields it’s hard to define one area of expertise he specializes in.

His domain knowledge spans data product management, big data, machine learning, pipeline development, ML and data ops, and data quality, to name just a few areas.

Before joining Convoy, he worked for Oracle, Microsoft, Sephora, and Subway. He’s a big advocate of data contracts, a concept that makes the cooperation between data producers and consumers smoother by bringing awareness and accountability to both parties.

In the Disruption Insights series, we share tips and nuggets of wisdom from professionals in the field, covering the various phases of data science projects. Our aim is to turn the sometimes complex aspects of this domain into actionable insights that even non-technical audiences can understand and follow. Today you can learn from the experience of Chad Sanderson.

🗂️ Getting started with data projects

Convincing stakeholders to back a data science project

The core of convincing any decision-maker to back a data project is your ability to connect the outcomes of that project to the target outcomes of the business. If the goals of the business for the year are to grow profitably, then any data efforts that facilitate growing profitably are going to be prioritized over the ones that are not.

It's important to familiarize yourself with what the business and certain organizations within the business care about.

The next thing is to create a narrative. Working on data is a bit different from building a third-party-facing application, so it's more challenging for us to directly attribute changes in the business, like margin, profit, and customer growth, to our efforts. That’s why we need to be able to tell a story about why and how the things that we work on will contribute to those outcomes.

Aspects to consider before the development of a data science solution

The first thing to think about is: who is the customer you are trying to serve, and what problem are you attempting to solve? It's also important to think about how meaningful that problem is to the company, both in the short and long term. If you're solving a problem that, at most, could affect only 10 or 20 customers out of 10,000, then it may not offer the highest return on investment.

Similarly, you want to think about how big and impactful the problem is. If you're solving a problem for all of your customers, but it's a very minor one that they don't think about and that won't meaningfully affect their experience of using your applications, then that's another good example of a project that may not deliver the ROI needed to be meaningful.
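The two checks above (how many customers a problem touches, and how much it matters to each of them) can be sketched as a rough prioritization score. This is only an illustration, not Convoy's method: the ProblemCandidate class, the 0 to 1 impact scale, and the example numbers are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class ProblemCandidate:
    """Hypothetical description of a problem a data project could solve."""
    name: str
    customers_affected: int      # how many customers experience the problem
    total_customers: int         # size of the customer base
    impact_per_customer: float   # assumed 0-1 score of how much it matters


def priority_score(p: ProblemCandidate) -> float:
    """Reach (share of customers affected) multiplied by impact per customer."""
    reach = p.customers_affected / p.total_customers
    return reach * p.impact_per_customer


# Illustrative numbers only.
candidates = [
    ProblemCandidate("rare but painful", 20, 10_000, 0.9),
    ProblemCandidate("universal but minor", 10_000, 10_000, 0.05),
    ProblemCandidate("common and painful", 4_000, 10_000, 0.6),
]

# Highest score first.
ranked = sorted(candidates, key=priority_score, reverse=True)
for c in ranked:
    print(f"{c.name}: {priority_score(c):.4f}")
```

In this toy ranking, both the narrow problem and the trivial one land below the problem that is common and painful, which is the point of asking both questions before committing to a project.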

The next thing that is really important to think through is the user story. The user story is essentially the state of the world that is going to change for that customer.

We need to know what the actual goal of the project is. What is the experience that we think is going to make that particular customer's life better? And then, the next thing we need to think about is what is the data that we need in order to actually build a successful project?

What do we need to know about our customers, about their experience, about their journey? Based on that data, we can do additional investigation to validate our initial hypotheses around the problem, which brings us to another thing to add – a hypothesis.

It should answer the question: if we solve this problem and deliver this user story, what do we think is going to change about the world? There needs to be some data that supports that decision. Based on that data, we might put together a hypothesis.

The role of proper data analysis

At Convoy, we definitely focus quite heavily on validating our hypotheses through data.

Building models is very exciting, and we can come up with conclusions that seem logical, but our customers can behave in unexpected ways. I've seen a lot of time wasted on months spent building relatively complex AI systems, only to find out that they were optimizing for a customer behavior that isn't particularly meaningful or doesn’t have a massive impact on the bottom line of the business.

The other thing that I've seen is not enough analysis of MVPs. It's quite common to do a significant amount of experimentation around the model in the feature development phase, but very little analysis after the rollout of that model. This can lead to highly over-optimized machine learning systems that don't accomplish a whole lot.

📈 Measuring data science impact

Measuring and proving the business value delivered by a data science project

When measuring the business value of projects, we focus on the efficiency and the happiness of the team.

We measure that by asking: how much time do people spend on data analysis and exploratory analysis? Where are they spending that time? How easy is the process?

It's a lot of qualitative metrics. NPS is a good example of a metric that we use when we ask our team: would you recommend our data infrastructure to someone else? Would you recommend that someone come and work at Convoy and use our infrastructure? How does that infrastructure compare to previous companies that you worked at?

The lower that metric is, the more we believe we are negatively impacting our team and making their lives harder. What's more, persisting in that state can lead to churn: people will quit the company because they can't do their jobs effectively.
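For readers unfamiliar with NPS, here is a minimal sketch of the standard calculation: on a 0 to 10 "would you recommend" scale, the score is the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). The survey answers below are made up for illustration and are not Convoy data.

```python
def net_promoter_score(ratings):
    """Standard NPS: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical answers to "would you recommend our data infrastructure?"
survey = [10, 9, 8, 7, 6, 3, 9, 10, 5, 8]
score = net_promoter_score(survey)
print(f"Internal NPS: {score:.0f}")
```

The score ranges from -100 to 100, so tracking it over time shows whether the team's experience with the infrastructure is improving or getting worse.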

Data contracts in engineering projects

I'm a big fan of data contracts when it comes to engineering projects. Data contracts are agreements between data producers and data consumers on what data needs to be provided, in what shape, according to what schema, and with what SLAs.

We use them whenever there is a data set that we believe delivers revenue but whose value could be undermined by quality issues. We can also use contracts as the delineation point for when to start measuring the impact of high-quality data.

Interestingly, we usually see a pretty immediate shift in the quality of the downstream data set after an implementation of a data contract.

The contract is about bringing awareness to the producer of how their data is being used, who's using it, how much value it brings to the business, what sort of enforcement needs to be in place, and what could happen if that data set fails.

It also provides a surface for rapid iteration and conversation. If the data consumer needs to change the contract, they can have that conversation with the producer much more easily than if there were no contract at all.
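To make the idea more concrete, a data contract could be sketched as a small object that records the producer, the consumers, the expected schema, and a freshness SLA, and that can report violations. This is a hypothetical illustration, not Convoy's implementation: the DataContract class, its field names, and the example data set are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataContract:
    """Hypothetical contract between one producer and its consumers."""
    dataset: str
    producer: str
    consumers: list
    schema: dict               # column name -> expected Python type
    max_staleness: timedelta   # freshness SLA

    def violations(self, rows, last_updated, now=None):
        """Return a list of human-readable contract violations."""
        now = now or datetime.now(timezone.utc)
        problems = []
        # SLA check: the data must have been refreshed recently enough.
        if now - last_updated > self.max_staleness:
            problems.append(f"stale: last updated {last_updated.isoformat()}")
        # Schema check: every row needs every column with the agreed type.
        for i, row in enumerate(rows):
            for column, expected in self.schema.items():
                if column not in row:
                    problems.append(f"row {i}: missing column '{column}'")
                elif not isinstance(row[column], expected):
                    problems.append(
                        f"row {i}: '{column}' is not {expected.__name__}"
                    )
        return problems


# Illustrative contract and data; all names are made up.
contract = DataContract(
    dataset="shipments",
    producer="dispatch-service",
    consumers=["pricing-model", "finance-dashboard"],
    schema={"shipment_id": str, "price_usd": float},
    max_staleness=timedelta(hours=1),
)
now = datetime(2024, 1, 18, 12, 0, tzinfo=timezone.utc)
rows = [
    {"shipment_id": "s-1", "price_usd": 120.0},
    {"shipment_id": "s-2", "price_usd": "n/a"},   # wrong type
    {"shipment_id": "s-3"},                        # missing column
]
problems = contract.violations(rows, last_updated=now - timedelta(hours=3), now=now)
for p in problems:
    print(p)
```

Listing the consumers directly in the contract reflects the awareness aspect described above: the producer can see who depends on the data set before changing it.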

🧭 Success and failure

Data projects that are not finished successfully

Around 30% of our projects are not finished successfully. However, most projects die before we reach the prototype phase: many of them are canceled in the hypothesis development or data collection phase. Then probably another 30% of the projects die in the prototyping (MVP) phase.

That is because most of my team’s work happens in the phases of hypothesis building, asking questions, and looking at the data. We simply decide that we don't have enough evidence to have conviction about a project, or that there are more important things we could be working on.

Reasons behind data project failures

The reasons that lead to project cancellation or failure are quite varied. Sometimes the failures occur because we didn't do a good enough job during the data collection phase. This means we didn't spend enough time really fleshing out our hypothesis or getting a well-rounded enough view of the customer.

Another reason projects fail is that we often don't understand the engineering systems well enough. That leads us to make assumptions about how the product and its features work. Then, when we realize that they work differently than we originally anticipated, it’s too late and it doesn’t make sense to carry on with the project.

Sometimes we discover other, larger priorities, and after the proof of concept (POC) is done, its potential doesn't stack up against what else we could be working on. Sometimes operational issues appear. We have a massive backlog, and if we think that delivering that backlog adds more value than our current project, we won’t continue unless the impact of the POC is quite outsized.

We're pretty ruthless in terms of how we cut projects, and we don’t have any problem with that. You shouldn’t punish people if projects fail. If you do, or if you hurt their careers because they built things that don't work, then people are not going to be willing to take risks. Yet, in the end, the biggest risks bring the biggest rewards. That’s why we try to cycle through as many experiments as we can.

💪 Data science application

Data science use cases in experiments

We use data science to run experiments, and we do about 50 experiments per month. These include A/B tests in the frontend of the actual app and experiments run by deploying machine learning models. We also do a large amount of dashboard development, and we have real-time and offline models that we use for a wide range of issues in finance, reporting, and planning, such as the number of operations employees or brokers we need to hire.
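As an illustration of how a single frontend A/B test might be evaluated, here is a minimal sketch of a two-proportion z-test on conversion rates. The interview does not say which statistical method Convoy uses, so the choice of test, the function name, and the numbers are assumptions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical experiment: control (A) vs. new variant (B).
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be due to chance alone, but as noted earlier in the interview, the analysis should continue after rollout as well.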

Organizing a data science team

The way our data science teams are organized blends both dedicated data teams and product teams that include data scientists.

We have data scientists embedded in every product team, and we also have data engineers that are centralized that focus on supporting the infrastructure and the core pipelines.

Then, we also embed other sorts of data developers, like business intelligence engineers or data analysts, alongside data scientists to act as support. For example, we have a centralized analyst team that supports marketing, sales, and other business operations.

