Featured: Leo Pekelis, Head of Data at CloudTrucks


Recently, we got to chat with Leo Pekelis, the head of data at CloudTrucks and technical adviser to Eppo, which powers A/B experimentation at scale. Leo shares his thoughts on leading his data science team, the challenges he faces, and how he sees data science continuing to evolve.

CloudTrucks is a startup focused on optimizing the freight industry for trucking fleets. Previously, Leo helped develop Optimizely's Stats Engine and the theory behind it, and worked on pricing at Opendoor. He completed a PhD in statistics at Stanford University.

At a high level, what are you working on?

Personally, I am working on building out our data organization. As a young team that started 2 years ago, we focused on two things: prototyping data solutions to deliver value to our customers, and making “crawl before we can walk” investments like standing up our analytics tech stack. Now, we are applying our learnings from the past 2 years and investing in infrastructure and tooling to support our most promising prototypes.

The data science team in particular is focused on 3 key initiatives, all of which support CloudTrucks’ mission to build technology for freight trucking fleets on our platform. First, we mine freight market conditions. Second, we recommend deliveries to fleets in our ecosystem. Third, we identify the safest and most performant truck drivers.

What is the hardest part of your job?

The hardest part of my job is also the most fun: building valuable and scalable solutions where there was nothing before. I think this can be tricky for a data team at an early-stage startup because data science applications can have longer research cycles and require investment in infrastructure like data pipelines and workflows. This creates friction with a startup’s hyper-focus on execution and lean resources.

The fun part is coming up with creative ways to thread this needle. One example is our in-house customer metrics service. With it, we’ve enabled our data scientists to develop metrics in our data warehouse and release them to our end customers with no backend code changes.
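
As a rough illustration of that pattern (not CloudTrucks’ actual implementation), a metrics service like this can boil down to a registry of SQL definitions living in the warehouse plus a thin, generic backend that executes whatever is registered. Everything below, from the `metric_registry` table to the `CustomerMetricsService` class, is hypothetical, and sqlite3 stands in for a real warehouse so the sketch is self-contained:

```python
# Hypothetical sketch of "metrics defined in the warehouse, served without backend deploys".
# All names here are illustrative; sqlite3 stands in for a real data warehouse.
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE deliveries (driver_id TEXT, revenue REAL, on_time INTEGER);
    INSERT INTO deliveries VALUES ('d1', 1200.0, 1), ('d1', 900.0, 0), ('d2', 1500.0, 1);

    -- Data scientists register new metrics here as plain SQL; no backend code change needed.
    CREATE TABLE metric_registry (metric_name TEXT PRIMARY KEY, sql TEXT);
    INSERT INTO metric_registry VALUES
        ('total_revenue', 'SELECT driver_id, SUM(revenue) AS value FROM deliveries GROUP BY driver_id'),
        ('on_time_rate',  'SELECT driver_id, AVG(on_time) AS value FROM deliveries GROUP BY driver_id');
""")


class CustomerMetricsService:
    """Thin, generic backend layer: looks up a metric's SQL and executes it."""

    def __init__(self, conn):
        self.conn = conn

    def get_metric(self, metric_name, driver_id):
        (sql,) = self.conn.execute(
            "SELECT sql FROM metric_registry WHERE metric_name = ?", (metric_name,)
        ).fetchone()
        for row_driver, value in self.conn.execute(sql):
            if row_driver == driver_id:
                return value
        return None


service = CustomerMetricsService(warehouse)
print(service.get_metric("on_time_rate", "d1"))   # 0.5
print(service.get_metric("total_revenue", "d2"))  # 1500.0
```

The point of the pattern is that the backend only knows how to look up and run a registered query, so shipping a new customer-facing metric is a warehouse change rather than an application release.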

What areas in data science do you think are under-served?

I’m very enthusiastic about the emerging data science toolkit and framework landscape, but I see few that encourage good software engineering principles. In my experience, I’ll either see a framework that is very close to the metal, which can make it difficult for a less experienced engineer to operate, or a simplified frontend that prescribes narrow workflows and limits functionality.

Where is there too much hype?

While valuable and interesting, I think AutoML solutions are more niche than commonly believed. At the end of the day, the data going into and out of an ML model is going to make or break your solution. This requires designing the datasets the model will use, deciding how its predictions will be consumed downstream, and building the required data pipelines and transformation layers. All of this is a prerequisite, and once it’s in place, a simple predictive model is often preferred because it’s easier to QA and to extract learnings from.
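
To make that last point concrete, here is a deliberately simple baseline of the kind described above: a plain logistic regression whose coefficients can be read directly. The features, data, and labels are invented for illustration and are not from CloudTrucks:

```python
# Illustrative only: a simple, inspectable baseline standing in for an AutoML pipeline.
# Feature names and data are made up for the example.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy feature matrix: [miles_per_week (thousands), pct_on_time_deliveries] per driver.
X = np.array([[2.5, 0.98], [1.8, 0.91], [3.1, 0.99], [1.2, 0.80], [2.9, 0.95]])
y = np.array([1, 0, 1, 0, 1])  # 1 = "safe/performant" label from some upstream process

model = LogisticRegression()
model.fit(X, y)

# The payoff of a simple model: coefficients are directly inspectable,
# which makes QA and learning extraction straightforward.
for name, coef in zip(["miles_per_week_k", "pct_on_time"], model.coef_[0]):
    print(f"{name}: {coef:+.4f}")
```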

What is a “controversial” observation you have?

I think data scientists are going to become both more and less technical as the profession continues to evolve. We will be more technical as we adopt software engineering best practices like code review, version control, comprehensive testing, abstraction, and modularity. We’ll also become more technical as we own our solutions end-to-end and are responsible for SLAs. However, we will be less technical in that the ‘code’ we write will more closely resemble this blog post than a Java class.

More technical concepts to improve a data scientist’s utility; less technical jargon to lower the barrier to entry.

Connect with Leo Pekelis here. If you’re a data science leader and would like to be featured, contact us here.


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Join today and get 150 hours of free compute per month.