Unfinished Business

Reinventing ‘modern’ data tools at MotherDuck and Omni

Unfinished Business cover image

Editorial note: This blog is based on a recent chat between the Omni and MotherDuck co-founders on the past, present, and future of the data stack. 

What we call the modern data stack has existed for a decade now. While the tools in that stack have improved incrementally over the years, we haven’t seen a seismic shift in the data world since the mid-2010s, when database technology advanced significantly and made it possible for a set of (then) disruptive cloud data applications to replace the data stack of the 2000s and become the new status quo. 

Back then, Omni co-founder Colin Zima and MotherDuck co-founder Jordan Tigani were working on two of those applications — Colin on Looker and Jordan on Google Cloud BigQuery. Over the course of their tenures, the teams building Looker and BigQuery each grew from small organizations focused on innovation to large ones focused on scale, and Colin and Jordan got to see not only what made the products so successful, but also the customer problems they never solved. 

In the past two years, Colin and Jordan both decided to start data companies to help fill the gaps left by the past generation of data infrastructure. Colin and other Omni co-founders, Jamie and Chris, made the decision shortly after Google acquired Looker. “We built Omni because we believed that people needed a product that could provide centralized and decentralized business intelligence at the same time,” says Colin. 

Jordan’s decision to become an entrepreneur came after more than a decade at BigQuery followed by two years at SingleStore. “We had some of the biggest companies and data warehouses in the world using BigQuery, but the stuff people actually queried tended to be much, much smaller,” recalls Jordan. “I thought: ‘If everyone is chasing the few customers that are using really large data, there have to be opportunities to build something for everyone else.’”

Recently, Jordan and Colin sat down for a candid, in-depth discussion about the data landscape of the 2010s, what they think the last generation of data tools got right and wrong, and how the lessons they learned at BigQuery and Looker have ultimately informed the tools that the MotherDuck and Omni teams are building today.

Read below to hear Colin and Jordan’s thoughts on:

*FYI, we’ve condensed and edited Colin and Jordan’s responses to make this blog short and easy for you to read. For the full, unscripted interview, watch the video.

Origins of Omni and MotherDuck #

Colin: I used to work at HotelTonight, which was one of Looker’s first customers, and I eventually asked if I could join Looker. I stayed there for eight years doing a variety of things, including product and customer support. After Looker was acquired by Google, I realized there was still a lot of interesting work to do in BI. Now I get to work with a team to try and take the best parts of every BI tool and bring them into one.

Jordan: I started out working in operating systems, then moved to compilers, then storage, and eventually databases. I joined Google because I was interested in the concept of big data and they seemed to be doing it better than anyone else. I worked on a lot of the core engineering for BigQuery and led the product team for a while. After a decade, it felt like too long to be working on the same thing, and I wanted to innovate —- not find ways to squeeze customers for more money. So I went to SingleStore, a later-stage database startup, and helped them move to the cloud. But I knew that if I wanted to be really passionate about my work, I needed to start my own database company and see how far I could take it. 

How the data landscape changed in the 2010s #

Jordan: The early 2010s was the Hadoop era. There was a lot of excitement around big data, but the technologies available at the time forced you to change your mental model and operations to function at scale. The second generation of tools (like BigQuery and Snowflake) allowed you to operate in the old way but with larger data sizes. That was a major advantage. The other change was that the cloud migration was starting in earnest. People began storing their data in the cloud, and if your data is in the cloud, then you need to process it in the cloud, and so on. 

Colin: I think the innovation in BI followed the same trends. Back then, I was on the customer side using a MySQL replica that wasn’t terribly performant for the types of analytics we wanted to do. We started using Amazon Redshift and the performance profile was completely different. 

“Now that you could do everything in a cloud warehouse, the idea of doing BI by extracting data into a tool didn’t make sense anymore.” Colin Zima, Co-Founder and CEO of Omni

I actually came to Looker convincing them that we also had to do the BI stuff — a reconciliation between all the fun, interesting things that we wanted to do and what the market wanted. From a go-to-market and customer experience perspective, we delivered really well.

Opportunities created by early cloud adoption #

Jordan: The established products in the data stack didn’t work once everything moved to the cloud. A lot of the constraints that had driven the way that some technology was built up until that point were removed in the cloud or looked different once storage and compute were separated, for example. This created an open field of opportunities to build the things that were needed rather than clones of what had been built in previous decades. 

Colin: I know it sounds trite now, but 15 years ago, not everything was SaaS. Software wasn’t consumerized to the extent it is now, where you could go to the “store” on the internet and buy a data movement tool, a database, and a BI tool that worked really nicely together. The move to the cloud enabled startups to stand up a very mature stack on their own very quickly and put reporting directly into people’s hands.

Reasons for starting over #

Jordan: There were two events that didn’t seem big at the time but made me think that we were chasing the wrong thing at BigQuery. One was the conversation about benchmarking and whether certain companies were cheating. But the thing no one discussed was that the data they were using for these benchmarks was 30 terabytes and it was supposed to represent performance for actual users. In my experience, there was really no one using data that large.

“I did a bunch of analysis of data sizes, and something like 90% of queries were 100 megabytes — tiny. So the two juggernauts of the industry were focusing on getting bigger and bigger data, rather than what users cared about.” Jordan Tigani, Founder and CEO of MotherDuck

The second event was something that happened at SingleStore. We had some customers who were saying they wanted to run analytics locally before they pushed things out to the cloud. But SingleStore wasn’t designed to shrink down like that. There were all these microservices and aggregators and other pieces that just couldn’t shrink down. 

Around the same time, I learned of this upstart database called DuckDB. It was really robust, and I thought, I wonder if someone should try to build a serverless version of this, and then I thought, maybe that person should be me. I tried hacking on it for a while and that hacking eventually turned into MotherDuck.

Colin: Success creates an innovator’s dilemma. As you’re building, you start having something your product does well. For Looker, that was modeled, centralized, governed BI. We talked about bridging the world between so-called data ‘breadlines’ and data chaos, but we didn’t really achieve that balance. If you have Looker, you still need your data team to do things for you. 

"The core thesis of the governed data model and centralized BI persisted throughout the Looker experience, and we couldn’t just go and undo that because it’s what customers came to us for. But I always wanted a product that could compromise between centralized and decentralized BI, and it felt like a big gap in the market. That was the thesis for Omni from day one, and it’s exactly what we built." Colin Zima, Co-Founder and CEO of Omni

Why these solutions aren’t being built by incumbents #

Jordan: I think it’s the innovator’s dilemma here too. When we first started MotherDuck, we heard rumblings of something smaller happening at Google — a ‘LittleQuery’. But Google still has nine products, each with over a billion customers, and when your whole world is larger data, it’s hard to see that there’s another world out there unless someone else shows it to you. As time goes on, it also gets harder and harder to move quickly and get stuff done. Google Cloud is such a huge business now and they don’t seem to be making big bets on new technologies. 

Colin: The incentives of a startup are just so different than they are at a bigger company. Looker might be deployed in an app that has a million users, and that enterprise customer isn’t buying the product because of the risk-seeking appetite of the development team. They’re buying it because it’s very reliable. 

At Looker, we had the same debate: ‘Why can’t we just allow people to touch the SQL?’ It seemed obvious, but once you pulled the thread, you realized every single thing in the app is dependent on not allowing users to touch the SQL. So it's very hard to do a pivot that is structural. At the margin, big companies find great strategies and pivot. I think they can evolve but they can’t reinvent.

“It’s hard for larger companies to allocate resources to new bets, especially when the bet is an attack on the core product philosophy.” Colin Zima, Co-Founder and CEO of Omni

As a large company, almost everything you do also needs to stretch across all of your customers. You can’t do heroic acts for individual customers. But as an early startup, you have to because your product can’t do everything that your users need yet and because you need to make every customer very successful. I think that’s the most fun thing about starting a company. We have customers that may want things we can’t do yet but they see the future in the products we’re building and want to come along for the ride.

How customer problems have (and haven’t) changed #

Colin: The fun thing about being in the same space is that the problem sets are very similar. Companies want to make better decisions, put data into people’s hands, and make every type of user efficient with data. It’s actually great to be in a big crowded space because we don’t need to reinvent the problem — we just need to show customers that we’re a better solution to it. 

Jordan: For us, there are two big problems. One is making it easier to understand your data. A lot of database companies tend to see their purview as: “I get a SQL statement and I return some results.” But I think there are broader questions like: “I have a business problem that I want to answer and the formulation is actually different from SQL query to SQL results.”

One of the reasons we’re focusing on simplicity and ease of use is because customers will come to us and say, “I have this messy CSV file that has all these problems, and I need to make sense of it and then join it with the data in my warehouse.” That ends up being the kind of thing you can spend a lot of time on. We want to make that easy because asking the question is much easier once that first part is done. 

“People are building applications that need to show data. Someone may have 10,000 users, all with non-overlapping data, and they want low-latency access to that data. The existing data warehouse vendors are not set up to handle that use case." Jordan Tigani, Founder and CEO of MotherDuck

Low-latency use cases are something that MotherDuck can do really well. As a scale-up system, we give each of those users a DuckDB instance that turns off when they’re not using it. It’s free for tiny users, and if you have a couple of huge users, we can scale those up to the size of the largest EC2 instances, equivalent to another cloud data warehouse 3XL that would normally cost you $1M a year.

Colin: Jordan wrote something that I loved about the end-to-end nature of customer problems and it articulated something we’ve seen while using DuckDB. We recently released a feature for Excel folks called Calculations, and there’s a function called text which lets you take a piece of data and format it as a string. It’s something you’d use when you’re writing giant ‘if’ statements that mix strings and values.

In the database world, there’s no recognized reason customers would do that. Why would we spit out a column that’s a mix of percentages and words? But it’s been amazing to see the way DuckDB has poked at the edge of what SQL is and what it’s used for in order to get closer to an actual customer problem like: “I have a column and I want it to be a percentage and then put ‘Was not a customer yet’ in this row.”

I feel like databases are always competing over how fast you can return 50 terabytes of data and never over how they can run some median percentile function that can mix between data types in a way that’s convenient. That’s been a fun development for us because those are the problems that we would tackle as a BI provider. When a database addresses it, it gives us an advantage.

Bringing LLMs into the data stack #

“When GPT-4 came out, people thought it was going to put everyone out of their jobs. What’s become obvious since is that text-to-SQL, on its own, is a dead end.” Jordan Tigani, Founder and CEO of MotherDuck

Jordan: There’s so much business logic that lives in the heads of analysts and in the semantic model: the definition of revenue, the difference between two similar tables (one old, one new), data from this product versus that one, etc. If I want to ask questions about that data, I have to be able to interpret it  — especially because one of the reasons you want to use analytics is to make decisions.

So I think that text-to-SQL needs something else. At MotherDuck, we’re trying to give analysts superpowers the way that GitHub Copilot can give developers superpowers. We have a new feature called FixIt. As you’re writing your SQL, FixIt tells you if you have an error in your SQL and can jump right to the line that has the problem and correct it. 

That can change how you’re writing your SQL because there’s a lot of times when I’m writing SQL and I will forget how to do something like timestamp diffs. Is it days or day? Is it quotes or not in quotes? What order should the arguments be in? If you can just write it the way you think it goes instead of jumping over to the docs, you can stay in your workflow. I think there are a lot of opportunities along similar lines. 

Broadly, I think there’s a need for a semantic layer that allows you to ask and answer these types of questions. But I think there’s a bunch of other pieces that are needed before that’s actually viable, and Omni can probably help with some of that. 

Colin: The first and most important use case for us is co-pilot. As I mentioned, we created a feature that uses AI to write Excel calculations. We did the dialect to match the Excel 1:1, and it’s been a perfect use case for an LLM because, as it turns out, the internet is pretty good at writing Excel calcs. The second thing that we learned was that the error use case was just as important. It would write functions but it would be wrong sometimes, so the users needed to be able to make changes. 

“Considering the full experience loop and how the LLM could fail is really important.” Colin Zima, Co-Founder and CEO of Omni

The hard part about black box SQL is that, on the one hand, it’s magical that it can even give you a number. But, if the output of a question is just a number, you’re going to need to be able to follow up because you’re never just asking one question of the data. You need to slice it, go deeper, and you need UI controls to do that.

My view is that LLMs are going to be tucked into products all over the place. Even if all they did was fix error handling or search inside a little component of an app, they can be enormously valuable. For example, our most common support question is “How do I do a thing that the app can already do?” LLMs can start solving those kinds of problems in very tactical ways. 

Competitive advantages in the crowded data space #

Jordan: At MotherDuck, I think we have a unique take on data that’s aligned with the direction the world is moving in terms of how scale-up actually works — that you don’t have to build a scale-out system. User experience is also really important to us. DuckDB is taking the world by storm and we have a special relationship with them. We also have our combined experience as veterans of Snowflake, AWS, GCP, and Databricks, so we know how it’s been done elsewhere and what can be done to make it better. As a startup, we’re nimble. It may be a crowded market, but it’s huge and growing and not winner-take-all, so that leaves a bunch of opportunities for us.

Colin: Our answer is very similar. At Omni, I think we have a unique take in a very crowded market. We’re only two years old, but we’re already pretty different from what already exists, and that was possible because of our experience. 

“The Omni team has been in this space for a really long time, and we not only understand the gaps but also the good things that exist. I think you need to balance doubting what exists with deeply appreciating what’s good about it.” Colin Zima, Co-Founder and CEO of Omni

Another advantage — and this might sound silly — is that I think we have a team that cares more than other companies out there. Customer experience matters, and we want to deliver a better one than everyone else. That’s probably not a sustainable advantage in the long run, but we’re fully leaning into it now. So we have several advantages: we have people that have done it before, we have a problem that we really understand, and we have some customers who we know well and who trust us, and that gives us an opportunity to keep learning.

For the full unscripted interview with Colin and Jordan, watch the video. To learn more about the future of data warehousing, visit MotherDuck's website. To see Omni in action, get a demo.