
From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is "Breaking Analysis" with Dave Vellante. Our research and analysis point to a new modern data stack that is emerging, where apps will be built from a coherent set of data elements that can be composed at scale. Demand for these apps will come from organizations that wish to create a digital twin of their businesses, to represent people, places, things, and the activities that connect them, to drive new levels of productivity and monetization. Further, we expect Snowflake, at its upcoming conference, will lay out its vision to achieve this. Hello and welcome to this week's "Wikibon CUBE Insights" powered by ETR. In this "Breaking Analysis," ahead of Snowflake Summit later this month, we lay out a likely path for Snowflake to execute on this vision, and we address the milestones and challenges of getting there.
As always, we'll look at the ETR data and what that tells us about the current state of the market, and to do all this, we welcome back CUBE Analyst and contributor George Gilbert. Hey George, good to see you. Thanks for coming on. Hi Dave, good to be here. All right, what we really want to do today is get into how we see the new modern data stack evolving, and that's what this next chart really kicks off. So, we see the world of apps moving from one that is process-centric to one that is data-centric. We've talked about this before with George on theCUBE. This is where business logic is embedded into the data versus the way it works today. And there are four layers to the emerging data stack that are supporting this premise. And we start at the bottom, even though it's at the top of this slide, that's the infrastructure layer, which we believe is increasingly going to be abstracted to hide the underlying cloud and cross-cloud complexities. So make that invisible, what we call Super Cloud. And then as you move up the stack and down this slide is the data layer that comprises multilingual databases and multiple APIs and pluggable storage.
We're going to talk about that today. And then continuing further up is the unified service layer, that's the next layer, which brings together BI as well as AI and ML. And then finally the PaaS for Data Apps sits at the top of the picture, which defines the overall user experience. All right, George, let's get into it. What do we need to know about this emerging stack and how do you see Snowflake's role in evolving it? If there's one big takeaway, I would suggest that we're moving from an era where you built Web 2.0 apps, which were really microservices and separate databases, and then you moved that data somewhere down a pipeline and tried to collect it and analyze it, to, as you said, something that manages people, places and things. You worry about things, not strings. And more important, there's a stack that unifies the development of these types of apps. Today in the Web 2.0 world, on Amazon, for instance, there are 200 different services that weren't really designed to work together, that developers try to cobble together. Snowflake's ambition is to essentially substitute all that complexity with one coherent development environment that manages, as they say, all workloads, all data. And since all applications are data apps, this is the new stack for building applications, primarily on Amazon, because that's the ecosystem where the development services are so fragmented. Okay, we'll talk about how Snowflake, we believe, is trying to position itself to be the best place to build those apps, even better than any of the clouds, but let's move on and introduce the Uber analogy that we've shared before. The idea here is that the future of a digital business will manifest as a digital twin of the organization. The example we use frequently is Uber for Business, where in the case of Uber, you've got drivers, riders, destinations, ETAs, and all the transactions that result from real-time activities, built by Uber into a set of applications where all these data elements are coherent and can be brought together and composed, if you will, to create value in real time. So George, why is this such a powerful metaphor? How does it apply? Well, one, by organizing your application around the people, places, things and activities in the real world, there's no impedance mismatch between what a developer thinks about and the entities in the business that they're trying to manage. In other words, in this new era of applications, you're trying to orchestrate things in the real world, and to do that, you have to have coherence across all these different elements, all the people, places and things; that gets to the semantic layer. But then underneath that, you need a data management layer that integrates operations and analytics. And so once you have that sort of coherence, then you can have AI orchestrate things like matching drivers and riders, calculating a route, estimating a price, calculating an arrival time. All that stuff can be done autonomously because you're not working with siloed applications for different processes. You've got one coherent layer of digital twins of everything in your business. Okay, let's keep going on to the next one, the pyramid slide here. We think the industry, generally, and Snowflake specifically, are moving to a world beyond today's Web 2.0 programming paradigm, where analytics and operational platforms are separate and the application logic is organized through microservices. We see this evolving.
We see a world where these types of systems are integrated, BI is unified with the AI and ML tool chains, and a semantic layer organizes the application logic to enable all the data elements to be coherent. George, add some color to this graphic: why does it point to Snowflake's platform strategy, in your view? So you can now build Uber-type apps, which we think are the prototypical apps for the next generation, where you're using analytics to orchestrate things in the real world. The biggest challenge is that, until now, only companies like Uber, or Google, or Amazon could build those types of apps. You were mucking around in MapReduce code, you were slinging TensorFlow for building models. What was needed was to bring that capability to mainstream organizations in a packaged platform. And that's essentially what Snowflake and Databricks are trying to do for data applications, so that it doesn't require world-class developers, it requires only mainstream developers working with a packaged platform, that's the big- >> Is that what Ford is on this chart? Yes. Okay, explain that. Double click on that for a second. So there were not many people who could build Uber-class apps. As we talk to Uber itself next week, I think we'll hear from some of the developers about the incredible efforts they had to go through to cobble an application like that together. But to get this down to mainstream developers, think back to one historical example: originally, building graphical applications took skills only a few people had, working with windowing toolkits and deep C systems programming.
And eventually that became so easy that you could just drag and drop a user interface layout graphically and attach code elements to the graphical elements. In other words, it took it from systems programming to citizen programming. And we have to do the same thing here: we have to make a platform as a service that's so accessible that it doesn't take Uber-class developers to build an Uber-class app. Yeah, that's what Snowflake and Databricks are trying to build, and we have details on how we think Snowflake is going to do it. Okay, let's get there. Let's take a look at the three enablers that we laid out earlier. This is sort of a setup graphic, if you will, and we'll drill into each of these three areas that we're showing here. So first of all, Snowflake is legit, in our view, when they say they're not a data warehouse. Now, of course the company started as a database player, but they've evolved into a platform. We think the main thrust of that platform is an experience that promises consistency and governed data sharing, even across clouds, and it continues to evolve to support any data type through pluggable storage, and to extend this promise to materialized views, which implies a wider scope. And the really interesting opportunity is building data apps. And it's clear to us that Snowflake wants to be the number one place in the world to do this, meaning the fastest time to develop, the most cost-effective, the most secure, and the most performant place to build and monetize data apps.
George, explain the pluggable storage in this chart and the importance of being able to update materialized views, and Snowflake's Iceberg play, which is a bit misunderstood. In other words, a lot of people said, "Oh, this is just Snowflake checking a box to compete against Databricks's open source approach." So how do you see this? And take us through the key points on the graphic. Okay, so the core principle is supporting all data types and all workloads. And Snowflake started basically with OLAP data, where it's arranged in columns and it's for slicing and dicing. They started to add OLTP data last year with hybrid tables, but they're basically opening up the DBMS in multiple layers, so that there's pluggable storage, so that you could add support for streaming data; for graph data, which is connected data; and for vector data, which is the data type that machine learning models sort of speak natively. And by supporting multiple data types, you have this unification, so that it provides simplicity for the developer.
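To make the one-engine, multiple-data-types idea concrete, here is a minimal sketch in Python using the snowflake-connector-python package. All names, credentials, and the schema are hypothetical, and hybrid tables were still in preview at the time, so treat the DDL as illustrative rather than definitive.

```python
# Illustrative sketch: one engine querying across storage types.
# Assumes snowflake-connector-python; all names/credentials are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="DEMO_WH", database="RIDESHARE", schema="PUBLIC",
)
cur = conn.cursor()

# An OLTP-style hybrid table (row store, transactional) for live ride state.
cur.execute("""
    CREATE OR REPLACE HYBRID TABLE active_rides (
        ride_id   INT PRIMARY KEY,
        driver_id INT,
        city      STRING,
        status    STRING
    )
""")

# A standard OLAP table (column store) holding historical reference data.
cur.execute("""
    CREATE OR REPLACE TABLE city_stats (
        city            STRING,
        avg_eta_minutes FLOAT
    )
""")

# One SQL engine joins across both storage types; the developer never
# worries about moving or translating data between them.
cur.execute("""
    SELECT r.ride_id, r.city, s.avg_eta_minutes
    FROM active_rides r
    JOIN city_stats  s ON r.city = s.city
    WHERE r.status = 'en_route'
""")
print(cur.fetchall())
conn.close()
```

The point of the sketch is the last query: a row-store table and a column-store table joined by a single engine, with no staging or data movement visible to the developer.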
And then above that level, an example of simplicity is the materialized views that we're talking about. You could have transactional data coming in and you have OLAP data, which is like historical reference aggregated data, and the materialized view is like a business metric. In Uber, they might be looking at how many rides per hour per city, or something like that. And that needs to be updated based on transactional data that's coming in, but it's also historical reference data. And the problem is, when you did metrics in the past, you had to extract, or at least the BI tools extracted, the data from the database, and it was immediately disconnected from the live data and became stale. Now, Snowflake, in the database engine, can incrementally update those metrics with live data. And that's powerful, 'cause it can do it across data types. So the importance here is you're not moving data to do that, right? You're writing directly into those materialized views, is that correct? That's a core capability of a good platform: it translates between the things a developer cares about on top and the strings that the database is managing internally, and it hides worrying about moving the data, translating the data, caching the data. Snowflake just takes care of that, even across data types. And if you go back to the chart, Alex, the pluggable storage, let's explain what we mean by pluggable storage. You're talking about the ability to support all kinds of different database types and formats, correct? Yes. What we don't know is the level of support for cross-database-type joins and transactions. In other words, we do know that you can join a transactional table and an OLAP table and incrementally update the product of that, which would be a materialized view. We don't know the level of support for joins and transactions across these pluggable storage engines. What we do know, though, is that the defining feature of the data lake or lakehouse architecture was that any compute or analytic engine could read and write data without going through a gatekeeper, which in the data warehouse version was the SQL DBMS engine, which imposed a performance tax or even just an API constraint. Now, Iceberg tables are native tables in Snowflake, and so any tool can read and write directly to Iceberg tables without going through the SQL engine, and that's a big change. That's a headline change. Yeah, and as well, as I said earlier, a lot of people think it's a checkoff item. I don't think... You know, it's pretty clear that Snowflake doesn't want to make this a checkoff item, anything but. They want to make it so the best place to do Iceberg will be with Snowflake, otherwise, what's the differentiation? Because if it's in Iceberg, then anybody can use it. So there's got to be a hook, a carrot, correct, to entice customers to stay on Snowflake? It's that DBMS execution engine that sits on top of Iceberg, native OLAP, OLTP, and graph; they're still unifying everything. You can read and write directly to Iceberg tables, but they provide this unifying framework for accessing multiple data types and joining across those multiple data types. That's the value-add, as well as governance across all of that. The key is you no longer have to sort of pass Go and pay $200 every time you want to get to an Iceberg table. That was the defining feature of the data lake architecture, that you could read and write to the data in the data lake without going through some data warehouse engine.
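To make the incrementally updated metric concrete, here is a minimal sketch in Python against a hypothetical completed_rides table, again using snowflake-connector-python. It shows the simpler single-table case; the cross-data-type refresh George describes was still emerging at the time, so treat this as an illustration of the idea, not a definitive implementation.

```python
# Illustrative sketch: an incrementally maintained business metric.
# All names are hypothetical; assumes snowflake-connector-python.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="DEMO_WH", database="RIDESHARE", schema="PUBLIC",
)
cur = conn.cursor()

# The metric lives inside the platform as a materialized view, so the
# engine, not the BI tool, keeps it consistent with incoming ride rows.
cur.execute("""
    CREATE OR REPLACE MATERIALIZED VIEW rides_per_hour_by_city AS
    SELECT city,
           DATE_TRUNC('hour', ride_ts) AS ride_hour,
           COUNT(*)                    AS rides
    FROM completed_rides
    GROUP BY city, DATE_TRUNC('hour', ride_ts)
""")

# Dashboards query the view directly; no extract, no stale cache.
cur.execute("""
    SELECT * FROM rides_per_hour_by_city
    ORDER BY ride_hour DESC LIMIT 10
""")
for city, hour, rides in cur.fetchall():
    print(city, hour, rides)
conn.close()
```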
All right, let's go on to the next point here in the fundamental premise, the pillars, which is unifying business intelligence and AI/ML. This is a critical theme of the new modern data stack, if you will, and that's what we want to talk about here, George. In the diagram that you shared with me, you're showing Snowflake on the left-hand side and Databricks on the right, and you can see these worlds coming together, which is exactly what's happening in the real world. Walk us through the important points of this graphic; what are the implications for Databricks, Snowflake, and the industry in general? Okay, as you'll show in the ETR data later, basically, if an enterprise wanted the best of both worlds, the business intelligence world and the AI/ML world, they needed both Snowflake and Databricks. Both companies are trying to make it possible so that you don't have to choose both, but can just choose one. Our analysis is about the progress Snowflake has made in supporting all the AI/ML tools and personas that you used to really have to go to Databricks for. The headline here, the biggest headline, I think, is Python pandas support. This is the core of working with data in Python.
And I've heard statistics that there may now be more Python pandas programmers than there are SQL programmers. And the problem with Python support is that if you want to use any of the libraries, especially pandas, it runs on a single core, on a single CPU. And so when you want to do your exploratory data analysis on large-scale data or put a pipeline into production, you had to rewrite it, say in Spark, and their equivalent was Koalas, which, at least as of a year ago, was at something like 60 to 70% coverage of the pandas API. So there's a company called Ponder that spun out of the Berkeley RISELab that did a reimplementation of pandas that can run on a scale-out compute execution engine.
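To illustrate the drop-in pattern George is describing: Ponder's work builds on the open source Modin project, which reimplements the pandas API over scale-out execution engines. The dataset below is hypothetical, and exactly how the Snowflake-hosted version would surface this was not public at the time, so the second half is a sketch of the pattern rather than Snowflake's actual API.

```python
# Laptop version: standard pandas, bound to a single core.
import pandas as pd

df = pd.read_parquet("rides.parquet")  # hypothetical local dataset
metric = (
    df.assign(ride_hour=df["ride_ts"].dt.floor("h"))
      .groupby(["city", "ride_hour"])["ride_id"]
      .count()
)
print(metric.head())

# Scale-out version: Modin reimplements the same pandas API over a
# distributed engine, so ideally only the import changes.
import modin.pandas as mpd

big_df = mpd.read_parquet("rides.parquet")  # same code, parallel execution
big_metric = (
    big_df.assign(ride_hour=big_df["ride_ts"].dt.floor("h"))
          .groupby(["city", "ride_hour"])["ride_id"]
          .count()
)
print(big_metric.head())
```

The design point is that the exploratory code and the production code stay the same; only the execution engine underneath changes.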
They've implemented that on Snowflake. We don't know what sort of relationship there is or will be, but basically you'll be able to take your laptop Python pandas data science and data engineering code and scale it out directly to run on the Snowflake cluster with extremely high compatibility. The numbers we've seen are 90 to 95% compatibility. So you might have a situation where it's more compatible to go from Python on your laptop to Python on Snowflake than to Python on Spark. So that's one example of how they're taking the data science tools that you used to have to go to Databricks for and supporting them natively on Snowflake. They're- So, Alex, bring up that last slide. I just want to say, probably about four or five years ago, we said, "Oh, there's this new workload emerging. It's a combination of AWS IaaS, plus Snowflake, plus Databricks for AI/ML." And we said, "Okay, this is the new modern workload." Well, what's happened is both, actually all three, companies are saying, "Hey, we see a huge TAM out there, we're going after this." And so those worlds are coming together, George. Right, and another analytic persona that's being supported is LLM access. So large language models are a way of querying your data in natural language. ThoughtSpot pioneered this. Based on acquisitions, we can read between the lines that Snowflake is also going to offer LLM access to their data, and not only to the structured data that you would query via SQL, but to complex information, whether in documents, images, or video, that you would embed in vectors so it can be queried as well. So the point is, the distinction between business intelligence workloads and data science and machine learning workloads is blurring and going away, so that either platform can support all the workloads without having to invite the other vendor in. And ultimately we're talking about operationalizing data through data apps. And that is a big change, right? We've operationalized every backend system over time, and now we're entering, we think, a new era. Okay, we're going to take a break from George, your awesome graphics, thanks for those, and come back to the survey data. Let's answer the following question. To what degree do Snowflake and Databricks customers overlap in the same accounts? And this is the power of the ETR platform, where we can answer these questions over a time series. So this chart shows the presence of Databricks inside of 302 Snowflake accounts within the ETR dataset. The vertical axis is net score, or spending momentum, and the horizontal axis shows the overlap.
We're plotting Databricks and we also put in Oracle, just for context. So as you see, 36% of those Snowflake accounts are also running Databricks. That jumps to 39%, not a huge jump, but it jumps up, if you take Streamlit out of the equation. And notably, this figure is up from 17% two years ago, or 14% two years ago if you take out Streamlit. The point is, Databricks' presence inside of Snowflake accounts has risen dramatically in the past 24 months. And that's a warning shot to Snowflake. And by the way, Oracle is present in 69% of Snowflake accounts. Now, let's flip the picture. In other words, how penetrated is Snowflake inside of Databricks accounts?
And that number, as you can see here, is 48%, but it's only up slightly from 44% two years ago. So despite Snowflake's growth over the past two years, Databricks has been more effective at penetrating Snowflake accounts than the reverse. So George, what do you make of this data? I think, if I were to distill it, it's that as organizations became more mature with data, they realized they needed both if they really wanted to get the most out of their data. That's why we see increasing overlap. Now both vendors have been trying to address the workloads from the other vendor, and the takeaway from our analysis is that Snowflake is making a lot of progress on addressing the data science and engineering workloads that formerly were really the stronghold of Databricks. And you know, obviously Databricks is not sitting still, but the main point is, I think if we look over the next 12 months, we're likely to see less overlap. All right, let's move ahead to the third key point of our simplified platform view, which brings us deeper into the semantic layer. So this graphic emphasizes the notion of organizing application logic into digital twins of your business. And our assertion is that this fundamentally requires a semantic layer. George, what does that mean for Snowflake and for other players like AtScale, dbt, or Looker? Does Snowflake have to own that semantic layer? Can they partner to get that semantic layer? What's your take? So, the semantic layer is starting as business intelligence metrics, things like bookings, billings, and revenue, or, as we were talking about with Uber, rides per hour. Now, the reason BI tools in the past extracted data to manage those metrics is that the support inside the data platform, both to translate between the metrics and the data and to keep the data behind each metric live and incrementally updated, was not there. That will be there now. So, what you'll see is companies like AtScale, dbt, Looker, probably not Power BI, could define the business metrics and store the definitions inside Snowflake, and then Snowflake would serve those metrics and cache the data, which is the hard part; before, these tools had to essentially extract the data and manage it themselves.
That support will be there now. Now, this has implications for the full metrics, I'm sorry, the full semantic layer, and let me explain. Snowflake's attitude is that they're doing the hard work to support the BI metrics. The tools from third parties define them, but Snowflake is doing the hard work of translating that down from things into strings and then caching the data on the way back. That's really hard systems work. Now, it appears they think they can leave the BI metrics to third parties to manage, so why not the full application semantic layer? Well, the full application semantics are not something as simple as turning things into strings.
The full application semantic layer manages activities, and there might be API calls to applications. There's a much more sophisticated process mapping, and we're not convinced that Snowflake has realized that if they leave that to third parties, then the entire tool chain can essentially hide the data platform underneath. In other words, they have the mindset of a database management system, but have they adopted the mindset of a full application development tool? Because the application development tool is thinking about the things a developer cares about in the business world, and if they want to own the whole stack, they have to manage that translation between all the things a developer cares about and all the things a database manages. It's easy when it's BI metrics. It's not so easy when it's the full application semantics, and we're not convinced that they've shown they're thinking that far ahead. Well, I mean, there aren't a lot of historical examples of companies that have done that. Oracle, not really. IBM is kind of the exception that proves the rule that it's hard. But let's double click on this, George, and discuss further what it means for Snowflake in terms of who owns the semantic layer and how to translate the language of things into the language of databases, which you call the language of strings. Explain these three layers here and the implications. So if you want to organize your application logic into digital twins that correspond to things and activities in the real world, you're no longer just translating, say, bookings, billings, and revenues into a database entity like a materialized view. Instead, you might be orchestrating calls to SAP or some other operational app that might match drivers and riders, and you have to wait for the different apps to signal that they've completed their activities. This would be analogous, historically, to the mapping between, let's say, a Java application server and the underlying relational database, the object-relational mapping layer, which mapped application objects down to relational tables.
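As a point of reference for that analogy, here is a minimal object-relational mapping sketch in Python with SQLAlchemy; the Driver entity and its fields are hypothetical. The developer works with things (objects) while the mapper handles the strings (tables and SQL), the same translation George describes, only at a far simpler level than a full digital-twin semantic layer would require.

```python
# Minimal ORM sketch: mapping an application "thing" (a driver twin)
# down to database "strings" (rows in a table). Names are hypothetical.
from sqlalchemy import Column, Float, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Driver(Base):
    __tablename__ = "drivers"  # the relational "string" side
    id = Column(Integer, primary_key=True)
    name = Column(String)
    lat = Column(Float)
    lon = Column(Float)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # The developer thinks in Driver objects; the mapper emits the SQL.
    session.add(Driver(name="Ada", lat=37.44, lon=-122.16))
    session.commit()
    nearby = session.query(Driver).filter(Driver.lat > 37.0).all()
    print([d.name for d in nearby])
```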
But this is even more sophisticated, and no one in the past really managed that whole stack, translating all the way from the application entities down to the database entities, so seamlessly that the stack couldn't be cobbled together from parts. In other words, it was not seamless. The layers were visible to the developer. If the layers are visible to the developer, that means some other development tool stack can abstract away Snowflake. Snowflake's challenge is to make that translation so seamless that you want to live in their development environment and you don't want to move anywhere else, because otherwise you would be exposed to that complexity.
And that's where we want to see what they have to say about that roadmap at the summit. >> Okay, so the point being, if somebody else owns the semantic layer, then somebody will be able to more easily extract that data and potentially replicate what Snowflake's doing, and that suggests that Snowflake may, over time, want to vertically integrate that layer, although there's no indication thus far that that's the case. All right, let's close by examining the horses on the track. The Belmont Stakes is coming up this weekend. It's a grueling, long, mile-and-a-half race. It's not a sprint. And in this graph we're looking at the marathon runners in the world of cloud data platforms. Same dimensions as before: net score, or spending momentum, on the vertical axis, by N, or overlap, on the horizontal axis, within the 1,171 cloud accounts.
Overall, there are 1,700 respondents in this latest survey from ETR. We're now filtering that down to accounts that were, say, cloud accounts, that's the 1,171, and that red line at 40% indicates a highly elevated net score. George, Microsoft just announced Fabric at Microsoft Build. AWS is gluing together its various data platforms. Google's got a really strong product in BigQuery, with perhaps the best AI in the business. Comment on Microsoft, your thoughts on Fabric. You just mentioned "probably not Power BI." AWS, GCP, and their relative position to where we think Snowflake is going; let's riff on that. Okay, so the big change is we're trying to unify the data so that, first of all, your analytic data is in one place. There's one source of truth for analytic data. Then that one source of truth includes all your operational data, then one uniform engine for accessing all that data, and then that unified application stack that maps people, places, things, and activities to that one source of truth. So here's where everyone is. Snowflake is unifying the analytic and operational data, operational data in all its forms. What they've done is put it beneath one DBMS engine with multiple personalities.
Then what Microsoft has done is standardize all their analytic data on the Databricks data format. So all Microsoft analytic engines and all Databricks analytic engines can talk to the same data. Now, they have not unified all the analytic engines, but it's all one data repository. And in the Microsoft ecosystem, it's powerful that at least there's one source of truth at the storage layer. What they have not done is integrate operational data. You have to sort of synchronize that through, essentially, change data capture. So there's some latency getting operational data, and if you want to execute transactions on this data, you go to an operational or transactional database.
So they've unified the analytic data. In the case of Amazon, they really still have the data lake and the data warehouse, and you can talk from one to the other, but it's kind of through connectors, and the data lake data is really not native for the data warehouse; it's sort of treated like external data. On Google Cloud Platform, while you have the full separation of compute and storage and you can put all the data you want in Google BigQuery, it's not the same as a data lake.
If you have data lake data, you're querying that as if it were external data. In other words, they have not unified all their analytic data platforms in a single format, nor all the analytic data engines, but they're working on it, and I assume we'll see a lot of progress toward that at Google Cloud Next in August. So Microsoft and Databricks have made progress in that they're teaming up and saying, "Standardize on the Delta table format, and all analytic engines that either company offers will use that data." The next step would be to integrate operational data. I think that's further off for them because the Delta table format is just not conducive to operational data.
So Snowflake, as we were saying when we did the Databricks show a couple of months ago, was ahead in that, five years earlier than everyone else, they started building a cloud-native analytic engine. They've generalized that to include operational data, and not just transactional data but data in multiple formats. So they're very far ahead in providing a platform for building data apps. What they have yet to do is that next layer, where they map the semantics of people, places, and things in the real world to any data format that they're managing. All right, and Alex, bring that last slide up, if you would. And then the last point was on Oracle. You can see they don't have the momentum here, but a couple points: one is, every company that we're showing here, with the exception of Oracle, is above that 40% line or just at it. The second thing is they all have ML and AI chops. Oracle, of course, is going to tell you, with MySQL HeatWave and the converged capabilities of Oracle Database, that they're already there. George, what would you say to that? Are you skeptical about Oracle competing in this space, or is Oracle on Oracle what their game is? Oracle is kind of like the IBM mainframe, in that it's the repository for mission-critical operational and analytic data that's mostly on-prem, and the Oracle database was never really rearchitected to be cloud native. The great big appliances that gave you that huge scalability, which are essentially mainframes, are how they provide scalable Oracle access in the cloud: they put those appliances in the cloud. So you're saying AWS is not running Exadata in their cloud? All right. (chuckles) You get the point. We're up against the clock, George, so let's close with the key questions we're going to be watching for at Snowflake Summit and, of course, the Databricks event, which take place the same week, the last week of June. We're going to start at the bottom layer of the stack and work our way up, even though it's down on this slide. I sort of was referring to that earlier, but I got it backwards. This is really the slide I was talking about. Still on the red-eye here. The infrastructure layer: we're talking about managing data outside of the cloud. That's a big question mark that we have for Snowflake, and whether, say with distributed joins, if the egress fees were a non-factor, that would be something Snowflake would do or consider. The software layer, moving to the next layer here, is the pluggable storage engines.
When is that actually going to occur? The support for materialized views and cross-engine transactions: what's the detail there, and when is that going to happen? The unified service layer: which APIs are they going to support, and when? How about dbt, Looker, AtScale, and others that define metrics; are they going to do that inside of Snowflake? And what about the semantic layer? We talked about that a lot. And what about large language models and AI? What impact will that have? Snowflake's made some acquisitions lately. We're sure they're going to talk some of that up. And then the PaaS for Data Apps, which is really more than a layer. Really, it's the user experience. How will Snowflake's app store deal with discovery and monetization? How's that all going to evolve? George, kind of a speed date there on the key issues, but give us your final thoughts. This is really a new era for the cloud. We've had a Unix ethos, especially in the Amazon ecosystem, where there were a couple hundred services, like tools in Unix, and we had pipes to connect them, and the burden of assembling all this was on the developer. And I think what's happening is Snowflake wants to come in now and say, "Yes, Amazon's infrastructure is great, but if you really want to avoid all the complexity of stitching together 200 services, we'll give you a coherent stack and tooling, and the convenience and simplicity of a PaaS, but without the compromises on functionality that we saw in the PaaS offerings of the past." So, really, for the first time, we're going to see someone say, "Take Amazon for all their wonderful infrastructure, but do all your development within our framework and we'll simplify it for you." That's their pitch. Yeah, and it's really interesting, right? Amazon continues to add functions that its ecosystem already has, and they've always been sort of transparent about it. They say, "Hey, the ecosystem has to move faster and add more value, because we're going to do what the customer wants." It's the whole customer-obsession kind of thing. It appears that Snowflake continues to be ahead of the game there. George, I want to thank you for your time. It's great, always, collaborating with you, and more to come; this is the second in the series. George and I will be back next week digging into Uber. We've used that example many times, and we're going to actually look at how Uber basically, in 2015, rewrote its entire stack and created this amazing capability. George, thanks for coming on. Thanks, Dave. All right, many thanks to Alex Morrison, who's on production and manages the podcast as well. Ken Schiffman, Kristen Martin, and Cheryl Knight, shout out to them; they help get the word out on social media and in our newsletters. And Rob Hof is our editor-in-chief over at siliconangle.com. He does some great editing and comes up with better headlines than I do. Remember, all these episodes are available as podcasts wherever you listen. You just have to search for the "Breaking Analysis" podcast. I publish each week on wikibon.com and siliconangle.com, and if you want to get in touch, you can email me at david.vellante@siliconangle.com. You can DM me @DVellante, or comment on my LinkedIn posts. Please do check out ETR.ai for the best survey data in the enterprise tech business. This is Dave Vellante, for George Gilbert and "theCUBE Insights" powered by ETR. Thanks for watching, everybody, and we'll see you next time on "Breaking Analysis". (bright digital music)