Name: Impact of Apache Iceberg - Sanjeev Mohan's take
Uploaded: 2022-06-17T13:02:15.757Z
Duration: P0Y0M0DT0H1M14.000S

SiliconANGLE theCUBE

Clip Duration 01:14 / June 17, 2022

theCUBE Insights with Industry Analysts | Snowflake Summit 2022

Video Duration: 18:55

(bright upbeat music) Okay, we're back at Caesars Forum, the Snowflake Summit 2022. theCUBE's continuous coverage, this is day two, wall to wall coverage. We're so excited to have the analyst panel here. Some of my colleagues that we've done a number, we've probably seen some power panels that we've done. David Menninger is here. He's the Senior Vice President and Research Director at Ventana Research. To his left is Tony Baer, Principal at dbInsight. And in the co-host seat, Sanjeev Mohan, Sanjmo. Guys, thanks so much for coming on. I'm glad-- Thank you. Glad to be here. Thanks for having us. You're very welcome. I wasn't able to attend the analyst action, because I've been doing this all day, every day, but let me start with you, Dave, what have you seen that's kind of interested you? Pluses, minuses, concerns? Sure. What's going on? Well, how about if I focus on what's, I think, valuable to the customers of Snowflake. Great. And our research shows that the majority of organizations, the majority of people, do not have access to analytics. And so a couple of the things they've announced, I think, address those or help to address those issues very directly. So Snowpark, and support for Python, and other languages is a way for organizations to embed analytics into different business processes. And so I think that'll be really beneficial to try and get analytics into more people's hands. And I also think that the native applications, you know, as part of the marketplace is another way to get applications into people's hands, rather than just analytical tools. Because most people in the organization are not analysts. They're doing some line of business function. They're HR managers, they're marketing people, they're sales people, they're finance people, right? They're not sitting there mucking around in the data, they're doing a job. Dave V: Yeah. And they need analytics in that job. So Tony, I thank you. I've heard a lot of data mesh talk this week. It's kind of funny. 
I can't seem to get away from it. Yeah. I can't see, it seems to be gathering momentum, but, what have you seen that's been interesting and-- Well, what I have noticed is, and unfortunately, you know, because the rooms are too small, you just can't get into the data mesh sessions. So there's a lot of interest in it. It's still very, I don't think there's very much understanding of it, but I think the idea that you can put all the data in one place, which, you know, to me, Snowflake seems to be kind of, sort of, in a way, it sounds like almost like the enterprise data warehouse, you know, clouded and cloud native edition, you know, bring it all in one place again, I think it's providing sort of, you know, it's, I think for these folks, the thing, this might be kind of like a linchpin for that. I think there are several other things that, that actually, that really have made a bigger impression on me actually at this event. One basically is, pretty much their move with Unistore. And it's kind of interesting coming, you know, coming from MongoDB last week. Dave V: Right. And I see, it's like these two companies seem to be converging towards the same place at different speeds. I think Snowflake is going to get there faster than Mongo for a number of different reasons. But I see like a number of common threads here. I mean, one is that Mongo was, you know, was a company it's always been towards developers. They need to, you know, start cultivating data people. These guys are going the other way. Exactly, bingo. And the thing is that, but they, I think where they're converging is the idea of operational analytics, and trying to serve all constituencies. The other thing which you, which also is in terms of serving, you know, multiple constituencies is how, you know, how Snowflake has laid out Snowpark. And what I'm finding is like, there's an interesting dichotomy. 
On one hand, you have this, you know, very ingrained integration of Anaconda, which I think is pretty ingenious. On the other hand you speak, let's say to like, let's say the DataRobot folks that say, you know, something our folks want to work, you know, data scientists, we want to work in our environment and use Snowflake in the background. So I see those as kind of some interesting sort of crosscutting trends. Yeah. So Sanjeev, I mean, Frank Slootman will talk about, there's definitely benefits into going into the walled garden. Sanjeev: Yeah. And I don't think we dispute that, but we see them making moves and adding more and more open source capabilities like Apache Iceberg. Is that a, is that a move to sort of counteract the narrative that the Databricks has put out there? Is that customer driven? What's your take on that? I primarily, I think it is to counteract this whole notion that once you move data into Snowflake, it's a proprietary format. So I think that's how it started, but it's usually beneficial to the customers, to the users, because now if you have large amounts of data in Parquet files, you can leave it on S3, but then you, using the, the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So for example, you get the, you know, the micro partitioning, you get the metadata. So, in a single query, you can join, you can do select from a Snowflake table, union and select from an Iceberg table and, and you can do stored procedure, user defined function.

So I think what they've done is extremely interesting. Iceberg by itself still does not have multi-table transactional capabilities. So if I am running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it, but Snowflake does. Right. There's, hence, the delta. And maybe that, maybe that closes over time. I want to ask you, as you look around this, I mean, the ecosystem's pretty vibrant. I mean, it reminds me of like re:Invent in 2014, you know, but then I'm struck by the complexity of the last big data era and Hadoop and all the different tools. And is this different or is it the sort of same wine, new bottle? Do you guys have any thoughts on that? Well, I think it's different. How so? And I'll tell you why I think it's different. Because it's based around SQL. So if, back to Tony's point, these vendors are coming at this from different angles, right? You've got data warehouse vendors, and you've got data lake vendors, and they're all going to meet in the middle. So in your case, you're talking about operational and analytical, but same thing is true with data lake and data warehouse and Snowflake no longer wants to be known as a data warehouse. They're a data cloud. And our research, again, I like to base everything off of the research. I love when you bring your research. Our research shows that organization, two thirds of organizations have SQL skills, and one third have big data skills. So, you know, they're going to meet in the middle, but it sure is a lot easier to bring along those people who know SQL already to that midpoint, than it is to bring big data people to that midpoint. I remember Amr Awadallah, who was one of the founders of Cloudera, said to me one time, with John Furrier on theCUBE, that SQL is the killer app for Hadoop. Yeah. Yeah. The difference with this, you know, with Snowflake is that you don't have to worry about taming the zoo animals. Right. 
They really have thought out the ease of use. You know, I mean, they've thought about, I mean, from the get go, they've thought of two poles, one is ease of use, and the other is scale. And they've had, and that's basically a very, you know, that I think very much differentiates it. I mean, Hadoop had the scale, but it didn't have the ease of use. But don't I still need, like, if I have, you know, governance from this vendor or, you know, data prep from, you know, don't I still have to have expertise that's sort of distributed in those, those worlds? Right? I mean, go ahead, Sanjeev. Yeah, so I, the way I see it is Snowflake is adding more and more capabilities right into the database. So for example, they've gone ahead and added security and privacy. So you can now create policies and do even cell level masking, dynamic masking. But most organizations have more than Snowflake. So what, what we are starting to see all around here is that there's a whole series of data catalog companies, a bunch of companies that are doing dynamic data masking, security and governance, data observability, which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that, that is mushrooming. Although, you know, so they're using the native capabilities of Snowflake-- Panelist: Bingo. but they are at a level higher. So if you have a data lake and a cloud data warehouse, and you have other like relational databases, you can run these cross-platform capabilities in that layer. So, so that way, you know, Snowflake's done a great job of enabling that ecosystem. How about the Streamlit acquisition? Ah. Did you see anything here that indicated they're making strong progress there? Are you excited about that? Are you skeptical? Go ahead. Well, put it this way. I think it's like the last mile, essentially. 
In other words, it's like, okay, you have folks that are basically, that are very, you know, very comfortable with Tableau, but you do have developers who don't want to have to shell out to a separate tool. And so this is where Snowflake is essentially working to address that constituency. To Sanjeev's point, I think part of it, this kind of plays into it, is what makes this different from the Hadoop era is the fact that this, all these capabilities, you know, a lot of vendors are taking it very seriously to make, you know, put this native. Now, obviously Snowflake acquired Streamlit. So we can expect that the Streamlit capabilities are going to be native. You know, the other thing too, about the Hadoop ecosystem is Cloudera had to help fund all those different projects and got really, really spread thin. I want to ask you guys about the supercloud. We use supercloud as this sort of metaphor for the next wave of cloud. You've got infrastructure, AWS, Azure, Google. It's not multi-cloud, but you've got that infrastructure. You're building a layer on top of it that hides the underlying complexities of the primitives and the APIs. And you're adding new value, in this case, the data cloud or super data cloud. And now what we're seeing now is that Snowflake putting forth the notion that they're adding a super PaaS layer. So you can now build applications that you can monetize, which to me is kind of exciting. It makes this platform even less discretionary. We had a lot of talk on Wall Street about discretionary spending, and that's not discretionary if you're monetizing it. Yeah. What do you guys think about that? Is this something that's, that's real? Is it just a figment of my imagination, or do you see a different wave coming? Any thoughts on that? So in effect, they're trying to become a data operating system. Yeah. Right? Yes. And I, I think that's wonderful. It's ambitious. I think they'll experience some success with that. 
As I said, applications are important. That's a great way to deliver information. You can monetize them. So there, you know, there's a good economic model around it. I think they will still struggle, however, with bringing everything together onto one platform. That's always the challenge. Can you become the platform? That's hard to predict, you know? I think this is pretty exciting, right? A lot of energy, a lot of, a large ecosystem, there is a network effect already. Can they succeed in being the only place where data exists? You know, I think that's going to be a challenge. Yeah. I mean, the fact is, I mean, this is a classic, you know, best of breed versus the umbrella play. The, and the thing is, this is nothing new. I mean, this is like the, you know, in the old days with enterprise applications where basically Oracle and SAP vacuumed up all these, you know, all these applications, you know, in their, in their ecosystem. Whereas with Snowflake is, and if you look at the cloud folks, the hyperscalers, they're building out their own, you know, portfolios as well. Some are, you know, some hyperscalers are more partner-friendly than others. What Snowflake is saying is that we're going to give all of you folks, who basically are competing against the hyperscalers in various areas like data catalog, and pipelines, and all that sort of wonderful stuff, we'll make you basically, you know, you know, all equal citizens. You know, the burden is on you to basically to, we will lay out the APIs. We'll allow you to basically, you know, integrate natively to us so you can provide as good an experience. But you know, the onus is on your back. Should the ecosystem be concerned as they were back to re:Invent 2014, that Amazon was going to nibble away at them, or is it different? I find what they're doing is different. For example, data sharing, they were the first ones out the door with data sharing at a large scale. Panelist: Yes. 
And then everybody else jumped in and said, "oh, we also do data sharing." All the hyperscalers came in, but now what Snowflake has done is they've taken it to the next level. Now they're saying, it's not just data sharing, it's app sharing. And not only app sharing, you can, the Streamlit thing, you can build, test, deploy, and then monetize it, make it discoverable through, you know, through your marketplace. You can monetize it. Yes. Yeah. So I think what they're doing is they are taking it a step further than what hyperscalers are doing. And because it's like what Dave said, is becoming like the data operating system. You log in and you have all of these different functionalities. You can do machine learning now. You can do data quality, you can do data preparation, and you can do data monetization. Who do you think is Snowflake's biggest competitor? What do you guys think? It's a hard question. Isn't it? It is. Yes. You're like, hmm. Cause we all get the, oh, we separate compute from storage. We have a cloud data, and you go, yeah, okay, that's nice, but there's-- I'll take a crack. I think that-- There's uniqueness. I mean, put it this way, in the old days, it would've been, you know, the on-prem household names. I think today it's the hyperscalers, and the idea with the, I mean, again, this comes down to the best of breed versus buy, you know, get it all from one source. So where's your comfort level? So I think their coopetition are the hyperscalers. Okay, so it's not Databricks cause what, they're smaller? Well, there is some, you know, okay, now within the best of breed area, yes, there is competition. The obvious is Databricks coming up from a data engineering angle, you know, basically, you know, Snowflake coming from, you know, from the, from the data analyst angle. I think what, you know, another potential competitor, and I think Snowflake basically, you know, has admitted as such, potentially is MongoDB. I was just going to say, yeah. I mean-- Exactly. 
So I mean, yes, there are two different levels of competition-- They're sort of on a, on a longer term collision course. Exactly, exactly. Sort of ServiceNow and Salesforce, you know? The funny thing though is the reaction I get when I say that, and a lot of people just laugh, and it's like, "no, you're kidding. There's no way." And I said, "excuse me?" Well, take a look. But then you see Mongo last week, adding some analytics capabilities, and always been developers, as you say, and-- They trash SQL, but yet they finally have started to write their first real SQL. We have MQL. Now we have SQL, so-- What were those numbers, Dave? (panelists laugh) Two thirds and one third. So the hyperscalers, but the hyperscalers, are you going to trust your hyperscalers to do your cross-cloud? I mean maybe Google, maybe. I mean Microsoft, perhaps. AWS not there yet, right? I mean, how important is cross-cloud, multi-cloud, supercloud, whatever you want to call it? Well, what does your data show? Cross-cloud is important. If I remember correctly, our research shows that three quarters of organizations are operating in the cloud, and 52% are operating across more than one cloud. So, you know, two thirds of the organizations that are in the cloud are doing multi-cloud. So that's pretty significant. And now they may be operating across clouds for different reasons. Maybe one application runs in one cloud provider, and another application runs in another cloud provider. But I do think organizations want that leverage over the hyperscalers, right? They want, they want to be able to tell the hyperscaler, "I'm going to move my workloads over here if you don't, you know, give us a better rate." Yeah. I mean, I think, you know, from a database standpoint, I think you're right. I mean, they're competing against some really well-funded-- Right. And you look at BigQuery, really, you know, solid platform. Redshift, for all its faults, has really done an amazing job of moving forward. 
But to David's point, you know, those, to me anyway, those hyperscalers aren't going to solve that cross-cloud problem. Right, right. No, I mean-- Certainly not as quickly. No, no. Or with as much zeal. Right. We'll operate cross-cloud, but we're going to operate better on our cloud. Exactly. Yes, yes. Even when we talk about multi-cloud, the many, many definitions, like-- Yep. Like you, you know, Panelist: Yes, yes. Multi-cloud can mean anything. So the way Snowflake does multi-cloud and the way MongoDB does are very different. Panelist: Right. So Snowflake says we run on all the hyperscalers, but you have to replicate your data. What MongoDB is claiming is that one cluster can have nodes in multiple different clouds. That is-- Panelist: Right. You know, quite something. Yeah. Right. I mean, you, you, again, you've hit this, we got to go, but last question. Snowflake, undervalued, overvalued, or just about right? In the stock market or in-- Yeah, of course. Or in customers' minds? Yeah. Yeah. Well, but I, you know, I'm not sure that's the right question. Well, but no, that's the question I'm asking. No, I'll say the question is, undervalued or overvalued for customers? Right? That's really what matters. Dave V: Yeah. There's a different audience who cares about the investor side. Yeah, some of those are watching. But I believe, I believe that the, that from the customer's perspective, it's probably valued about right. Because the reason I ask it is because it was so hyped. Yes. It had a hundred billion dollar value. It surpassed ServiceNow's value-- Panelist: Right, right. Which is just crazy for this. Now it's obviously come back quite a bit. It's below its IPO price. So, but you guys are at the financial analyst meeting. Yeah. Scarpelli laid out the 2029 projections signed up for 10 billion dollars, 25% free cash-- Panelist: Conquer the world. 20% operating profit-- Yeah. I mean, they better be worth more than they are today if they do that. Yeah. 
If I see the momentum here this week, I think they're undervalued. But before this week I probably would've thought they are at the right valuation. I would say they're probably more at the right valuation only because the IPO valuation was just-- Silly. Such a false valuation. So hyped. Yeah. Guys, I could go on for another 45 minutes. Thanks so much, David, Tony, Sanjeev. Always great to have you on. Likewise. We'll have you back, for sure. Thanks for having us. Good to be here, yep. Thank you. Keep it right there. We're wrapping up day two on theCUBE at Snowflake Summit 2022. Be right back. (gentle music)