SiliconANGLE theCUBE
Clip #7. Yoav explains where hackers will start and the most vulnerable aspects of Snowflake
Clip Duration 00:40 / January 29, 2022
Breaking Analysis: Securing Snowflake
Video Duration: 28:34

(bright music) The challenges of legacy data warehouses and traditional business intelligence systems, they've been well-documented. They're built on rigid infrastructure, and they're managed by really specialized gatekeepers. Data warehouses of the past were, as one financial customer once said to me, like a snake swallowing a basketball, imagine that. The amount of data ingested into a data warehouse just overwhelmed the system. Every time Intel came out with a new microprocessor, practitioners would chase the chip in an effort to compress the overly restrictive elapsed time to insights, and this cycle repeated itself for decades. Cloud data warehouses, generally, and Snowflake, specifically, changed all this. Not only were resources virtually infinite, but the ability to separate compute from storage, and to actually turn off the compute when you weren't using it, permanently altered the cost, the performance, the scale and the value equation.

But as data makes its way into the cloud and is increasingly democratized as a shared resource across clouds and at the edge, practitioners have to bring Sec DevOps mindsets to securing their cloud data warehouses. Hello, and welcome to this week's Wikibon, "theCUBE Insights," powered by ETR. In this "Breaking Analysis," we take a closer look at the fundamentals of securing Snowflake and to do so, we welcome two guests into the program. Ben Herzberg is an experienced hacker and developer and an expert in several aspects of data security. He's currently working as the Chief Data Scientist at Satori, and he's joined by his colleague, Yoav Cohen, who is a technology visionary, and currently serving as CTO at Satori Cyber. Gentlemen, welcome to "theCUBE," great to see you. Great to be here. >> Thanks for having us, Dave. Now, these two individuals have co-authored a book on Snowflake Security. It's a comprehensive guide to what you need to know as a data practitioner using Snowflake. So guys, congratulations on the book. It's really detailed, packed with great information, best practices and practical advice and insights all in one place, so really good work. So, before we get into the discussion, I want to share some ETR survey data just to set the context. We're seeing cybersecurity and data, they're colliding in a really important way. And here are some data points that we've shared before from ETR's latest drill down survey. They asked more than 1200 respondents. We're talking CIOs, CSOs and IT professionals, "Which organizational priorities will be most important in 2022?" And these were the top seven. There were a lot of others, but these were the most important.

So, it's no surprise that security is number one, although, as we shared in our predictions post, the magnitude of its relative importance, it does vary by the degree of expertise within the organization. The Delta is maybe not as significant, for example, in large companies, and you can see where analytics and data fit. And we've tied these two domains together and picked up on a term that our two guests have used, in fact, you guys may have even coined it, called DataSecOps, which, to me, is the idea that you bring Agile DevOps practices to data operations and built-in security as part of the full cycle of managing, creating the data, using the data, accessing the data, not a bolt on, but it's fundamental, so guys, what do you make of this data, and what's your point of view on DataSecOps? So, definitely aligns with what we're seeing on the ground in the market. In between what you saw there, you had cybersecurity and data warehousing. In the middle you had cloud migration, and that's basically what's pushing companies to invest in both security and data and warehousing, because the cloud changed the game for cybersecurity. The tools that we use before are not the same tools that we need to use now. And also, it unlocks a lot of performance value and capabilities around data warehousing. So, all of that comes together to a big trend in the industry for investment, for replacement, and definitely we're seeing that on the Snowflake platform, which is doing really, really well recently. Yeah, well thank you, Yoav. And to that point, I want to share another data point and then dive in, maybe Ben, you can comment. And I want to address, why are we always talking about Snowflake? Of course, it's a hot company. Everybody knows that. You can see it in the company's financials, but the ETR survey data tells a really compelling story about the company. Here's a chart from the most recent ETR January survey. 
And so, you can see at the top, that blue line, it represents net score or spending momentum, and the darker line at the bottom represents presence or pervasiveness in the survey sample. Just as background, there are 165 Snowflake customers that responded to this past survey. 10% of companies within the Fortune 500 were in the sample, and around 4% of Global 2000 companies participated.

Just under 30% of the respondents were C-Suite executives, and about 20% were analysts or engineers or data specialists, with around half in VP, director, manager roles, that fat middle, with a very broad mix of industries, and there was a bias toward larger companies. Now, back to the chart, that net score for a moment, that top line, is derived by asking customers, "Are you adopting Snowflake new in 2022?" That's the 27% lime green number. "Will you be spending 6% or more on Snowflake, relative to 2021?" That's the 57% forest green. "Is your spending flat?" That's the gray. "Is it down by 6% or worse?" That's the other, the pink area. "Are you leaving the platform?" That's the bright red, and that's a zero defection, so there's none there.
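As an editorial aside, the net score arithmetic in this segment can be sketched in a few lines of Python. The function name `net_score` and its parameters are illustrative, not ETR's own. The 27% and 57% figures and the zero defection come from the segment; the 1% used for spending decreases is an assumption, chosen only because it is the value consistent with the 83% the host reports, since the transcript does not state it.

```python
def net_score(adopting: int, increasing: int, decreasing: int, leaving: int) -> int:
    """Net score as described in the segment: greens minus reds.

    All arguments are percentages of respondents. Flat spending (the gray
    band) is left out because it does not enter the subtraction.
    """
    greens = adopting + increasing   # lime green + forest green
    reds = decreasing + leaving      # pink + bright red
    return greens - reds


# 27 and 57 are from the transcript; leaving is 0 ("zero defection").
# decreasing=1 is an ASSUMPTION inferred from the reported 83%.
snowflake_score = net_score(adopting=27, increasing=57, decreasing=1, leaving=0)
print(snowflake_score)  # prints 83
```

Under this reading, any score above 40 counts as elevated in ETR's framing, which is the comparison the host draws.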

So you subtract the reds from the greens, and you get net score, which calculates out to 83% in this past survey. But what's remarkable is that Snowflake has held this elevated score for more than 12 quarterly surveys. It's in the stratosphere among the many thousands and thousands of companies in the ETR survey. Remember, anything above that 40% line is elevated and Snowflake is like glued to the ceiling. So the bottom line shows that the company's market presence continues to grow, that darker line at the bottom, and that green shade shows us that the pace of last quarter is actually accelerating. Snowflake is becoming ubiquitous, and customers are becoming intimately familiar with its platform, and it's scaling like we've never seen before, and it's building a pretty hard to penetrate fortress, we think, and an ecosystem. Ben, I wonder, in your view, what accounts for Snowflake's performance? Okay, so I would say that we can spend a full session just about such a thing, so I'll try to say what I think. I think, first of all, it does what it says on the box. You get from zero to being able to have a data warehouse easily, you have a very rich support of capability and features that you need for a cloud data warehouse. You're multi-cloud, you're not dependent on one of the big public clouds, and it's fast and scalable, and you don't need to worry yourself with the infrastructure behind. You don't need to, God-forbid, add any indexes or do things like that.

You don't need to do that, at least not often, indexes never, but other maintenance. And the innovation rate, they innovate fast. They add a lot of new capabilities, like the move to unstructured data, like a lot of security and governance capabilities, high innovation rate as well. Okay, good, and we'll talk about that move. So let's get deeper into the topic now on securing Snowflake. My first question is look, Snowflake, when you talk to practitioners and customers, they get pretty high marks on security, largely because of the simplicity, so why did you feel the need to write a book on the subject? So, definitely Snowflake is investing a lot of effort and putting a lot of emphasis on security. However, it's connected to the cloud service, and like any other cloud service, there is a shared responsibility model between Snowflake and its customers when it comes to fully securing their data cloud. So Snowflake can build amazing features, but then customers have to really adopt them, implement them in the best way. One of the things that we've seen by working with Snowflake customers is that we typically interact with data engineers, but then they have to implement security features and security capability. We thought writing a book about the topic would help these customers to understand the features better, benefit from them better and really structure their implementation and decide what's most important to implement at every step of their journey. Yeah, and I think that when I was researching this topic, I could find a lot of good information on the web, but I kind of had to hunt and peck for it. It was really sort of dispersed, and you put the information all in one place. You have a nice table of contents, so I can just zip right to where I want to go, so that was quite useful, I thought. What are the very basic fundamentals of securing Snowflake? In other words, I'm interested in, you get this world of flexible, it's globally distributed. 
You get democratizing data. How do you really make sure that only those folks that should have access, do have access? I mean really, let's talk about that a little bit. Oh, I think that, of course there are a lot of different aspects, but I think that I would start with the big blocks. For example, when you get a Snowflake account out of the box, it's open to the world in terms of network. I would start by limiting that. That should be easy for an organization. It's a couple of commands, and you've lowered your risk significantly, both security and compliance. Then, one of the common things where you can get a good decrease of your risk is around authentication. For example, do you have applications that are accessing Snowflake using a username and password? Okay, change that to using a key.
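As an editorial aside, the "couple of commands" for restricting network access and the switch from passwords to keys can be sketched as below. This is a minimal sketch, not the guests' own procedure: the Python helper names, the policy name `corp_only`, the user `etl_service`, the address range and the key text are all hypothetical placeholders. The helpers only build the SQL text (`CREATE NETWORK POLICY`, `ALTER ACCOUNT SET NETWORK_POLICY`, `ALTER USER ... SET RSA_PUBLIC_KEY`); running it against an account is left to whatever client you already use.

```python
import ipaddress
import re

# Accept only simple unquoted identifiers so the generated SQL stays predictable.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"not a simple unquoted identifier: {name!r}")
    return name


def network_policy_statements(policy_name: str, allowed_cidrs: list[str]) -> list[str]:
    """Build SQL that creates a network policy and applies it account-wide."""
    _check_identifier(policy_name)
    if not allowed_cidrs:
        raise ValueError("an empty allow list would lock everyone out")
    # IPv4Network raises ValueError on malformed addresses or ranges.
    cidrs = [str(ipaddress.IPv4Network(cidr)) for cidr in allowed_cidrs]
    ip_list = ", ".join(f"'{cidr}'" for cidr in cidrs)
    return [
        f"CREATE NETWORK POLICY {policy_name} ALLOWED_IP_LIST = ({ip_list});",
        f"ALTER ACCOUNT SET NETWORK_POLICY = {policy_name};",
    ]


def key_pair_statement(user_name: str, public_key_body: str) -> str:
    """Build SQL that registers an RSA public key for a service user."""
    _check_identifier(user_name)
    # Expect the base64 body of the public key, without PEM header/footer lines.
    if not re.fullmatch(r"[A-Za-z0-9+/=]+", public_key_body):
        raise ValueError("public key body must be base64 text")
    return f"ALTER USER {user_name} SET RSA_PUBLIC_KEY = '{public_key_body}';"


# Hypothetical values: 203.0.113.0/24 is a documentation-only range.
for statement in network_policy_statements("corp_only", ["203.0.113.0/24"]):
    print(statement)
print(key_pair_statement("etl_service", "MIIBIjANBgkqPLACEHOLDER"))
```

One practical caution: an account-level policy should include the address you are connected from, otherwise the change can lock out the administrator applying it.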

Do you have users with username, password? Change that to Okta integration or your IDP integration. So I would start with the big blocks that can remove most of my risk, and then of course, there is a lot to do from getting to the data warehouse and to auditing and monitoring. Okay, thank you for that. But, Yoav, how are these fundamentals that we just heard from Ben, how are they different? Isn't this kind of common sense? What's unique about Snowflake? So, a couple things, first of all, security, we love to say that it's 80% good security hygiene. You have to make sure that your basics are locked and tightly configured and that brings a lot of value. But two points to consider, first of all, all of these types of controls are pretty static in the sense that once you get in, you get in, and then you have pretty broad access, and we'll talk about authorization concepts and everything, perhaps today, but these are really static gatekeepers around your data. Once you have access, then it's really free for all. When you compare it to other types of environments and what we're seeing in other domains, maybe a move to more dynamic type of controls, elevated access or elevated additional authentication steps before you get elevated access. And what we're thinking is that beyond those static controls, the market is going to move towards implementing more dynamic, more fine-grain control, especially because in Snowflake, but any other data warehouse or large-scale data store, which becomes an aggregation point of data in the company, and we work with really big companies, and they bring in data from multiple jurisdiction from across the world, so they can get an overview of the business and run the business in a much more efficient way, but that really creates a pressure point when it comes to securing that data. Okay, Ben, you touched on this a little bit. I want to kind of dig deeper. 
So, Snowflake takes a layered approach, of course, it's sensible, and the layers, network, which we talked about, identity, access and encryption. And so, with any cloud, as you guys mentioned, it's a shared responsibility model. So I want to break that down a bit, and let's start with the network. So my responsibility, as a customer, I'm going to be responsible to set up the DNS. How much public internet access am I going to have for other users and apps? So how should practitioners think about their end of the bargain on the network? What do they need to know? At the network level, as I mentioned before, a new account is open network-wise, it's open to the world. And one of the first things I would do would be to set a network policy on the account to limit network access to that account. And of course, in many organizations, you would want to configure that with private link to your cloud environment, but that would be step two. (laughs) First step is simply set the network policy to make sure that it's not open to the public. Yeah, and that seems pretty straightforward, but let's talk about identity, 'cause it feels like that's where it starts to get tricky. You got to worry about setting up roles and managing users. You could even configure row and column based access, as I understand it, and I imagine access is where it really gets confusing for a lot of people, especially when you're crossing domain identities. Like for example, role-based security, let's land on that for a minute, I think you called it hierarchy hell in the book, so what should we think about in regards to identity? Well, first of all, it's hierarchy hell, in the book, it says that you can use hierarchy, but you should avoid getting to a hierarchy hell.
Basically, we've seen that with several Snowflake customers where the ability to set roles in a hierarchy model, to set a role that inherits privileges from another role, that inherits privileges from other roles and maybe, of course, used in a good way, but it also in some of the cases, it leads to complexities and to access not being deterministic, at least not obvious to the person who gives access, who is usually the data engineer. So, whenever you start having a complex authorization model, whenever I want to give Yoav access to a certain data set, and because things are complex, I also, by mistake, give him access to the salary information of the company, that's when things become tricky. If your roles are messy and complex, then it may lead to data exposure within the organization or outside the organization. How do you find Snowflake's integrations? Like if I want to use Okta or I want to use a CyberArk, I mean, how would you grade them on their ability to integrate with popular third party platforms? So, I would say pretty high, actually. We haven't encountered many customers who haven't configured any of these... nowadays, really basic security integration, and it really, really helps, setting that good identity management foundation for the platform. So they're investing a lot in that area, and we've been following them for a couple of years now, and it's really, really coming along nicely. All right, let's talk about encryption. I mean, that seemed pretty straightforward. Correct me if I'm wrong. I think Snowflake auto rotates the keys every 30 days. It really seems like your responsibility there is monitoring, making sure you're in compliance. You got good log data or access to good log data. Is that right? So, this really depends. So, for the average company, I would say, yes. 
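As an editorial aside, the "hierarchy hell" problem raised earlier in this passage, where a role inherits from a role that inherits from other roles, can be illustrated with a small sketch. It does not query Snowflake; it models role inheritance as plain dictionaries and walks the chain to show what a grant really hands over. All role names, privileges and the function name `effective_privileges` are hypothetical.

```python
def effective_privileges(
    role: str,
    inherits: dict[str, set[str]],
    direct_grants: dict[str, set[str]],
) -> set[str]:
    """Return every privilege a role holds directly or through inherited roles."""
    privileges: set[str] = set()
    seen: set[str] = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles and repeated visits
            continue
        seen.add(current)
        privileges |= direct_grants.get(current, set())
        stack.extend(inherits.get(current, set()))
    return privileges


# Hypothetical hierarchy: ANALYST inherits REPORTING, which inherits FINANCE_READ.
inherits = {
    "ANALYST": {"REPORTING"},
    "REPORTING": {"FINANCE_READ"},
}
direct_grants = {
    "ANALYST": {"SELECT ON SALES.ORDERS"},
    "REPORTING": {"SELECT ON SALES.SUMMARY"},
    "FINANCE_READ": {"SELECT ON HR.SALARIES"},
}

# Granting ANALYST looks harmless, yet it also reaches the salary table.
for privilege in sorted(effective_privileges("ANALYST", inherits, direct_grants)):
    print(privilege)
```

Listing the flattened result before granting a role is one way to make access obvious to the person giving it, which is the concern Ben describes.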
For some of the companies with higher security requirements or compliance requirements or both, sometimes there are issues like companies that do not want to have the data stored in clear text, in Snowflake, even encrypted as in the data warehouse encryption or the account encryption, even if someone accidentally gets access to the table, they want them not to be able to pull the data in clear text, and then it gets slightly more complicated. You have different ways of tackling this, but for the average company or companies who do not have such requirements, then everything in Snowflake is encrypted in transit and at rest, and of course, there are more advanced features for higher requirements. Okay, I'm interested in what you guys think of some of the more vulnerable aspects that Snowflake customers should really be aware of. Imagine I'm saying, "Guys, let's run a pen test. Okay, make sure I have no open chest wounds, but really try to fool me." What would you attack? Where should I be extra cautious? So, I would start with where data resides. And, if you look at the Snowflake architecture, there's a separation between storage and compute, but that also means storage is accessible without going through the compute. That can create opportunities for hackers to go and try and find access where access shouldn't be had. That's where I would focus on. I want to ask you about Virtual Private Snowflake. It seems to me, if I have sensitive data, if I don't use Virtual Private Snowflake, I feel like I'm increasing my risk that a security incident at the shared cloud services layer could impact multiple customers, and is this a valid concern? How should we think about reducing that risk, and when should I use that higher level of security? So, I think first of all, to the best of my knowledge, I'm not a Snowflake employee, but to the best of my knowledge, Virtual Private Snowflake is used by a minority of the customers, a small minority of the customers.
There are other more popular ways within Snowflake, like private link, for example, I would say, to enhance your security and your account segregation. But I wouldn't say that simply because the platform is multi-tenant, it is vulnerable. Of course, in many cases, your security or compliance requirements requires you to eliminate even this risk, but I wouldn't say that there are a lot of other platforms in different areas that are multi-tenant and-- And probably better than your on-prem, your average on-prem installation. Probably, probably. Okay, so I buy that. I would say on that, that maybe a shared environment is a higher value target for hackers. So if you're on a shared environment with thousands of other customers, if I'm a hacker, I would go there, 'cause then I get data for thousands of customers instead of try to focus on just one target and getting data for just one company. I think that's the most significant advantage. And obviously, Snowflake are investing a lot in making all of their environments very, very secure, and from our interactions with large Snowflake customers, we know that Snowflake are going above and beyond in making sure these environments are secure. Yeah, that's good, that's good news, because if I don't have to spend up, I can put the budget elsewhere. How do you guys think Snowflake's recent moves... They're making a couple of big moves. They've recently added unstructured data. They used to have semi-structured data. They're going after the data science and data lake functionality. Do those kinds of moves, I guess they're two different things, but does that change the way that security pros should think about protecting their Snowflake environment? I would say that Snowflake is moving fast with adding new functionality, well fast, but not too fast. They're releasing it in a controlled way. 
I would say that for new capabilities, of course, in some cases there are new attack vectors or new risks and obviously, securing different types of data may bring new challenges, but the basics, I think, remains the same. The basics of the network, identity authentication, authorization and auditing monitoring. I would say they will be the same and perhaps new features or capability will need to be used. And the largest issue, as data democratization is growing within organizations, and more and more people are using your data cloud, that also needs to be addressed. All right, finally, I want to end, I want to talk a little bit about futures. Have you guys talked in your book about multi-cloud as a way to reduce your reliance on a single vendor? And of course, it happens through M and A, and that's cool. We've talked a lot about multi-cloud, and we've been using this term that we coined, called supercloud, and it references an abstraction layer that exists on top of, and floats across, if you will, multiple clouds, and it hides some of that underlying complexity, and we feel like Snowflake is a good example of a company that's moving in that direction, building value on top of all that hyperscale infrastructure. So I wonder how you see Snowflake's moves in that direction would impact the way you think about DataSecOps. So definitely, we also see the trend of companies adopting more and more types of cloud and cloud technologies. They're in one cloud today. They want to move to a second one, almost every company that I talk to have, nowadays, a multi-cloud strategy. With respect to Snowflake, they basically have it figured out, because they are an overlay, like a supercloud, super data cloud, that is spread across any cloud, and you can basically pick and choose where you want to put your data for what use cases, and that's really, really helpful, because then you don't have to manage the complexity of multiple solutions for multiple areas of the business. 
We see this also in other areas where companies are saying, "Hey, I prefer to not use a specific cloud technology for that purpose, but use a vendor that can cover my needs across the clouds," definitely on the security side, where they want one throat to choke, so to speak, but they want to control things in a central place. As Ben mentioned before, complexity is the enemy of security and having those multi-cloud operations, from a security perspective, definitely adds complexity, which adds risks, so simplifying that is really, really helpful. Hey, thank you for that, and thank you guys for coming on today. Why don't you give us a little bumper sticker on Satori. What do you guys do? Give us the quick commercial. So, we help companies secure access to their data on platforms like Snowflake and others. We build really innovative technology that decouples security controls from the actual data layer. So if you think about it, where you can put controls to govern how people access data. You can put it inside the database. You can put it somewhere on the client. We've actually invented a technology that can do that in the middle, so you don't have to coalesce and mix your security concerns with your data. You don't have to go to your clients' users' end-points, laptops and put technology there. We have technology that sits in the middle, that decouples that aspect of your DataSecOps operations, and really helps companies implement those security controls much faster, because it's detached from the rest of their operation. Nice thought, leaning into that simplicity trend that you talked about. Okay guys, that's all the time we have today. Really, I want to thank Ben and Yoav for coming on "theCUBE." It was really great to have you. I'd love to welcome you back at some point. Thank you, Dave. >> Thank you, it was a pleasure. All right, remember these episodes are all available as podcasts, wherever you listen.
All you got to do is search "Breaking Analysis" podcasts. Check out ETR's website at ETR.ai. We also publish a full report every week on Wikibon.com and SiliconAngle.com. You can get in touch with me. Email me, David.Vellante@SiliconANGLE.com, @DVellante, or comment on our LinkedIn posts. This is Dave Vellante for "theCUBE Insights," powered by ETR. Have a great week, stay safe, be well, and we'll see you next time. (bright music)