SiliconANGLE theCUBE
Clip #8 - Is data an asset?
Clip Duration 01:34 / October 26, 2021
Breaking Analysis: Data Mesh...A New Paradigm for Data Management
Video Duration: 42:05

From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. Data mesh is a new way of thinking about how to use data to create organizational value. Leading edge practitioners are beginning to implement data mesh in earnest and, importantly, data mesh is not a single tool or a rigid reference architecture, if you will. Rather, it's an architectural and organizational model that's really designed to address the shortcomings of decades of data challenges and failures, many of which we've talked about on theCUBE. As important, by the way, it's a new way to think about how to leverage data at scale across an organization and across ecosystems. Data mesh, in our view, will become the defining paradigm for the next generation of data excellence. Hello and welcome to this week's Wikibon Cube Insights, powered by ETR. In this "Breaking Analysis," we welcome the founder and creator of data mesh, author/thought leader/technologist Zhamak Dehghani. Zhamak, thank you for joining us today. Good to see you. Hi, Dave! It's great to be here. All right, real quick, let's talk about what we're going to cover. I'll introduce or reintroduce you to Zhamak. She joined us earlier this year in our "Cube on Cloud" program. She's the director of emerging tech at ThoughtWorks North America, and a thought leader, a practitioner, a software engineer, an architect, and a passionate advocate for decentralized technology solutions and data architectures. And, Zhamak, since we last had you on as a guest, which was less than a year ago, I think you've written two books in your spare time. One on data mesh and another called "Software Architecture: The Hard Parts," both published by O'Reilly. So, how are you? You've been busy. I've been busy. Yes, I'm good. It's been a great year. It's been a busy year. I'm looking forward to the end of the year and the end of these two books. But, it's great to be back and speaking with you. Well, and you've got to be pleased with the momentum that data mesh has, and let's just jump back to the agenda for a bit and get that out of the way. We're going to set the stage by sharing some data from ETR, our data partner, on the spending profile in some of the key data sectors, and then we're going to review the four key principles of data mesh; it's always worthwhile to sort of set that framework. We'll talk a little bit about some of the dependencies and the data flows, and we're really going to dig today into principle number three and a bit around the self-service data platforms. And, to that end, we're going to talk about some of the learnings that Zhamak has captured since she embarked on the data mesh journey with her colleagues and her clients, and we specifically want to talk about some of the successful models for building the data mesh experience, and then we're going to hit on some practical advice and we'll wrap with some thought exercises, maybe a little tongue in cheek, some of the community questions that we get.

So, the first thing I want to do, we'll just get this out of the way, is introduce the spending climate. We'll use this X Y chart to do this. We do this all the time. It shows the spending profiles and the ETR dataset for some of the more data related sectors of the ETR taxonomy. They dropped their October data last Friday. So, I'm using the July survey here. We'll get into the October survey in future weeks. But, it's about 1500 respondents. I don't see a dramatic change coming in the October survey. But, the Y axis is net score or spending momentum. The horizontal axis is market share or presence in the dataset, and that red line, that 40%, anything over that we consider elevated. So, for the past eight quarters or so, we've seen machine learning/AI, RPA, containers, and cloud as the four areas where CIOs and technology buyers have shown the highest net scores.

And, as we've said, what's so impressive for cloud is it's both pervasive and it shows high velocity from a spending standpoint, and we plotted the three other data related areas: database/EDW, analytics/BI/big data, and storage. The first two, while under the red line, are still elevated. The storage market continues to kind of plod along, and we've plotted the outsourced IT just to balance it out for context. That's an area that's not so hot right now. So, just want to point out that these areas, AI, automation, containers, and cloud, they're all relevant to data and they're fundamental building blocks of data architectures, as are the two that are directly related to data: database and analytics, and, of course, storage. So, it just gives you a picture of the spending sector. So, I wanted to share this slide, Zhamak, that you presented in your webinar.

I love this. It's a taxonomy put together by Matt Turck, who's a VC, and he called this the MAD landscape: machine learning, AI, and data. And, Zhamak, the key point here is there's no lack of tooling. You've made the data mesh concept sort of tools agnostic. It's not like we need more tools to succeed in data mesh, right? Absolutely agree. I think we have plenty of tools. I think what's missing is a meta-architecture that defines the landscape in a way that it's in step with organizational growth and then defines that meta-architecture in a way that these tools can actually inter-operate and integrate really well. The clients right now have a lot of challenges in terms of picking the right tool, regardless of the technology path they go down. Either they have to take a big bite into one big data solution and then try to fit the other integrated solutions around it, or, as you see, go to that menu of a large list of applications and spend a lot of time trying to integrate and stitch these tools together. So, I'm hoping that data mesh creates that kind of meta-architecture for tools to inter-operate and plug in, and I think our conversation today around the self-serve data platform hopefully illuminates that. Yeah, we'll definitely circle back because that's one of the questions we get all the time from the community. Okay, let's review the four main principles of data mesh for those who might not be familiar with it, and for those who are, it's worth reviewing. Zhamak, allow me to introduce them and then we can discuss a bit. So, a big frustration I hear constantly from practitioners is that the data teams don't have domain context. The data team is separated from the lines of business and, as a result, they have to constantly context switch and, as such, there's a lack of alignment. So, principle number one is focused on putting end-to-end data ownership in the hands of the domain or, what I would call, the business lines. The second principle is data as a product, which does cause people's brains to hurt sometimes. But, it's a key component and, if you start sort of thinking about it and talking to people who have done it, it actually makes a lot of sense.

And this leads to principle number three, which is a self-serve data infrastructure, which we're going to drill into quite a bit today. And then, the question we always get when we introduce data mesh is: how do you enforce governance in a federated model? So, let me bring up a more detailed slide, Zhamak, with the dependencies and ask you to comment, please. Sure. As you said, really, the root cause we're trying to address is the siloing of the data external to where the action happens, where the data gets produced, where the data needs to be shared, where the data gets used, right? In the context of the business. So, really, the root cause, the centralization, gets addressed by distributing the accountability back to the domains, and this distribution of accountability, technical accountability, to the domains has already happened. In the last decade or so, we saw the transition from one general IT addressing all of the needs of the organization to technology groups within the IT, or even outside of the IT, aligning themselves to build applications and services that the different business units need.

So, what data mesh does is just extend that model and say, "Okay, we're aligning business with the tech and data now, right?" So, both application of the data in ML or insight generation in the domains, related to the domains' needs, as well as sharing the data that the domains are generating with the rest of the organization. But, the moment you do that, then you have to solve other problems that may arise, and that gives birth to the second principle, which is about data as a product, as a way of preventing data siloing from happening within the domain. So, changing the focus of the domains that are now producing data from, I'm just going to create and collect that data for myself and that satisfies my needs, to, in fact, the responsibility of the domain is to share the data as a product with all of the wonderful characteristics that a product has, and I think that leads to really interesting architectural and technical implications of what actually constitutes data as a product, and we can have a separate conversation about that.

But, once you do that, then that's the point in the conversation where the CIO says, "Well, how do I even manage the cost of operation if I decentralize building and sharing data to my technical teams, to my application teams? Do I need to go and hire another hundred data engineers?" And I think that's the role of a self-serve data platform, in the way that it enables and empowers the generalist technologists that we already have in the technical domains, who are the majority population of developers these days, right? So, the data platform attempts to mobilize the generalist technologists to become data producers, to become data consumers, and really rethink what tools these people need. So, in the end, the data platform is really there to give autonomy to domain teams, empowering them and reducing the cost of ownership of the data products.

And, finally, as you mentioned, the question around, "How do I still assure that these different data products are interoperable, are secure, respecting privacy, now in a decentralized fashion, right?" When we are respecting the sovereignty or the domain ownership of each domain, and that leads to this idea, both from an operational model, of applying some sort of federation where the domain owners are accountable for the interoperability of their data product. They have incentives that are aligned with the global harmony of the data mesh, as well as, from the technology perspective, thinking about this data as a product with a new lens, with a lens that all of those policies that need to be respected by these data products, such as privacy, such as confidentiality, can we encode these policies as computational, executable units and embed them in every data product, so that we get automation, we get governance through automation? So, that's the relationship, the complex relationship, between the four principles. Yeah, thank you for that. I mean, there are so many important points in there. But, the idea of the silos and the data as a product sort of breaking down those silos, because if you have a product and you want to sell more of it, you make it discoverable and, as a P&L manager, you put it out there. You want to share it, as opposed to hide it. And then, this idea of managing the costs, number three, where people say, "Well, centralize, and you can be more efficient." But, that essentially was the failure. And, your other point, related point, is generalist versus specialist. That's kind of one of the failures of Hadoop: you had these hyper-specialist roles emerge and so you couldn't scale. And so, let's talk about the goals of data mesh for a moment. You've said that the objective is to exchange, you call it a new unit of value, between data producers and data consumers, and that unit of value is a data product, and you've stated that a goal is to lower the cognitive load on our brains, I love this, and simplify the way in which data are presented to both producers and consumers, and doing so in a self-serve manner that eliminates the tapping on the shoulders or emails or raising tickets just to try to understand how data should be used, et cetera. So, please explain why this is so important and how you've seen organizations reduce the friction across the data flows and the interconnectedness of things like data products across the company. Yeah, I mean, this is important. As you mentioned, initially, when this whole idea of data-driven innovation came to exist and we needed all sorts of technology stacks, we centralized creation of the data and usage of the data, and that's okay when you first get started, where the expertise and knowledge is not yet diffused and it's only the privilege of a very few people in the organization. But, as we moved to a data-driven innovation cycle in the organization, and as we learn how data can unlock new programs, new models of experience, new products, then it's really, really important, as you mentioned, to get the consumers and producers talking to each other directly without a broker in the middle.
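To make the idea of encoding governance policies as computational, executable units a bit more concrete, here is a minimal sketch. The names and the specific check are hypothetical, not taken from any particular data mesh platform; the point is simply that a policy can be ordinary, testable code that travels with the data product and is evaluated automatically on every access.

```python
from dataclasses import dataclass, field
from typing import Callable

# A request to read from a data product: who is asking, and for what purpose.
@dataclass
class AccessRequest:
    agent: str            # identity of the caller (service or person)
    purpose: str          # declared purpose, e.g. "analytics" or "marketing"
    fields: list[str]     # columns the caller wants to read

# A policy is just executable code: it inspects a request and allows or denies it.
Policy = Callable[[AccessRequest], bool]

def no_pii_for_marketing(request: AccessRequest) -> bool:
    """Deny marketing use cases access to personally identifiable fields."""
    pii = {"email", "phone", "ssn"}
    return not (request.purpose == "marketing" and pii & set(request.fields))

@dataclass
class DataProduct:
    name: str
    policies: list[Policy] = field(default_factory=list)

    def authorize(self, request: AccessRequest) -> bool:
        # Governance through automation: every embedded policy must pass.
        return all(policy(request) for policy in self.policies)

# The policy is embedded in the data product and evaluated on every access.
orders = DataProduct("orders", policies=[no_pii_for_marketing])
print(orders.authorize(AccessRequest("recsys-service", "analytics", ["order_id", "total"])))  # True
print(orders.authorize(AccessRequest("campaign-tool", "marketing", ["email", "order_id"])))   # False
```

The particular rule does not matter; what matters is that governance becomes versioned, automated code carried by each data product rather than a manual, centralized gate.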

Because, even though having that centralized broker could be a cost-effective model, if we include the cost of the missed opportunity for something that we could have innovated, but we missed that opportunity because of months of looking for the right data, then the cost-benefit parameters and formula change. So, to have that innovation really embedded, data-driven innovation, embedded into every domain, every team, we need to enable a model where the producer can directly, peer-to-peer, discover the data, understand it, and use it. So, the litmus test for that would be going from a hypothesis that, as a data scientist, I think there is a pattern, and there is an insight in the customer behavior that, if I have access to all of the different information about the customer, all of the different touch points, I might be able to discover that pattern and personalize the experience of my customer. The litmus test is going from that hypothesis to finding all of the different sources, being able to understand them, and then being able to connect them and then turn them into training of machine learning, and the rest is, I guess, what we know as an intelligent product. Got it. Thank you. So, a lot of what we do here in "Breaking Analysis" is we try to curate and then point people to new resources. So, we will have some additional resources because this is not superficial, what you and your colleagues and the community are creating. But, so I do want to curate some of the other material that you had. So, if I bring up this next chart, the left-hand side is a curated description, both sides really, of your observations of most of the monolithic data platforms. They're optimized for control. They serve a centralized team that has hyper-specialized roles, as we talked about. The operational stacks are running enterprise software. They're on Kubernetes, and the microservices are isolated from, let's say, the Spark clusters, which are managing the analytical data, et cetera. Whereas the data mesh proposes much greater autonomy and the management of code and data pipelines and policy as independent entities versus a single unit, and you've made the point that we have to enable generalists, and you can borrow from so many other examples in the industry. So, it's an architecture based on decentralized thinking that can really be applied to any domain, really domain agnostic in a way. Yeah, so, I think if I pick one key point from that diagram, or that comparison, it's really the data platform. So, the platform capabilities need to present a continuous experience from an application developer building an application that generates some data, let's say I have an e-commerce application that generates some data, to the data product that now presents and shares that data as temporal, immutable facts that can be used for analytics, to the data scientist that uses that data to personalize the experience, to the deployment of that ML model now back to that e-commerce application. So, if you really look at this continuous journey, the walls between these separate platforms that we have built need to come down.

The platforms underneath that support the operational systems, versus support the data platforms, versus support the ML models, they need to kind of play really nicely together because, as a user, I'll probably fall off the cliff every time I go through these stages of this value stream. So then, the interoperability of our data solutions and operational solutions needs to increase drastically because, so far, we've gotten away with running operational systems and applications on one end of the organization, running data analytics in another, and building a spaghetti pipeline to connect them together. Neither of the ends are happy. I hear from data scientists, data analysts, pointing fingers at the application developers, saying, "You're not developing your database the right way." And application developers pointing fingers, saying, "My database is for running my application. It wasn't designed for sharing analytical data." So, what data mesh, as a mesh, tries to do is bring these two worlds closer together, and then the platform itself has to come closer and turn into a continuous set of services and capabilities, as opposed to these disjointed, big, isolated stacks. Very powerful observations there. So, we want to dig a little bit deeper into the platform, Zhamak, and have you explain your thinking here because everybody always goes to the platform. "What do I do with the infrastructure?" So, you've stressed the importance of interfaces, the entries to and the exits from the platform. You've said you use a particular parlance to describe it and, in this chart, it kind of shows what you call the planes, not layers, the planes of the platform. It's complicated with a lot of connection points. So, please explain these planes and how they fit together. Sure, I mean, there was a really good point that you started with: when we think about capabilities that enable building our applications, building our data products, building better analytical solutions, usually we jump too quickly to the deep end of the actual implementation of these technologies, right? "Do I need to go buy a data catalog? Or, do I need some sort of a warehouse to store it?" And, what I'm trying to do is kind of elevate us up and out, to force us to think about the interfaces and APIs, the experiences that the platform needs to provide to run this secure, safe, trustworthy, performant mesh of data products, and, if you focus on the interfaces, then the implementation underneath can swap out, right?

So, you can swap one for the other over time. So, that's the purpose of having those lollipops and focusing and emphasizing, what is the interface that provides a certain capability, like the storage, like the data product lifecycle management, and so on. The purpose of the planes, the mesh experience plane, the data product experience plane, and the utility plane, is really giving us a language to classify different sets of interfaces and capabilities that play nicely together to provide that cohesive journey of a data product developer and a data consumer. So, the three planes are really around, okay, at the bottom layer, we have a lot of utilities. We have that Matt Turck kind of MAD data tooling chart. So, we have a lot of utilities right now. They do workflow management.

They do data processing. You've got your Spark, your Flink. You've got your storage. You've got your lake storage. You've got your time series storage. We've got a lot of tooling at that level. But, the layer that we kind of need to imagine and build today, and we don't buy yet, as far as I know, is this layer that allows us to exchange that unit of value, right? To build and manage these data products. So, the language and the APIs and interface of this product experience plane are not, oh, I need this storage or I need that workflow processing. It's that I have a data product. It needs to deliver certain types of data. So, I need to be able to model my data.

As part of this data product, I need to write some processing code that keeps this data constantly alive, because it's receiving, let's say, upstream user interactions with a website and generating the profile of my users. So, I need to be able to write that. I need to serve the data. I need to keep the data alive. And I need to provide a set of SLOs and guarantees for my data, so that there's good documentation, so that someone who comes to the data product knows what's the cadence of refresh, what's the retention of the data, and a lot of other SLOs that I need to provide. And, finally, I need to be able to enforce and guarantee certain policies in terms of access control, privacy, encryption, and so on. So, as a data product developer, I just work with this unit, a complete, autonomous, self-contained unit, and the platform should give me ways of provisioning this unit and testing this unit and so on.
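As a rough illustration of that complete, self-contained unit, here is a minimal, hypothetical sketch of what a data product's self-description might capture: its inputs, its output ports, its SLOs (refresh cadence, retention, and so on), its documentation, and references to the policies it enforces. The field names are invented for illustration; they are not a standard.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class OutputPort:
    name: str          # e.g. "user_profiles"
    fmt: str           # e.g. "parquet" or "stream"
    schema_ref: str    # pointer to the data model / schema definition

@dataclass
class ServiceLevelObjectives:
    refresh_cadence: str   # how often the data is refreshed, e.g. "hourly"
    retention: str         # how long history is kept, e.g. "400 days"
    completeness: float    # fraction of expected records present, e.g. 0.99
    max_staleness: str     # e.g. "2 hours"

@dataclass
class DataProductSpec:
    name: str
    domain: str                                   # owning domain team
    description: str                              # documentation for consumers
    inputs: list[str] = field(default_factory=list)           # upstream sources
    outputs: list[OutputPort] = field(default_factory=list)   # served data
    slos: Optional[ServiceLevelObjectives] = None
    access_policies: list[str] = field(default_factory=list)  # references to policy code

# A data product developer declares the whole unit; a platform could then
# provision, test, and observe it from this single description.
user_profiles = DataProductSpec(
    name="user-profiles",
    domain="customer-experience",
    description="Profiles derived from website interactions, kept continuously alive.",
    inputs=["web_clickstream"],
    outputs=[OutputPort("user_profiles", "parquet", "schemas/user_profile_v2.json")],
    slos=ServiceLevelObjectives("hourly", "400 days", 0.99, "2 hours"),
    access_policies=["no_pii_for_marketing"],
)
```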

That's kind of why I emphasize the experience, and, of course, we're not dealing with one or two data products. We're dealing with a mesh of data products. So, at the kind of mesh-level experience, we need a set of capabilities and interfaces to be able to search the mesh for the right data, to be able to explore the knowledge graph that emerges from this interconnection of data products, and we need to be able to observe the mesh for any anomalies. Did we create one of these giant master data products that all the data goes into and all the data comes out of? Have we found ourselves a bottleneck? So, to be able to do those mesh-level capabilities, we need to have a certain level of APIs and interfaces. And, once we decide what constitutes that, to satisfy this mesh experience, then we can step back and say, "Okay, now what sort of a tool do I need to build or buy to satisfy them?" And that's not what the data community or data part of our organizations are used to. I think, traditionally, we're very comfortable with buying a tool and then changing the way we work to serve the tool, and this is slightly inverse to that model that we might be comfortable with. Right, and pragmatists will tell you, people who've implemented data mesh, they'll tell you they spend a lot of time on figuring out data as a product and the definitions there, the organizational piece, getting domain experts to actually own the data end-to-end. And they will tell you, "Look, the technology will come and go." And so, to your point, if you have those lollipops and those interfaces, you'll be able to evolve because we know one thing's for sure in this business: technology is going to change. So, you had some practical advice and I wanted to discuss that for those that are thinking about data mesh. I scraped this slide from your presentation that you made. And, by the way, we'll put links in there. Your colleague, Emily, who I believe is a data scientist, had some really great points there as well that practitioners should dig into. But, you made a couple of points that I'd like you to summarize and, to me, the big takeaway was it's not a one and done. This is not a 60-day project. It's a journey. And, I know that's kind of cliche, but it's so very true here. Yes, these were a few starting points for people who are embarking on building or buying the platform that enables the mesh creation. So, it was a bit of a focus on kind of the platform angle, and I think the first one is what we just discussed. Instead of thinking about mechanisms that you're building, think about the experiences that you are enabling. Identify who the people are. What is the persona of data scientists? I mean, data scientist is a wide range of personas, and data product developer is the same. What is the persona I need to develop today, or enable and empower today? What skillsets do they have? And so, thinking about experiences, not mechanisms. I think we are at this really magical point. I mean, how many times in our lifetime do we come across a complete blank, kind of white space, to a degree, to innovate?
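(As a purely illustrative aside, the mesh-level experience described above, searching the mesh, exploring the knowledge graph that emerges, and watching for bottleneck "master" data products, might be sketched as interfaces along these lines. Everything here is hypothetical naming, not a reference to any real tool.)

```python
from dataclasses import dataclass

@dataclass
class MeshNode:
    name: str
    domain: str
    tags: set           # topics the data product is about
    consumes: list      # names of upstream data products

class Mesh:
    def __init__(self, nodes: list):
        self.nodes = {n.name: n for n in nodes}

    def search(self, tag: str) -> list:
        """Discover data products relevant to a topic."""
        return [n.name for n in self.nodes.values() if tag in n.tags]

    def lineage(self, name: str) -> list:
        """Walk the emergent dependency graph upstream of one data product."""
        seen, stack = [], list(self.nodes[name].consumes)
        while stack:
            upstream = stack.pop()
            if upstream not in seen:
                seen.append(upstream)
                stack.extend(self.nodes.get(upstream, MeshNode(upstream, "?", set(), [])).consumes)
        return seen

    def bottlenecks(self, threshold: int = 5) -> list:
        """Observe the mesh: flag 'master' products too many others depend on."""
        fan_in = {}
        for n in self.nodes.values():
            for upstream in n.consumes:
                fan_in[upstream] = fan_in.get(upstream, 0) + 1
        return [name for name, count in fan_in.items() if count >= threshold]

mesh = Mesh([
    MeshNode("orders", "sales", {"revenue"}, []),
    MeshNode("user-profiles", "customer-experience", {"customer"}, ["web-clickstream"]),
    MeshNode("churn-features", "analytics", {"customer", "ml"}, ["orders", "user-profiles"]),
])
print(mesh.search("customer"))        # ['user-profiles', 'churn-features']
print(mesh.lineage("churn-features")) # ['user-profiles', 'web-clickstream', 'orders']
```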

So, let's take that opportunity and use a bit of creativity while being pragmatic. Of course, we need solutions today or yesterday. But, still, think about the experiences, not the mechanisms that we need to buy. So, that was kind of the first step, and the nice thing about that is that there is an iterative path to maturity of your data mesh. I mean, if you start with thinking about, okay, which are the initial use cases I need to enable? What are the data products that those use cases depend on that we need to unlock? And what is the persona, or general skillset, of my data product developer? What are the interfaces I need to enable? You can start with the simplest possible platform for your first two use cases and then think about, okay, the next set of data developers, they have a different set of needs.

Maybe today, I just enable the SQL-like querying of the data. Tomorrow, I enable the data scientists' file-based access to the data. The day after, I enable the streaming aspect. So, you have this evolutionary kind of path ahead of you, and don't think that you have to start with building out everything. I mean, one of the things we've done is taking this harvesting approach, where you work collaboratively with those technical, cross-functional domains that are building the data products, see how they are using those utilities, and harvest what they are building for themselves back into the backend of the platform. But, at the end of the day, we have to think about mobilization of the largest population of technologists we have. We have to think about diffusing the technology and making it available and accessible to the generalist technologists. And we've come a long way.

We've gone through these sorts of paradigm shifts in terms of mobile development, in terms of functional programming, in terms of cloud operation. It's not that we are struggling with learning something new, but we have to learn something that works nicely with the rest of the tooling that we have in our toolbox right now. So, again, put that generalist as one of your center personas, not the only persona, of course; we'll have specialists. Of course, we'll always have data scientist specialists, but any problem that can be solved as a general kind of engineering problem, and I think there are a lot of aspects of data mesh that can be just a simple engineering problem, let's approach it that way and then create the tooling to empower those generalists. Great. Thank you. So, listen, I've been around a long time. And so, as an analyst, I've seen many waves and we often say language matters. And, so, I mean, I've seen it with the mainframe language. It was different than the PC language. It was different than internet, different than cloud, different than big data, et cetera, et cetera. And so, we have to evolve our language. And so, I'm just going to throw a couple of things out here. I often say data is not the new oil because data doesn't live by the laws of scarcity. We're not running out of data, but I get the analogy. It's powerful. It powered the industrial economy. But, it's bigger than that. What do you feel? What do you think when you hear that data's the new oil? Yeah, I don't respond to those, data is the gold or oil or whatever scarce resource, because, as you said, it evokes a very different emotion. It doesn't evoke the emotion of, I want to use this, I want to utilize it. It feels like I need to kind of hide it and collect it and keep it to myself and not share it with anyone. It doesn't evoke that emotion of sharing. I really do think that data, and, with a little asterisk, I think the definition of data changes, and that's why I keep using the language of data products or data quantum, data becomes the most important, essential element of the existence of computation. What do I mean by that? I mean that a lot of applications that we have written so far are based on logic, imperative logic.

If this happens, do that, and else, do the other. And we're moving to a world where those applications generate data that we then look at, and the data that's generated becomes the source, the patterns that we can explore, to build our applications, as in, curate the weekly playlist for Dave every Monday based on what he has listened to and what other people have listened to, based on his profile. So, we're moving to a world that is not so much about applications using the data necessarily to run their businesses. That data really, truly is the foundational building block for the applications of the future. And then I think, in that, we need to rethink the definition of the data, and maybe that's for a different conversation, but I really think we have to converge the processing and the data together, the substance and the processing together, to have a unit that is composable, reusable, trustworthy. And that's the idea behind the kind of data product as an atomic unit of what we build future solutions from. Got it. And now, something else that I heard you say or read that really struck me, because it's another sort of often stated phrase, which is data is our most valuable asset, and you pushed back a little bit on that when you hear people call data an asset. People often have said they think data should be, or will eventually be, listed as an asset on the balance sheet and, in hearing what you said, I thought about that and said, "Well, maybe data as a product, that's an income statement thing. That's generating revenue or it's cutting costs." It's not necessarily an asset, 'cause I don't share my assets with people. I don't make them discoverable. Add some color to this discussion. I think it's actually interesting you mentioned that, because I read the new policy in China that CFOs actually have a line item around the data that they capture. We don't have to go to the political conversation around the authoritarianism of collecting data and the power that that creates and the society that leads to, but, that big conversation, little conversation, aside, I think you're right. I mean, data as an asset generates a different behavior. It creates different performance metrics that we would measure. I mean, before the conversation around data mesh came to exist, we were measuring the success of our data teams by the terabytes of data they were collecting, by the thousands of tables that they had stamped as golden data.

None of that necessarily leads to, there's no direct line I can see between that and the value that data actually generated. That's why I think it's rather harmful, because it leads to the wrong measures and metrics to measure for success. But, if you invert that to a bit of product thinking, or something that you share to delight the experience of users, your measures are very different. Your measures are the happiness of the user, the decreased lead time for them to actually use and get value out of it, the growth of the population of the users. So, it drives a very different kind of behavior and very different success metrics. I do say, if I may, that I'll probably come back and regret the choice of words around product one day, because of the monetization aspect of it, but maybe there's a better word to use; that's the best I think we can use at this point in time. Why do you say that, Zhamak? Because it's too directly related to monetization, that has a negative connotation, or it might not apply in things like healthcare? Yeah, I think because, if we want to take a shortcut, and I remember this conversation years back, people think that the reason to kind of collect data or have data is so that we can sell it. It's just the monetization of the data, and then we have this idea of the data marketplaces and so on, and I think that is actually the least valuable outcome that we can get from thinking about data as a product, that direct sale and exchange of data as a monetary exchange of value. So, I think that might divert our attention from something that really matters, which is enabling the use of data for generating, ultimately, value for people, for the customers, for the organizations, for the partners, as opposed to thinking about it as a unit of exchange for money. I love data as a product. I think your instinct was right on, and I'm glad you brought that up, because I think people misunderstood, in the last decade, data as selling data directly. But, really, what you're talking about is using data as an ingredient to actually build a product that has value, and value either generates revenue, cuts costs, or helps with a mission. It could be saving lives. But, in some way, for a commercial company, it's about the bottom line, and that's just the way it is. So, I love data as a product. I think that's going to stick. So, one of the other things that struck me in one of your webinars was in the Q&A; one of the questions was, "Can I finally get rid of my data warehouse?" So, I want to talk about the data warehouse, the data lake, JPMC used that term, the data lake, which some people don't like.

I know John Furrier, my business partner, doesn't like that term, but the data hub. And one of the things I've learned from sort of observing your work is that, whether it's a data lake, a data warehouse, data hub, data whatever, it should be a discoverable node on the mesh. It really doesn't matter the technology. What are your thoughts on that? Yeah, I think the shift, really, is from a centralized data warehouse to a data warehouse where it fits. So, I think if we just cross out that centralized piece, we're all in agreement that data warehousing provides interesting capabilities that are still required, perhaps as an edge node of the mesh that is optimizing for certain queries, let's say financial reporting, and we still want to direct a fair bit of data into a node that is just for those financial reportings, and it requires the precision and the speed of operation that the warehouse technology provides. So, I think definitely that technology has a place.

Where it falls apart is when you want to have a warehouse to rule all of your data and canonically model your data, because you have to put so much energy into kind of trying to harness this model and creating these very complex and fragile snowflake schemas and so on that that's all you do. You spend energy against the entropy of your organization to try to get your arms around this model, and the model is constantly out of step with what's happening in reality, because the reality of the business is moving faster than our ability to model everything into one canonical representation. I think that's the one we need to challenge, not necessarily the application of data warehousing on a node. I want to close by coming back to the issue of standards. You've specifically envisioned data mesh to be technology agnostic, as I said before, and, of course, everyone, myself included, we're going to run a vendor's technology platform through a data mesh filter. The reality is, per the Matt Turck chart we showed earlier, there are lots of technologies that can be nodes within the data mesh or facilitate data sharing or governance, et cetera. But, there's clearly a lack of standardization. I'm sometimes skeptical that the vendor community will drive this, but maybe Kubernetes, Google, or some other internet giant is going to contribute something to open source that addresses this problem. But, talk a little bit more about your thoughts on standardization. What kinds of standards are needed and where do you think they'll come from? Sure, I mean, you're right that the vendors are not today incentivized to create those open standards, because the operational model of the majority of the vendors, not all of them, but some vendors, is about bringing your data to my platform and then bringing your computation to me, and all will be great, and that will be great for a portion of the clients, a portion of environments where that complexity we're talking about doesn't exist. So, we need, yes, other players, perhaps some of the cloud providers or people that are more incentivized to open their platform, in a way, for data sharing. So, as a starting point, I think standardization around data sharing. So, if you look at the spectrum right now, we have de facto standards.

There's not even a real standard for something like SQL. I mean, everybody's bastardized SQL and extended it with so many things that I don't even know what the standard SQL is anymore, but we have that for some form of querying. But, beyond that, I know, for example, folks at Databricks have started to create some standards around data sharing, sharing the data in different models. So, I think data sharing as a concept, the same way that APIs were about capability sharing. So, we need to have the data APIs, or analytical data APIs, and data sharing extended to go beyond simply SQL or languages like that. I think we need standards around computational policies. So, this is, again, something that is formulating in the operational world.
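Since these standards largely do not exist yet, any concrete example is speculative. Still, as a sketch of what a technology-agnostic analytical data-sharing interface could look like, the analogue of what APIs did for capability sharing, something along these lines is plausible. The names are invented; this is not any vendor's protocol.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol

@dataclass
class Share:
    product: str        # data product being shared, e.g. "orders"
    table: str          # output port / dataset within the product
    schema: dict        # column name -> type, advertised to consumers

class DataShareClient(Protocol):
    """What a standard data-sharing interface could promise, independent of the
    engine (warehouse, lake, stream) that actually serves the bytes."""

    def list_shares(self) -> list: ...
    def read(self, share: Share, since_version: int = 0) -> Iterator[dict]: ...

class InMemoryShareClient:
    """Toy implementation of the interface, purely for illustration."""
    def __init__(self, shares: dict):
        self._shares = shares

    def list_shares(self) -> list:
        return [Share(name, name, schema={}) for name in self._shares]

    def read(self, share: Share, since_version: int = 0) -> Iterator[dict]:
        # A real protocol would hand back versioned, immutable snapshots;
        # here we just stream the rows.
        yield from self._shares[share.table]

client = InMemoryShareClient({"orders": [{"order_id": 1, "total": 42.0}]})
for share in client.list_shares():
    for row in client.read(share):
        print(share.product, row)
```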

We have a few standards around, how do you articulate access control? How do you identify the agents who are trying to access, with different authentication methods? We need to bring some of those over, or add our own data-specific articulation of policies. Something as simple as identity management across different technologies is non-existent. So, if you want to secure your data across three different technologies, there is no common way of saying who's the agent that is acting to access the data. Can I authenticate and authorize them? So, those are some of the very basic building blocks. And then, the gravy on top would be new standards around enriched kind of semantic modeling of the data, so we have a common language to describe the semantics of the data in different nodes, and then the relationships between them.
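That identity gap is easier to see with a small sketch: a single, shared way of articulating the agent, the action, and the resource, which each underlying technology would then map onto its own native mechanism. All names below are hypothetical and only illustrate the shape of such a standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Agent:
    subject: str          # stable identity, e.g. "svc:churn-model" or "user:dave"
    issuer: str           # who authenticated this agent, e.g. "corp-idp"
    groups: frozenset     # group/role claims carried with the identity

@dataclass(frozen=True)
class AccessDecisionRequest:
    agent: Agent
    action: str           # "read", "write", "delete"
    resource: str         # e.g. "data-product://customer/orders/output/orders"

def is_allowed(req: AccessDecisionRequest, grants: dict) -> bool:
    """One common decision function; each storage engine (lake, warehouse,
    stream) would translate the verdict into its own native permissions."""
    allowed_groups = grants.get(f"{req.action}:{req.resource}", set())
    return bool(allowed_groups & req.agent.groups)

grants = {"read:data-product://customer/orders/output/orders": {"analytics"}}
agent = Agent("svc:churn-model", "corp-idp", frozenset({"analytics"}))
request = AccessDecisionRequest(agent, "read",
                                "data-product://customer/orders/output/orders")
print(is_allowed(request, grants))  # True
```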

We have prior work with RDF and folks that were focused on, I guess, linking data across the web, with the kind of data web work that we had in the past. We need to revisit those and see their practicality in an enterprise context. So, data modeling, a rich language for data semantic modeling, and data connectivity. Most importantly, I think those are some of the items on my wish list. >> That's good. Well, we'll do our part to try to push that standards movement forward. Zhamak, we're going to leave it there. I was so grateful to have you come on to theCUBE. I really appreciate your time. It's just always a pleasure. You're such a clear thinker. So, thanks again. >> Thank you, Dave. It's wonderful to be here. Now, we're going to post a number of links to some of the great work that Zhamak and her team are doing, and to her books, so you can check that out because, remember, we publish each week on siliconangle.com and wikibon.com, and these episodes are all available as podcasts wherever you listen. Just search "Breaking Analysis Podcast." Don't forget to check out etr.plus for all the survey data. Do keep in touch. I'm @Dvellante. Follow Zhamak, D-Z-H-A-M-A-K-D. Or you can email me at David.vellante@siliconangle.com. Comment on the LinkedIn post. This is Dave Vellante for theCUBE Insights, powered by ETR. Be well and we'll see you next time. (gentle music)