Transcript

Savannah Peterson: Good morning, high-performance computing fans, and welcome back to Atlanta, Georgia. We're here at SuperComputing 2024, midway through day one of three days of coverage here on theCUBE. My name is Savannah Peterson. Delighted to be joined by this power-packed, super-intellectual panel for this next session. Feeling very lucky. We've got NVIDIA, Supermicro, and WEKA on the stage. Nilesh, I'm going to start with you.
Nilesh Patel: Sure.
Savannah Peterson: Because just a few minutes ago, as we were talking about, you have a very exciting announcement.
Nilesh Patel: Correct. And thanks for having us.
Savannah Peterson: My pleasure.
Nilesh Patel: Really excited about the announcement. This is the industry's first Grace-based storage solution. Now, WEKA has been delivering high-performance value across multiple different workloads. And particularly in the AI infrastructure space, we have been delivering tremendous value for training workloads, where we have reduced epoch times by 10X, 20X, and so on. As we continue to see the build-out, two challenges are emerging. One is that power consumption and power requirements in the data center are growing like crazy. And the second is that we are now getting into the inferencing space, where it's becoming a token economy. So cost per token, tokens per dollar, tokens per watt, and so on become important KPIs. So we got together with NVIDIA and Supermicro and tried to attack one of the core problems that is becoming a cliff for data center growth, particularly for AI infrastructure. So we are really excited.
From start to finish, in a few months we were able to identify the problem and build a solution. And so between the four of us, we are really looking forward to changing the trajectory of some of the high-performance computing, particularly around power consumption in the data center. And as we talk through it, we'll highlight some of the value proposition we have, both from the platform perspective and in how we can address some of the pain points in the system.
Savannah Peterson: So many things to unpack there. And very exciting. Congratulations. That's a big deal.
Nilesh Patel: Thank you.
Savannah Peterson: Are you as excited about this as Nilesh is?
Ian Finder: Oh, absolutely. I think the industry has had an appetite for high-performance ARM CPUs for a long time. They've obviously had an appetite for high-performance storage fabrics. When you build an at-scale AI system or an at-scale HPC system, the most important thing is to keep your compute fed and happy. And being able to take the same types of processors we're using now in some of our fastest supercomputers and actually put them into a storage appliance to get a roughly 2X per-watt impact from a high-performance ARM chip, now pushing through from HPC into actual storage devices, I mean, that's tremendous. This is the very same chip that powers some of the fastest supercomputers in the world, and it's now at the heart of the WEKA storage solution. That's incredible.
Savannah Peterson: This is going to help customers achieve scale. This is going to help a lot more people do a lot more, faster and better.
Nilesh Patel: Correct. So overall, if you think about it, the performance profile here requires much more balanced performance. One of the challenges around inferencing and different multi-modal environments is that you're going to see tremendous pressure on the amount of throughput and memory bandwidth required to move models back and forth. Particularly when you have multiple users running different prompts, embedding sizes are growing, and reasoning models and others are creating tremendous datasets that need to be moved back and forth between GPU memory and storage. You cannot have enough GPU memory. So in a storage platform, having both the high-performance compute from the CPU perspective and the memory bandwidth designed into the Grace platform really helps a lot. And working with Supermicro, we were able to architect a platform that has the right level of network I/O as well. So now you have a really balanced platform suited to what we see coming in terms of multi-model, multi-modal inferencing environments, which are going to push the data explosion to the next level. And so we are excited about how it has come together as well.
Ian Finder: That's really the key to building effective architectures. Everyone at this conference is obsessed with the idea of balance at the system-design and platform-design level. And it's that balance that causes you to need high-throughput storage to saturate your compute environments. Once you look into the box with Grace, we've architected a chip that has a tremendous amount of memory bandwidth. We have 512 gigabytes a second of memory bandwidth per socket in Grace, so the WEKA machine has a terabyte a second of memory bandwidth in aggregate in that storage appliance.
Nilesh Patel: Correct.
Ian Finder: That's impressive. That's more than pretty much any DDR-based compute server today. And we can deliver that high-speed memory bandwidth in about 32 watts of memory power. So this is very, very power-efficient, very, very high-bandwidth memory access. Build a storage backplane onto that, which is where Supermicro came in, and we now have a machine that can run full-out, very deterministic, very high-performance. You would have a lot more trouble trying to do this on conventional x86 architectures: you have less memory bandwidth, and you have a lot of bottlenecks between chiplets. Grace, being a very flat, very deterministic architecture, lends itself really well to this. So it's great to see it all the way up the stack, but also down into the storage.
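The bandwidth figures quoted here can be sanity-checked with simple arithmetic. A minimal sketch; the socket count is an assumption implied by the "terabyte a second in aggregate" claim, and the 512 GB/s and 32 W figures come from the conversation:

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumed: two Grace sockets in the appliance (implied, not stated outright).

PER_SOCKET_BW_GBS = 512   # GB/s of memory bandwidth per Grace socket (quoted)
SOCKETS = 2               # assumed dual-socket storage appliance
MEMORY_POWER_W = 32       # quoted memory-subsystem power

aggregate_bw_gbs = PER_SOCKET_BW_GBS * SOCKETS    # 1024 GB/s, i.e. ~1 TB/s
bw_per_watt = aggregate_bw_gbs / MEMORY_POWER_W   # GB/s per watt of memory power

print(f"Aggregate: {aggregate_bw_gbs} GB/s (~{aggregate_bw_gbs / 1000:.1f} TB/s)")
print(f"Per watt:  {bw_per_watt:.0f} GB/s per W")
```

The numbers line up: two sockets at 512 GB/s each give the quoted terabyte-per-second aggregate.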
Nilesh Patel: That's correct.
Savannah Peterson: What I'm hearing there too is that with that balance also comes sustainability.
Nilesh Patel: Correct.
Savannah Peterson: And you have much more reliable systems that aren't going to overheat or cost too much.
Ian Finder: Sustainability is all about doing the most with the most resource-intensive portions of your compute. Today, that's the CPU compute and the GPU compute. Sustainability and data center capacity go hand-in-hand. If I can deliver you a storage appliance that uses half the power, you can fit twice as much throughput into the same data center power envelope. When you scale that up alongside the rest of the compute resources, the most impactful thing you can do at a data center level is to remove bottlenecks. And that's exactly what we've done with this design.
Nilesh Patel: Yeah. And I think we've already proven that a petabyte of data stored on WEKA reduces carbon footprint by 260 tons.
Savannah Peterson: Whoa.
Nilesh Patel: So that kind of improvement in sustainability is going to get amplified even further by having a storage subsystem that consumes less power, right? And that further fuels the KPI that is fast emerging in the enterprise AI space: tokens per dollar per watt. So now you have a solution with, as we said, a very balanced architecture that not only performs to the needs of how next-generation models are going to drive demand on data, but also doesn't consume as much power. Today we are already offering almost 4X to 10X better power density compared to alternate solutions. So you can deliver a lot more data and performance in as short a time as possible. And I think Supermicro did a great job of packing everything we talked about into 1U. I think it would be great to-
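The "tokens per dollar per watt" KPI described here can be expressed as a simple normalization of token throughput by both cost and power. A hypothetical sketch; the function name and all input numbers are illustrative, and only the shape of the metric comes from the conversation:

```python
# Hypothetical illustration of the "tokens per dollar per watt" KPI.
# All inputs below are made-up example values, not vendor figures.

def tokens_per_dollar_per_watt(tokens_served: float,
                               infra_cost_usd: float,
                               avg_power_w: float) -> float:
    """Normalize token throughput by both infrastructure cost and power draw."""
    return tokens_served / (infra_cost_usd * avg_power_w)

# Comparing two hypothetical configurations serving the same token volume:
baseline = tokens_per_dollar_per_watt(1_000_000, 100.0, 1000.0)
efficient = tokens_per_dollar_per_watt(1_000_000, 100.0, 500.0)  # half the power

print(f"Efficiency gain: {efficient / baseline:.1f}x")  # halving power doubles the KPI
```

The point of the metric is visible in the last line: cutting power draw in half, with tokens and cost held constant, doubles the KPI.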
Savannah Peterson: Yeah. Tell me how Supermicro makes the magic happen.
Patrick Chiu: Yeah, definitely. I'm so happy that we can work with our best partners, NVIDIA and WEKA, to build this new system. And this is the first NVIDIA Grace storage system we are launching, the <inaudible>. For AI and HPC we leverage the latest technology, EDSFF E3. We have 16 of them in the 1U system, and we can achieve almost one petabyte in the 1U system. And with the PCIe-
Savannah Peterson: I'm getting nerd goosebumps back here. That is so impressive.
Patrick Chiu: Exactly. And with the Gen5 performance, and also leveraging the latest and greatest Grace architecture, as Ian just mentioned, with the onboard DRAM memory bandwidth, you will have no bottleneck from the SSD to the CPU to the memory to the networking. And on top of that, the WEKA software stack, right? You can convert this hardware advantage to the whole rack and whole data centers. So we are so excited that we can be the partner to launch these new systems. And we believe this will be a revolution for the new AI data centers and HPC data centers.
Savannah Peterson: I can tell you're excited.
Ian Finder: One of the interesting things, for those of us who've been in this industry a long time: when you look at something like EDA, there's a much different resource-usage pattern than something like a large CFD job.
Patrick Chiu: Correct.
Ian Finder: It's much more bumpy, much less deterministic.
Patrick Chiu: Yes.
Ian Finder: One of the things that we've been able to do with this system architecture, that again x86 chips struggle with: in an x86 system you might have eight cores on one little tile, eight cores on another tile, and you have to hop between those tiles. If you know exactly what the workload's usage pattern is going to be, something like CFD, that's fine, because you can line everything up, you can pipeline everything, and everything happens at the right time. What we're really proud of with Grace is that it's 72 cores on a single monolithic die. Each core is a comparatively very, very predictable distance from every other. When you build that into an appliance like WEKA has done, the appliance will sustain bumpy or unpredictable jobs much better. And that's really important, because in HPC you never want to be waiting. The worst thing you can do is wait. And if your workload has an unpredictable usage pattern, the way the chip responds to that pattern is just as important as how it responds to throughput under predictable load patterns.
Nilesh Patel: In fact, this is where the matching of our capabilities becomes extremely important, because one of the differentiators of WEKA and the WEKA Data Platform is the ability to serve any workload equally well, with the lowest latency possible and the highest throughput. Whether you are throwing a one-terabyte file at the data layer or 4K lots and lots of small-
Ian Finder: 4K, chunky, disparate.
Nilesh Patel: Exactly. And I think we are able to spread the workload evenly and really deliver consistent performance. And that's what we found in the AI pipeline. There's not a single workload type. It's not like classic HPC.
Savannah Peterson: That's a really good point. Yeah, keep going. I'm really glad you brought that up, because I think people know that AI workloads are generally big, but it's so much more complex than that, right?
Nilesh Patel: It's big in the amount of data being processed. But you go from ingest, where you're doing a lot of sequential writes, to training, where you're doing large amounts of sequential reads and the GPUs get busy. But then when we are doing pre-processing and inferencing, there are a lot of-
Ian Finder: Very bouncy...
Nilesh Patel: ...mixed, bouncy reads and writes. And that's where the zero-copy, kernel-bypass architecture we have designed, plus the metadata-scaling architecture, lets us support billions of parameters, which is where traditional storage stacks completely choke. Combine that with the improvements Grace has built in, and I think we are expecting a whole next level of amplification there.
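The pipeline phases described in this exchange (sequential-write ingest, sequential-read training, mixed small I/O for pre-processing and inference) can be tabulated as a rough sketch. The stage names come from the conversation; the access-pattern details and request sizes are illustrative assumptions, not WEKA specifics:

```python
# Rough sketch of the distinct I/O phases in an AI pipeline, per the
# discussion above. Request sizes are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class IOPhase:
    name: str           # pipeline stage
    pattern: str        # "sequential" or "random"
    op: str             # dominant operation: "read", "write", or "mixed"
    typical_size: str   # illustrative request size

PIPELINE = [
    IOPhase("ingest",         "sequential", "write", "large files"),
    IOPhase("training",       "sequential", "read",  "large batches"),
    IOPhase("pre-processing", "random",     "mixed", "4K and up"),
    IOPhase("inference",      "random",     "mixed", "4K, bursty"),
]

# A storage layer tuned for only one of these patterns bottlenecks the others.
for phase in PIPELINE:
    print(f"{phase.name:15s} {phase.pattern:10s} {phase.op:5s} {phase.typical_size}")
```

The point being made in the transcript is that a single storage system must serve all four rows well at once, since there is no single dominant workload type.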
Ian Finder: The other thing is, even for things traditionally thought of as somewhat predictable, let's say training versus something like pre-processing, you're now at the point where infrastructure is so important that it's all converged. On the same set of infrastructure, the same set of storage appliances, you might actually be serving four different jobs with different geometries happening concurrently. And even though each of those jobs may be somewhat predictable in its storage access patterns or its network usage patterns, when you combine them all together you get almost a multi-point propagation kind of interference. When that builds at data center scale, it can be very disruptive. So again, it all comes down to balanced, deterministic performance architectures. And that's why this has been such a strong partnership and why Grace has been able to shine so well in this type of deployment.
Savannah Peterson: Well, and this is why you're the industry-first solution here.
Nilesh Patel: Correct. Yes.
Savannah Peterson: This is all making a lot of sense.
Ian Finder: But it's also because they're fast. And so is Supermicro.
Savannah Peterson: That's what I was just going to bring up next. I'm glad you teed me up there. So I'm curious, Patrick: you get the phone call from Nilesh and he says, "Hey, I want to do this. Let's do this super fast." This was back in July, just to be clear. So the rate of innovation here is <inaudible>-
Patrick Chiu: Yeah, we got the email at 11:00 PM: "Patrick, I have this idea. Do you have time to chat tomorrow?" I said, "Okay," and then we chatted. He wanted to do Grace storage. Are you sure? And then we discussed over a couple of email exchanges, and also with Ian. We found out, just as we discussed, it's a perfect match: the best, most powerful CPU, and Supermicro builds the best box. And we also have the latest components, the SSDs, the thermal design, plus the world-class software. If we were missing any one component, we could not achieve today's results, right? So we are so excited. And then we had a meeting, an executive meeting, and we decided to go. Pretty much four months later, we have a prototype, and we are going to release pretty soon.
Ian Finder: To anyone who says ARM is difficult, right? We've seen Google with Axion, we've seen Amazon with Graviton. You can't buy an Axion; you can't buy a Graviton. Microsoft with Cobalt: you can't buy a Cobalt. You can buy a Grace. You can put it into a storage platform; you can put it wherever you need it. And the fact that through this partnership we have hardware and software coming together that quickly... who says ARM is hard?
Savannah Peterson: Well, you're proving that it's not, sitting here.
Ian Finder: They're also really good, though.
Savannah Peterson: Yeah.
Nilesh Patel: Well, no, I think that's a fantastic point. And this is where, again, the way the software is written for WEKA, we are true software-defined storage. To take our software from running on x86 architecture, to first showing up in ARM-based client-side functionality, and now serving full-fledged storage use cases out of the same software, shows how flexibly the design was developed. And as we continue to push together with NVIDIA on optimizing specific aspects of the design, I'm expecting a lot more value propositions to come together. Converged, which Ian talked about, is another use case for the storage platform that's going to be extremely exciting. In that case, the software is going to amplify and leverage the compute infrastructure by creating a thin layer of storage software, and thus not requiring a lot of dedicated hardware.
So this further reduces the hardware-footprint challenge while still leveraging the innovation that NVIDIA keeps pushing. Between the CPU, memory, and networking, all the innovations driving the low-latency, high-performance, non-blocking use cases serve us really well. And I think that really allows us to unleash the power of WEKA, which is what we are able to deliver consistently well. And then partnering with Supermicro brings it out to customers at a rapid pace. There are so many interesting innovations that Patrick and the team were able to think through so that we could pack in more I/O cards, more drives: a petabyte per U.
Savannah Peterson: It's impressive.
Nilesh Patel: So that's pretty impressive.
Savannah Peterson: I'm impressed. And I'm not just pulling your leg. It's very impressive. This has got to be really fun for you, seeing this type of innovation happen so fast.
Ian Finder: I mean, again, I love it. And the reason I love it is that for the first time we have a single SKU of CPU that's in Grace Hopper-based supercomputers, Grace Blackwell-based supercomputers, Grace-only supercomputers, and now in high-performance storage appliances. This is not your backroom NAS. This is a fire-breathing dragon that allows us to sustain this infinite need for throughput. And-
Savannah Peterson: What a visual that was. Thank you, Ian.
Ian Finder: That's what everyone here is all about. At the scale we operate at now in this industry, you can't afford for anything, in terms of performance or latency, to be non-deterministic. And we're solving that by putting the same chips at every level of the stack. That's really cool. That's really cool to me.
Savannah Peterson: It is really cool. This whole partnership is really cool. Okay, gentlemen, we are tearing through the segment, which is awesome. I have one final question for you.
Nilesh Patel: Okay.
Savannah Peterson: Today, obviously, is a huge day. Again, congratulations.
Nilesh Patel: Thank you.
Savannah Peterson: And honestly, just impressive. Speed of innovation is one of those things that warms my little nerd heart. And actually seeing solutions like this delivered at a scale that's going to be dependable and scalable for all these folks is amazing. I can't wait to see what your customers do with all of this. Since this has been such a great dialogue, and I'm confident we'll have you back on stage at SuperComputing 2025, what do you hope to be able to say when we're in St. Louis next year that you can't yet say today?
Nilesh Patel: Well, I think we will show tremendous proof points from various customers truly getting value from what we are delivering here. Based on the pace of innovation that we are driving, that NVIDIA is driving, and that Supermicro is driving, we will be talking about not a 2X but a 3X or 4X value proposition at the solution level by then. And we'll talk about how well customers have done with this product.
Ian Finder: Spoken like a true product officer. Yeah, numbers.
Nilesh Patel: Yes.
Ian Finder: This is the beginning. This is the overture. Once the hardware hits a loading dock somewhere, that's when the numbers come out and that's when everything becomes real. And we have all the roofline analysis to know that we're going to end up in a great spot, and it'll be really great to show that next year.
Nilesh Patel: Well, it would be awesome to say at theCUBE that we'll announce the highest tokens at the lowest dollars with the lowest wattage ever in the industry.
Savannah Peterson: You heard it here first, and you'll hear it here first next year at SuperComputing. Patrick, anything you're hoping to be able to say?
Patrick Chiu: Yeah, <inaudible> I think this is a great start, but it is not the only one, right? We will continue to work with the best partners, like NVIDIA and WEKA, and continue evolving with new technologies and new GPU-related or HPC storage. But I think this is very exciting. And by the end of 2024 and into 2025, we'll definitely have more products and integrated products. Because with NVIDIA's ecosystem, you can easily connect to the other GPUs or networking, the whole ecosystem. Next year we'll show you more.
Savannah Peterson: Can't wait to hear all about it. Cannot wait to see how this story develops. Patrick, Nilesh, and Ian, thank you-
Ian Finder: Call to action. People should check out the machine, right?
Savannah Peterson: Oh yeah. Check it out. Come take a look. Come play.
Nilesh Patel: Yeah, <inaudible>.
Savannah Peterson: Love that. Absolutely. And if you're not able to come play, I'm sure you can see some videos and demos of all the cool stuff going on here at SuperComputing 2024 in Atlanta, Georgia. Gentlemen, thank you so much. I really appreciate it.
Nilesh Patel: Thank you.
Savannah Peterson: This was a great time.
Ian Finder: Thank you.
Savannah Peterson: And I thank all of you for watching, wherever you might be on this beautiful rock. We're loving Georgia. My name's Savannah Peterson. You're watching theCUBE, the leading source for enterprise tech news.