SiliconANGLE theCUBE

Transcript
James Hamilton on improving latency by moving some of the SDN functions to hardware
Clip Duration 02:27 / July 21, 2020
AWS re:Invent 2016: Tuesday Night Live with James Hamilton
Video Duration: 1:33:27

Ladies and gentlemen, please welcome vice president and distinguished engineer of Amazon Web Services, James Hamilton.

Wow. Welcome, welcome, welcome. It's good to get us started on what's going to be a super action-packed week. This is going to be a really big week. I remember my first re:Invent; there were 6,000 people here, and I thought it was phenomenal. I learned more than you could possibly imagine. It blew me away, in that first re:Invent, seeing what customers were already doing with the product. Phenomenal, just absolutely phenomenal. And so when I see what's here today, there are 32,000 people here. Wow. Do you think this cloud thing's going to happen? It looks pretty big. Amazing.

Thanks very much for taking time out of your super busy work lives to come here and learn about AWS. Thank you for taking time out of your home life to come here and spend a week with us. At AWS, our job is to make sure we deliver enough content for you so that when you go back to work, you're able to get more done, your companies are more successful, and you're going to come back next year, because we love being able to get a little closer and talk in a little more detail about some of the things that we're doing. And that, in fact, is what I get to do today. The plan, just to give you a little background, is to dive in and spend some time on infrastructure, partly because that's where I work, and partly because I think it's super important to understand some of the unique advantages of AWS. So that's the first thing. After that, we're going to have a few people come up and help me present some other angles. And I'm going to finish off at the end of the show talking about big stuff that we're doing with sustainability and renewable energy. Super exciting. Hope you enjoy it.

Let's jump in. I always start with this slide. I'm in love with this slide. We update it, but it doesn't matter if we never updated it; if we left it the way it was three years ago, it would still be wonderful. This is a crazy number. Think about what it means. Just imagine, forget about all the innovation, forget about all the things I'm going to talk about today, and just imagine figuring out how to do this. You need to get all the component manufacturers delivering to the server manufacturers. The server manufacturers have to build boards, assemble servers, package them up, test them, and push them into the supply chain. The logistics channel has to deliver them to our data centers. Trucks have to back up to the right data center and deliver racks. There has to be enough data center space built. There has to be networking. There has to be power. The servers have to be installed by technicians, and it all has to be brought up. And then we do it again tomorrow. This is amazing. I'm just shocked at what the team is capable of doing. If I had thought, when I joined AWS a few years ago, that anything even approximately like this was possible, how would you do it? It sounds like the moon mission. It just seems impossible. And this is happening every day. If you were to ask me, is it likely that Amazon uses around the same number of server resources as a Fortune 500 company? My take is they probably use a lot more, but let's be conservative and say it's about the same. That means we're taking on a Fortune 500 company every day. This is happening, and it's happening in a very big way.

Right now, I'm excited by this slide. This is the first time it has been shown outside of Amazon. It originated from an internal technical presentation done by Brad Porter. I fell in love with it. This slide is absolutely wonderful. What it shows on the X axis is, of course, time, and you can see events happening: Prime Day '15, the fourth quarter, Prime Day '16. Wow, Prime Day '16, pretty big deal. On the Y axis it's virtual servers. They won't let me show you the Y axis, it did exist internally, but I will give you a scale point. If you look at Prime Day '16, that is tens of thousands of virtual servers brought online just before they were needed. They delivered the goods, and they were brought offline as soon as they weren't needed anymore. That is phenomenal. That is what happens if you get good at managing the cloud. We talked a lot about elasticity years and years ago, and it turns out the reason this conference sold out a long time ago, and there are 32,000 people here, is not elasticity. It's top-line growth. It's because the cloud allows companies to be agile, to try new ideas, to free up innovation, to test things out without waiting months and months to get resources, and, for the things that work, to deploy them quickly; when something works, double down, go big, go fast. That's why people are using the cloud. That alone is probably enough. It also happens to be a heck of a lot cheaper. I'm going to go through a zillion reasons tonight why it is a heck of a lot cheaper.
The funny thing is, this slide has nothing to do with any of those things. This is gravy. This is what you get if you're in the cloud. You've taken your workload, you've put it in the cloud, it's costing the company less, the company is producing better results, and it's a better place to work because there are more things happening. And yet, as you get better at managing the cloud and start to exploit its capabilities, you start to get this type of gain. When you start talking about tens of thousands of servers, let's get a scale point on that. Tens of thousands of servers is a mid-sized data center. That's pretty big. It's a mid-sized data center that came to life, lived for a few weeks, and was brought back down. A mid-sized data center is $150 million to $250 million, ballpark. Think about what this means. Conventional companies running on premises have to provision to that peak, and because it takes a long time to get servers through the acquisition cycle, actually get them in, and actually get them tested, you would never dare run at that red line I've shown there. You'd want to bring resources on much earlier, and you'd want to have more resources available, because you're very slow to respond to demand signals. And if you're slow to respond to demand signals, what do you do? You have to over-provision. It's as simple as that. And so you over-provision in so many different ways that you end up with a $200 million asset, you've committed to it for three years, and it does absolutely nothing. This gain is not why this room is full, but this is the gain that is available if you start to exploit the unique capabilities of the cloud. That's why I'm excited about this slide.

Here's the Amazon regional network: 14 regions all across the globe, with four more announced for next year. Great news. I love what we've got. We're going to have 18 regions at that point. And these regions are real regions. These are AWS regions. These are not "hey, I stuck two racks in opposite ends of the same data center, they're relatively independent, there's a wall between them, so fires are unlikely to spread and there probably won't be a flood" availability zones. No, our availability zones are real. They're separate buildings, and they actually do survive all of those different faults. And I'll show you in detail, because I want you to see exactly what the difference is, and the best way to know the difference is to see the difference. Simple as that. There are 68 points of presence spread throughout the entire globe.

This is the one that's great; you haven't seen this before. I got a question two years ago: does Amazon have a private network? Do you deploy private networks? Is there any spend on that? A lot of companies talk a lot about it. Do you consider that? Yeah, we thought about it. That is all one hundred percent Amazon-controlled resources. That's AWS. If traffic is flowing between one region and another, it's flowing on that network. It's a network managed by one company; it's not passed from one provider to another transit provider, to another interconnection site, to another interconnection site. These interconnection sites are run by wonderful, very committed individuals, but my rule is: if you've got a packet, the more people that touch it, the less likely it is to get delivered.
It's as simple as that. One administrative domain is way better than many administrative domains. And sometimes on the internet, weird things happen, like one company not getting along exactly with another company while they're trying to work out a contract, and maybe the resources get squeezed a little during that time to help rush the contract along. We're not going to do that. If it's running on our network, it's under our operational control. We give you better quality of service, and we always have the assets to survive a fault. There's no way a single link will ever have any impact on anyone in this room, because we have the capacity to survive a link failure, and we engineer it that way. Simple. We'd be crazy not to. This, by the way, is not just a little 10 gig network. This is a hundred gig network. Every one of those links I show you is a hundred gig, absolutely everywhere. And of course a hundred gig is not enough in many places, so it's many, many parallel hundred gig links all over the place. So this is a pretty important asset. When we started this, I've got to admit I was a little concerned, because it's really, really expensive. The networking team was a hundred percent committed that this is the right thing to do from a quality of service perspective, and it absolutely is. And you know something, the team is really good at finding great value. These private resources we have available are short-term leases, long-term leases, dark fiber that we light under IRUs, and, in several cases now, cable we're laying ourselves. Everything's on the table; we'll do whatever is most cost-effective to get the resources we need to serve, because we're not religious about there being one true way, as long as we get good value.

Let me show you an example. This is our latest project, the Hawaiki transpacific cable. The reason I want to show it to you is that the groundbreaking in New Zealand was last week. It's kind of a big deal. This is a transpacific cable that runs 14,000 kilometers, and at its deepest point it's 6,000 meters below the sea. That's about three miles below the ocean. There's an interesting challenge here, and I can't resist telling you about it because I was so captivated by it myself. Every time you get involved with technology, you learn it's always harder than it looks.

You know, how hard could it be to string a fiber between Australia and the US? Doesn't seem that bad; shouldn't be a problem. It turns out, signal-to-noise ratios being what they are, you have to have repeaters every 60 to 80 kilometers. That's unfortunate. Okay, got it, I understand. They have to work for 20 years without service. Okay, I understand. Oh, and they have to work three miles under the sea. You go, okay, I'm with you. And these repeaters are electrically powered. Ooh, that's not good. Now you've got electrical power three miles underneath the sea, it's supposed to last 20 years, and you've got to get power to it. Those really, really long extension cords you see used on some lawns don't feel like the right thing. So you've got to find a way to get power to these things, and here's what they do. If you look closely, you'll see the copper sheathing that wraps the fiber. The fiber is actually wrapped in copper: a bundle of fibers, some insulation, and then a couple of layers of copper. Now, the problem is that there are a lot of repeaters, so you'd need a lot of copper because it's carrying a lot of current. It takes a lot of power to run them all, and you can't do that; it's just not cost-effective. So what's the trick? The same trick that gets played on long-haul terrestrial power transmission lines: if you need to deliver a lot of power, you can either deliver a lot of amperage, which means you need a lot of conductor, or a lot of voltage. And of course, they go with a lot of voltage. The reason those conductors are relatively small is that they're running very high voltage. These devices run on direct current, and it's actually 10,000 volts positive DC and 10,000 volts negative DC.

One more little tidbit, just because I think it's a really interesting one. If you look closely, you'll see there are two conductors. What if one fails? What if someone anchors in the wrong spot? What if a fisherman trawling gets a little aggressive and hopes to catch something bigger? What do you do then, if one of these conductors is open to the sea? If it shunts to the sea, you're down, absolutely down. There's no third conductor, so how do you have redundancy in this cable? I think it's an interesting trick. You've got 10,000 volts positive and 10,000 volts negative, and say the negative conductor shorts to the sea, so it floats up to zero. What you do, if you're managing the cable, is lift the other conductor to 20,000 volts. Now you still have 20,000 volts getting to every repeater. It's the same difference of potential as before; you're using the seawater as the third wire, if you will. Really cool trick. You service the cable fault and then step it back down to the same state again. It's one of the few times where, when you actually need it, you've got redundancy. Kind of surprising.
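As a reader's aid, here is the arithmetic behind that trick as I understand it from the description above; the repeater-chain power figure is an invented round number, used only to show why high voltage keeps the conductors small.

```latex
% Normal operation: one conductor at +10 kV, the other at -10 kV.
\Delta V = (+10\,\mathrm{kV}) - (-10\,\mathrm{kV}) = 20\,\mathrm{kV}
% If the negative conductor shunts to the sea it floats to 0 V; lifting the
% other conductor to +20 kV preserves the same potential difference, with
% seawater acting as the return path:
\Delta V = (+20\,\mathrm{kV}) - (0\,\mathrm{V}) = 20\,\mathrm{kV}
% Why high voltage at all (assumed 100 kW of repeater load, for illustration):
I = \frac{P}{V} = \frac{100\,\mathrm{kW}}{20\,\mathrm{kV}} = 5\,\mathrm{A}
\quad \text{vs.} \quad
\frac{100\,\mathrm{kW}}{1\,\mathrm{kV}} = 100\,\mathrm{A}
```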
Okay, back to that beautiful network; I have to show it one more time. Now what we're going to do is choose one of those regions. I'm going to choose a fully developed region, because I want to show you how big it can get and the full richness that's possible inside a region. So let's dive into a region and see what we've got. The first thing we're going to look at is an actual AWS region. This isn't fictional. This isn't what I hope it will be someday. This isn't an artist's rendition. This is what's actually there. Every one of our regions, and there are 14 of them worldwide, going to 18 worldwide, has at least two AZs. When I say an AZ, I mean a building; we'll come back to that. A separate building. Most of our regions have three AZs, and all of the new ones we're building are going to three AZs right now; it just feels like the right place for us to be. This particular one has five AZs, so it's relatively big from a scale perspective. Every region has two transit centers. The job of the transit center is to provide connectivity from the region to the rest of the world. Our private network, the Amazon global network, connects up into the transit centers. Customers that are direct-connecting to us hook up through POPs or possibly up through the transit centers. Everyone we're peering with comes in through the transit centers, and all of our transit providers come in through the transit centers. So two transit centers is another constant we'll always have.

Now we need to wire this up. We've got five AZs. The first thing we do is run fiber inside each AZ to hook it up internally. Then we run fiber to hook the AZs up with each other. Then we run fiber to hook the transit centers up to all the AZs. You see what's going on: there is a lot of redundant fiber here, and the word redundant is a wonderful thing in the networking world, because it means that when things go wrong, when someone decides to dig a hole in the wrong spot, things keep running. You don't feel that redundancy when things are running, but when you don't read about an outage, it's because of that redundancy. In this particular design, in this particular region, we have 126 unique spans, which is a pretty substantial number, and get this: many of those spans are more than a single fiber. In fact, there are 242,472 fiber strands across those 126 links. That is a lot of fiber.

Here's another interesting little tidbit, at least it caught my interest: we use two-inch conduit. The industry pretty much runs on two-inch conduit. Well, we're running a lot of fiber between these buildings. Do you want to dig another hole and run another two-inch conduit? Not especially. And fibers are small, so you'd think it wouldn't be that hard, but you need strength, otherwise the fibers break when they're pulled. You need a core that's structurally strong enough to last, and the whole bundle has to be armored well enough that it survives when the construction workers pull it, and it can stand the weather and the environment underground. The company we're working with is doing really phenomenal work. They start in a pretty conventional place: it's very common to have ribbon cables, fibers that are actually in ribbons. What they're doing is taking the ribbons, folding them into a V, and stacking them.
And so it forms kind of a V that fills up a quadrant, and then they do it again and again and again, and by the time you're done, somehow they get to 3,456 fibers. We're the first company to deploy this technology. We absolutely love it. It has saved us a ton of money because we're running so much fiber. And you might ask, coming from the networking world: instead of running a lot of fiber, you've always got a choice. There are other things you can do. One of them is what I showed you back there on the Hawaiki cable: every fiber pair is running a hundred waves of a hundred gig each. You can run parallel waves on the same fiber, so instead of running a hundred fibers, you can run one fiber with a hundred waves on it. What you saw back there was a 30 terabit cable with six fibers. So why don't we do the same thing here? The reason is that current technology, CWDM or DWDM, that's coarse wave division multiplexing or dense wave division multiplexing, costs more. The bottom line is that for short distances it's more cost-effective to run independent fibers, while for long distances it always wins to run multiple waves on the same fiber. Silicon photonics will probably change that, and our plant will probably eventually end up running multi-wave. I'm very confident that will happen, but it's not happening anytime soon; it'll be a little bit of time yet. So almost all of those spans are single-wave fiber. Not every one, but almost all of them.

Okay, let's jump into a full-scale AZ. Again, remember this is a specific region; these are actual numbers from that region. Every AZ is one or more data centers, and no data center belongs to two AZs, no games, like I told you. The third thing is the network links; we covered that. The final thing is one I should know, these numbers, but in fact it blew me away: we have several AZs, not just one, at 300,000 servers. These AZs are part of a region, and there are 14 regions worldwide, and in a single AZ we have several at 300,000 servers. Wow, big numbers.

Here's the data center. This is where we kind of go backwards. A lot of the numbers I show you I find to be big, surprising numbers, considerably bigger than they were the last time I showed them. These numbers haven't really gone up that much. Last time I think I said 25 to 30 megawatts; right now I'm saying 25 to 32, and almost all of our new builds are 32 megawatts. Why aren't we building bigger facilities? We could easily build 250-megawatt facilities. I've been in 60-megawatt facilities; there's nothing challenging about it. Here's what's going on. The reason is the same reason we do everything: it's data. We just use the data. When you're scaling up a data center, when you're very small and you add scale, you get really big gains in cost advantage, and as you get bigger and bigger it's a logarithm; it gets flatter and flatter, and it reaches the point where the gains of going a lot bigger are relatively small. The negative side of a big data center, though, is linear. If you have a data center that's 32 megawatts and 80,000 servers, it's bad if it goes down, but we actually have sufficient scale that you don't notice it; we can work through that. But double that to 160,000, triple it, quadruple it, and it starts to get upwards of half a million servers. If that goes down, the amount of network traffic needed to heal all of the problems means it's not a good place to be. So our take right now is that this is about the right size of facility. It costs us a tiny bit more to head down this path, but we think it's the right thing for customers. That's the region.
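The shape of that argument can be sketched with a toy model; the numbers below are made up for illustration and are not AWS figures. Economy-of-scale gains flatten as a facility grows, while the blast radius of losing the facility grows linearly with its size.

```python
# Illustrative only: a made-up cost model showing diminishing returns from
# building ever-larger facilities versus a linearly growing blast radius.
import math

def cost_per_server(servers: int) -> float:
    """Hypothetical curve: fixed facility overhead amortizes roughly logarithmically."""
    base = 1000.0        # assumed per-server cost floor (arbitrary units)
    overhead = 50_000.0  # assumed fixed facility overhead (arbitrary units)
    return base + overhead / math.log(servers)

for servers in (20_000, 80_000, 160_000, 320_000, 640_000):
    print(f"{servers:>7} servers: ~{cost_per_server(servers):,.0f} per server, "
          f"blast radius = {servers:,} servers if the facility fails")
```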
Let me show you a little bit on networking. I always have to have my rant on old-school networking, because it held the industry back for so long: vertically integrated networking equipment, where the ASICs, the hardware, and the protocol stack are all supplied by a single company. It's the way the mainframe used to dominate servers. And it's an interesting observation: if you look at where the networking world is, it's sort of where the server world was 20 or 30 years ago. It started off with: you buy a mainframe and that's it, and it all comes from one company. The networking world is in the same place. And we know what happened in the server world: as soon as you chop up those vertical stacks, you've got companies focused on every layer, they're innovating together, they're all competing, and great things happen. The same thing is starting to happen in the networking world, and it is a wonderful place to be. What's happening is that it is causing the ratio of networking to server to go up. In other words, for a given server size, the amount of networking required to support it is going up. Partly it should have gone up before, but networking was artificially expensive, and so server resources were getting stranded. Now that it's moving to commodity, that's no longer happening.

We run our own custom-built network routers. These routers, and that particular one happens to be a top-of-rack form factor, are built to our specs, and this is the wonderful thing: we have our own protocol development team. When I rant about how poorly served we were by vertically integrated routers, I mostly talk about cost, and it was cost that caused us to head down our own path. But it turns out, as big as the cost gain is, and by God it is a big cost gain, the biggest gain is actually reliability. What happens is that networking gear is very expensive, and every company has people like me with big ideas who say, oh, I've got a requirement, I'd like you to add some incredibly complicated piece of code to your system. And the vendor says sure, and after a while the networking gear is absolutely, completely unmaintainable, and when the next release comes out they don't test all that stuff that people like me asked for, because nobody uses it anyway. And it doesn't work. Our networking gear has one source of requirements: us. And we show judgment and keep it simple. It's our phones that ring, our pagers that go off, if it doesn't work, so it's well worth keeping it simple. As fun as it would be to have a lot of really tricky features, we just don't do it, because we want reliability. So it's a much more reliable system.
And I honestly wouldn't have guessed that when we headed down this path. I was making excuses, saying initially it won't be as reliable. It was way more reliable from day one. Way more, from day one. The second thing is: okay, you've got a problem, what do you do? Well, if a pager goes off, we can deal with it right now. These are our data centers; we can go and figure it out and fix it. We've got the code, we've got skilled individuals who work in that space, we can just fix it. If you're calling another company, it's going to be a long time. They have to duplicate something that happened at the scale I showed you in their test facilities. How many test facilities look like that? There's not one on the planet. So it's six months for the most committed, best-quality, most serious company. It takes six months. It's a terrible place to be. So we love where we are right now.

We jumped on 25 gig early. It looks like a crazy decision, and I was heavily involved with it, so I'll defend it. The industry standards are 10 gig and 40 gig; why the heck would you build 25? It's just asking for trouble. And by the way, 25 was really new at the time, and there was a bit of an optics shortage happening, so it was risky as well. But here's what's going on. If you're not willing to find a way to solve the optics problem, you have to run 40 gig, and that's where most of the world went. We were confident that 25 is the right answer, and I'll show you why real fast. It's super simple. 10 gig is a single wave. 40 gig is four waves, so, and it's not quite this bad, but 40 gig is almost four times the optics cost of 10 gig. That's just not a great place to be. 25 gig is one wave. It's almost the same as 10 gig; again, not quite true, it's a little bit more money, but it's almost the same price as 10 gig. And so what that means is that on this model we can run 50 gig, which is more bandwidth, and we get to do it at much less cost, because we're only running two waves. From an optics perspective, it's absolutely the right answer. We buy enough that supply doesn't matter; the vendors are extremely happy to serve us, so it's not a problem. And I believe this is where the industry is going to end up, because whenever you've got the right answer, when it looks that good, it'll happen. And it has happened; we are deploying these in unbelievable numbers, which is good.

Here is the ASIC that runs in our routers today. I am super excited about this, because remember I said earlier that servers went down this path and de-verticalized. You can now buy ASICs individually, well, in very large numbers; you can buy ASICs without buying the rest of the gear. We work with Broadcom. This particular one is a Broadcom Tomahawk. At the time it was released, it was the most complex application-specific integrated circuit by transistor count on the planet. These are monsters, absolute monsters. But the beautiful thing about this is that it's a 3.2 terabit part. What does that mean? It's 128 ports of 25 gig.
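To spell out the arithmetic behind the 25 gig decision and the Tomahawk port count, roughly as described above (the cost ratios are approximations, not price quotes):

```latex
\begin{align*}
\text{10GbE} &: 1 \text{ optical wave} \\
\text{40GbE} &: 4 \times 10\,\text{Gb/s waves} \;\approx\; 4\times \text{the optics cost of 10GbE} \\
\text{25GbE} &: 1 \text{ optical wave} \;\approx\; \text{the optics cost of 10GbE} \\
2 \times 25\,\text{Gb/s} &= 50\,\text{Gb/s of bandwidth on only 2 waves (vs. 40 Gb/s on 4)} \\
\frac{3.2\,\text{Tb/s}}{25\,\text{Gb/s}} &= 128 \text{ ports on one Tomahawk ASIC}
\end{align*}
```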
All ports can be running flat out with no blocking; it will flow 3.2 terabits at the same time through this part. No wonder it's a complicated part. Why do I like that? Well, non-blocking is a wonderful place to be, but the real reason I like it is that there's a healthy ecosystem. Cavium, Mellanox, Broadcom, Marvell, Barefoot, and Innovium are all building parts. There are six terabit and 13 terabit parts coming, and they'll be around the same price, just the same way the server world went. And so now what happens is, if you separate it out, networking gear basically has two costs: it has this cost and it has optics, and that's basically all there is; all the rest is lost in the noise. What this means is that this part is on a Moore's Law pace. That is a fantastic thing. Optics aren't, but with silicon photonics they're soon to be heading down that same path, and many of the optics we're running today are in fact multi-chip versions of silicon photonics. So good things are happening.

Software-defined networking: big topic today, super important for the last couple of years. We've had it since the beginning of EC2, because you need to have it from the beginning of EC2 in order to offer a secure service, as we do. Starting somewhere around 2011, I believe it was, we made what was realistically a fairly obvious observation, but an important one: whenever you've got workload that's very repetitive and happening all the time, as almost all network packet processing is, you're really better off taking some of it down into hardware. So what we did is offload the servers and drop that network virtualization code down onto the NIC. Lots and lots of gains follow from that. The first gain is that more resources, more cores, are available in the servers. Good news. The second gain is that things that are hard to see and hard to understand, but that are happening, little disturbances on the server like flushing TLBs and things like that, are now moved off. It's also considerably more secure: if a hypervisor is compromised, you still don't have access to the network, because it's a separate real-time operating system running on the NIC, running all of our software. So that offload does wonderful things.

Another observation, kind of an obvious one but a super important one, and it applies to every level of computing, so it's a nice set of rules to keep in mind: if you offload to hardware, in rough numbers you run at roughly a tenth the latency, roughly a tenth the power, and roughly a tenth the cost. So it's a big deal if you can do it. A second observation: people say, hey, the reason you had to build custom networking gear is that you could never have the bandwidth you have in your data centers otherwise. Well, that's not true. I can give you any bandwidth you want; it's just more parallel links, and I can do it with anyone's equipment. It's not even hard to do. It's hard to pay for if you're using some of the commercial gear, but it's absolutely not hard to do. Do you know what is hard to do? Latency. That is physics. One is money; physics is harder. With physics, you've got a challenge: the speed of light in fiber is the speed of light in fiber. There may come fibers that are a little bit faster, but that is basically the fastest you're going to go.
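As a rough illustration of why latency is a physics problem rather than a money problem, here is a quick back-of-the-envelope calculation; the refractive index is a typical textbook value and the distances are examples, not AWS measurements.

```python
# Lower bound on round-trip latency from propagation delay in fiber alone,
# ignoring every piece of equipment along the path.
SPEED_OF_LIGHT_VACUUM_KM_S = 299_792   # km/s
REFRACTIVE_INDEX_FIBER = 1.47          # typical single-mode fiber (assumed)
speed_in_fiber = SPEED_OF_LIGHT_VACUUM_KM_S / REFRACTIVE_INDEX_FIBER

def fiber_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over a fiber path of the given length."""
    return 2 * distance_km / speed_in_fiber * 1000

for label, km in [("across a metro region", 100),
                  ("US coast to coast", 4_500),
                  ("the 14,000 km Hawaiki cable", 14_000)]:
    print(f"{label:>28}: >= {fiber_rtt_ms(km):6.2f} ms round trip, "
          f"no matter what the gear does")
```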
So latency is key, and when you move to hardware, the latency fundamentally changes. The way I look at it is, I tell software people: the things you measure are called milliseconds, and if you put it in hardware, the things they measure are nanoseconds and microseconds. You're changing things by big margins. And so this is the right place for us to go.

Here's some great news. Do you believe we're in the semiconductor business? Isn't that great? We're in the semiconductor business at Amazon Web Services. Not only are we building hardware, which I thought was pretty cool, we built this. This is billions and billions of transistors. Every server we deploy has at least one of these in it; some will have a lot more. This is a very big deal. Imagine what we can do if I'm right on those trends I told you about for hardware implementation: latency, cost, power, et cetera. If I'm right on that point, and I'm fairly confident I am, it means we get to implement in silicon. So now we've got, in the same company, reporting up into our infrastructure team, digital designers working on this, hardware designers working on the NICs themselves, and software developers. And when you own horizontal and vertical, we get to move at the pace we're used to. We get to make changes at the pace we're used to. We get to respond to customer requirements at the pace we're used to. We think this is a really big deal. And if I'm right that there's going to be an acceleration in the amount of networking resources available to servers, then this is a wonderful place for us to be, because we're going to be able to step up to it at a relatively low cost thanks to some of the decisions I've outlined so far. Good news there.

Let's look at this one. One of the things that I do: if you try to manage faults by looking only at your own, you have a fault, you do a post-mortem, you say, oh, I shouldn't have done that, you learn from it, and then you don't do it again. That's fine; you should do that, and we're religious about it. But I also look at other people's faults and try to learn from them. This one caught my eye because, oh boy, I know this fault. I know this fault by heart. And it's interesting, because I know it because we run at very large scale, but it's actually a very rare event. I almost guarantee you this company had never seen it before, and they'll probably never see it again. But it does happen, and very rare events at very large scale unfortunately happen more frequently than you'd think. Let's look at the impact of this one. The chief financial officer of this airline reported they lost a hundred million dollars of revenue because of this fault. The cancellations are listed there; 2% of their monthly revenue was gone as a consequence. And the report was: switchgear failed and locked out the backup generators. I know that one. What happened? I happened to be in the data center for this one. I don't know why, just a fluke, but I happened to be in one of our Virginia data centers when exactly this event happened.
Let me tell you how switchgear works, and then I'll tell you what happened. The way switchgear works is that the utility feeds in through the switchgear and down into the uninterruptible power supply, and as long as the utility is there, of course, that circuit runs. If the utility fails, the switchgear waits a few seconds, because the utility usually comes back very rapidly; most faults are incredibly short and it's not worth starting a generator. If it doesn't come back, the generator starts up and spins up to full RPM, the switchgear waits until the voltage stabilizes and the quality of the power is good, and then it swings the load over. The poor old generators hit 1,800 RPM in about seven or eight seconds, and they take load in about 15 to 20. It is not a good job; do not apply to become a generator in a backup data center. They get the load hard and fast. Okay, so that's the way they work. What goes wrong is that if there's a fault out there that looks like it might have been a short to ground inside the data center, the switchgear is "smart" and doesn't bring the generator online into the load, because it could damage the generator, and they view it as a safety issue, which is rubbish.

So what happened? I am in the facility. Six hours later, the switchgear manufacturer came to the facility to explain the problem to us, and our data center manager was absolutely apoplectic, just completely apoplectic, that we had a generator running and we didn't hold the load. Unacceptable. And the interesting thing is that the switchgear manufacturer was absolutely unapologetic: that's the way it has to be. Fine, there are other switchgear manufacturers, so we'll buy from someone else. They're all the same. What are the odds? They're all the same. And so what we've done, and the picture I'm showing here is normal commercial switchgear, which we still use, is that we changed the firmware. The firmware that controls our switchgear does not do what I just described. As a consequence, if there's a fault and it might be a short, we still bring the generator online in the data center. We're going to do that, and the reason is that it's what you want us to do. What are the impacts? Maybe it's unsafe? What are the risks? Let's look at it. The vast majority of the time, the fault is outside of the data center anyway; that's just the vast majority of the time. The one I had experience with, and it's actually kind of funny, the same pole got hit twice, it's a longer story, but there must be a bar nearby: someone drove into an aluminum utility pole, which fell across two phases of the power lines, and the spike that hit the data center was extraordinary, and the switchgear said "very unsafe, don't go."

So let's look at what can happen. Well, in that case, it switches to the generators, nothing blows up, no problem at all. That's perfect. Let's say there was a short somewhere in the facility in a branch circuit: that branch kicks out, everything else runs fine, the secondary power feed to those servers takes over, and again you're fine. Okay, let's say it's a fault very high up in the system, and the generator is actually going to come online into a direct short. It might destroy the generator. Maybe; I don't know.
We've never seen it. Honestly, to my knowledge, I've never even read about it. It's never happened, but it could happen, and so maybe it would destroy the generator. From my perspective, that's three-quarters of a million dollars. We're very frugal; we do not want three-quarters of a million dollars of damage. But on the other hand, we certainly don't want to drop the load, so we'll take that risk. If it happens, we've got a backup generator to back up that one. All of our facilities are those magic words, redundant and concurrently maintainable, which is to say you can have a system offline and at the same time have a fault, and everything still keeps running. So it'll just keep running through that. No big deal. That's what we've done, and we're proud of it. We think it's one of those details that, well, nobody would buy AWS just because we do things like that, but we had this fault twice. The Super Bowl in 2013 was down for 34 minutes with exactly the same fault. It does happen. And it doesn't happen here; it hasn't happened for years, because of this.
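Here is a deliberately simplified sketch of the transfer behavior described above. This is not AWS's switchgear firmware; the names and timings are illustrative only. The point is the one-line policy change: on a suspected internal ground fault, still bring the generator online rather than locking it out and dropping the load.

```python
# Toy model of a generator-transfer decision; timings are the rough figures
# quoted in the talk (rated RPM in ~7-8 s, load accepted in ~15-20 s).

def on_utility_failure(suspected_ground_fault: bool, lock_out_on_fault: bool):
    steps = ["wait a few seconds in case the utility blip clears on its own"]
    if suspected_ground_fault and lock_out_on_fault:
        # Stock firmware behavior: "protect" the generator, sacrifice the load.
        steps.append("lock out generator -> data center drops to UPS, then dark")
        return steps
    steps += [
        "start generator (rated RPM in roughly 7-8 s)",
        "wait for voltage and frequency to stabilize (roughly 15-20 s total)",
        "transfer load to generator",
    ]
    return steps

print(on_utility_failure(suspected_ground_fault=True, lock_out_on_fault=True))   # stock behavior
print(on_utility_failure(suspected_ground_fault=True, lock_out_on_fault=False))  # modified behavior
```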

Let me show you a custom storage server. Last year, no, two years ago, I showed you a server that I was involved in the design of, and I was kind of proud of it: 880 disks in a single rack, phenomenal density. Well, of course, the team has done far better since then. What they've delivered is this monster that somehow packs 1,110 disks into a 42U rack, and that's not cheating, it's not a tall rack, it's a 42U rack. This, by the way, is an older design; I never get to show you the absolute latest. But this particular design, built the way it was originally designed, is 8.8 petabytes in one rack. If it were built today with current disks, that would be 11 petabytes in a rack. A rack is not that big. It's really phenomenal. What's really amazing is that it's 2,778 pounds of disk; the power-to-weight ratio is starting to matter. It's just really amazing.

Let me go the other way from what I think is an impressive piece of equipment and show you one that may bore you slightly. I love it. This is a compute server. You look at it, and the first thing you think is: dude, what happened? You could have put a second server in there; it's half empty. Come on guys, I thought you tried. It turns out this is optimized for thermal and power efficiency, and every other design we looked at at the time actually consumed more power and was less efficient. This is simply the winning design. What the OEMs are selling to customers is probably three or four or five times denser than this, and they're less efficient servers, and they make up for it by charging a lot more. So I like this design; I like this design a lot. Things you can see: the space. Things you can barely see: you can actually make out the name of the power supply company on there, this is very good resolution, and so you might be able to figure out that it's a greater-than-90%-efficient power supply. The voltage regulators, which you can't see at all, are also greater than 90% efficient.

Think about what this means for our PUE, power usage effectiveness. That's a measure where the numerator is the total power delivered to the facility and the denominator is the power delivered to the servers, so the difference between the two is all the overhead: how much did you lose in cooling, how much did you lose in power distribution, did you have the lights on, that kind of thing. A good facility, and we do run good facilities, runs on the order of 1.12 to 1.15, so you figure 12 to 15 percent is roughly what the overhead is, which is to say high-eighty-something percent of the power gets delivered to the servers. We're buying hundreds of millions of dollars' worth of power. If this power supply is 1% better, 1% times hundreds of millions starts to be a pretty interesting number; you could have fun with that. So it's worth doing, and that's why this server looks the way it does, and it's a very different design from what's out there in the industry. The funny thing about it, and one of the reasons I showed it, is that a server that looks surprisingly similar was recently blogged by one of the major cloud providers, and I thought, eh, it's cool, it's one of our old servers, still nice to see it. So I wanted to show it to you. Okay.
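For reference, the PUE arithmetic behind those figures:

```latex
\mathrm{PUE} = \frac{P_{\text{total facility}}}{P_{\text{IT equipment}}}
\qquad \Rightarrow \qquad
\frac{P_{\text{IT}}}{P_{\text{total}}} = \frac{1}{\mathrm{PUE}}
% A PUE of 1.12 to 1.15 means roughly 1/1.15 = 87% to 1/1.12 = 89% of the
% facility's power reaches the servers; the remaining 11-13% goes to cooling,
% distribution losses, lighting, and other overhead.
```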
Mainframes. Look at that number. Most of you won't believe it, but I actually started my career at IBM, a long time ago. I worked on the 3083 and the 3081, and the last system I worked on was an IBM 3090 Model 600J. This was a beast, the size of a mid-sized truck. It was the last of the emitter-coupled logic era; those are among the least power-efficient systems on the planet. This was the last generation before IBM went to CMOS. These were water-cooled, with water running directly up to thermal conduction modules, and the thermal conduction modules pushed down with springs to make solid contact with the CPUs in order to conduct the heat off of them.

These are beasts, technological wonders, but that was 30 years ago. It's dead, Jim. If customers are still running on these, you might not want to admit it, but there will be some companies that are still running on them, and we want to help those customers get to a better place. So let's start off with a toast to the death of the mainframe. Can I have a beer? This is carefully rigged so I get a chance to have a drink. Cheers: a toast to the death of the mainframe. And don't worry, if you're still running one, you don't have to tell anyone; we'll handle it, we'll take care of it. I'd like to invite someone up on stage to tell you what you need: senior vice president and CTO of Infosys, Navin Budhiraja. Come on up.

Oh, this is a big room. I could definitely accommodate most of my company in here. Thanks for showing up; it's really freezing backstage. So I'm very excited to announce, as James talked about, the Infosys mainframe migration practice on AWS. Why is this really exciting? As James said, there are still a lot of large companies running these systems. They've been running them for 30-plus years, they've done a great job, and they still do a great job. But why would you move off of them? Because there are significant challenges with these machines. Escalating costs: just to give you an idea, the way you measure these mainframes is this thing called MIPS, and there are collectively about 30 million MIPS still running out there, and what that translates to is typically a hundred-plus billion dollars of cost every year, and I think that's a conservative number. And these costs are going up: more data is coming in, these mainframes were typically designed for batch kinds of operations, and people now want online access to that data, and so on. Lack of agility: the business now wants to move at a much more rapid pace than these systems were designed for, so they get in your way. And last but not least, skill shortage: these mainframe programmers are not getting any younger, and we're not producing more of them. It's like my daughter, when I asked her what she thinks COBOL is: she thought it was some exotic stone, because it sounds like cobalt. So this is a problem, and this is really what we are trying to address as part of the migration practice we are launching on AWS.

Think about it in three ways, in terms of what this can do for you. It's a journey, and different people might take it at a different pace, but broadly: you can take the whole application, a COBOL application with JCL, and move it onto AWS running on emulation-type software. You still need to make changes, because you're still leaving the mainframe behind: if you're running on a mainframe database, that obviously doesn't run on AWS, so you have to change that to use a relational database like RDS; you might have to change how security works; you might have to change how backup works. But this actually works, and you can save up to 30 or 35 percent by just doing this.
The next kind of thing you can do is take pieces of the mainframe, the pieces you want to evolve much faster than the others, the ones that are getting in the way of moving at the pace you want, and rebuild them in a brand-new fashion, as cloud-native code on AWS, and then make them talk to each other through API gateways on AWS. That addresses the agility challenge to some extent. And finally, you go all the way: you rewrite the whole system in a cloud-native format. That takes much longer, but that's where we really want to get to.

So this sounds great, but what is the problem? Why haven't people been doing this? Because it is really, really hard. These are very, very large systems that have been built over 30 years. It's not unreasonable to see a system that is 10 million lines of code, 13 million lines of code, and guess what: no documentation. So how do you do this? This is really where the practice we are launching shines. This is a very brief overview, since we don't have a lot of time, but the intent is to use software to amplify the people who are the experts, like an Iron Man suit, in some sense. And this is not traditional software; it actually has a lot of AI pieces in it. We have a technology called Infosys Mana. Think of it as something that looks at the code and can discover a lot of interesting things about it: obviously the code structures, what the code does, business rules to some extent, but also where the various pieces of the code have tight dependencies on each other, what makes sense to break apart, and what makes sense to leave behind because it's no longer being used. Long story short, you take this piece of software, run it on the code you have, and what comes out is an assessment, an assessment a machine can actually make meaning out of. And you use that to make a very objective assessment of how you should do the migration. That's fundamentally what customers are looking for: they want predictability, they want to know how complex it's going to be, how long it's going to take, and what the right way is to get there. Where do you start? Do you rehost? Do you migrate? Do you do a combination? And so on. So that's really what the practice is. There are a lot of details, and I encourage you to go look it up. What I had there was a set of customers who have begun this journey with us, so it's very exciting, and I encourage you to come take a look and join us, because this is really happening, and you want to be part of it. Thank you, and enjoy the rest of the show.

I talked a little earlier about mainframes and about remembering my first job. Before my first job, one of the things I remember, perhaps even more clearly, is NASA and the manned space flight program. It's just absolutely phenomenal. Probably much of my curiosity and willingness to take on tough challenges was learned from NASA: taking on putting a man on the moon when they had no idea how to do it. I'm just blown away by the organization. I think it's absolutely marvelous what they've done.
We're very lucky today. I've got the chief technology and innovation officer of NASA's Jet Propulsion Laboratory here to talk a little bit about the work going on there. Tom Soderstrom, come on up, please.

What a pleasure it is to be here and talk about space with you all in such a big space. What we're going to talk about is how NASA and JPL look for the big answers, and how we're going to look for them in new ways. We now have new technologies coming, a new workforce, and new partnerships, and it's going to be an exciting time. These are the questions that affect humanity, and yes, this is a cloud computing conference, so I will tie cloud computing into this. But first of all, do you remember where you were four years ago? I certainly do. It was an exciting time when we were going to attempt a crazy engineering trick: land a 2,000-pound rover, 150 million miles away, all by itself, going from thousands of kilometers per hour to a standstill in seven terrifying minutes. What we did was try something new: we tried AWS, to give you all the pictures at the same time as we saw them. Let's see what happened. Where were you when Curiosity landed on Mars? (A clip from the landing broadcast plays: pressurizing the propulsion system to increase the thrust, used for all the maneuvering in the atmosphere we're about to do... touchdown confirmed.)

So these answers really matter, and you have to think about the people who were in that room; their careers hinged on this. They had been working on it for years and years. And it was the generation that, when we grew up, looked into the heavens and saw pictures. We didn't really know what they were, but they were interesting. Then came the space race, and we were transfixed, fixed in front of the TV, observing people on the surface of the moon. It was completely amazing. Now, the next generation is not going to be happy with just observing. They're not going to be happy with just asking, can we get there? They're going to expect it to happen. They have much more knowledge, they have smartphones with NASA and JPL apps, and they will expect to be amazed by what they find, not by whether they find it. They don't want to just observe; they want to participate, and they want to be part of it, whether through augmented reality, through their smartphones, or on the surface as astronauts, the way my personal heroes were when I grew up. So it's no longer about a space race between countries; it's a race for humanity into space.

I want to take you on a journey through these big questions of discovering the universe. First of all, how can we help protect Mother Earth? We'll go into detail; that's one of the big questions. Then we'll go deeper into the universe and ask, is there life in space? And then we'll finish on my personal favorite, Mars, and ask: was there ever life on Mars? Those are the big questions, and we're going to look for the answers in new ways. So let's talk about infrastructure. James talked about his network. It's an awesome network, but it's ground-based. Our network is in space. What we have is the Deep Space Network, with antennas that are strategically placed so that as the Earth spins, we can always hear and communicate with a spacecraft.
We have about 30 spacecraft in our solar system and beyond, and instruments, that we track every day, for NASA and for other agencies and industry. So what's going on right now? Well, if you were to whip out your smartphone, you could find out right now what's happening. That's a change: everybody can participate. And how can we get better results? How can we improve this infrastructure? Well, certainly more data and more compute power are going to help us answer these big questions. How much data? These are the Soil Moisture Active Passive mission and the Orbiting Carbon Observatory. They are satellites, spacecraft that circle the Earth, and they help protect Mother Earth by looking for ice, carbon dioxide, and water. They collect a terabyte per day. That's a lot of data.

Now I want to go into each of those questions a little deeper. We're sending out two new missions, called SWOT and NISAR. They do a similar thing: they're going to circle the Earth and bounce radar off the ground, looking for the water table, the water in oceans, in rivers, in the ice sheets, even down to reservoirs. So they will collect a lot of data. How much data? They're a hundred times bigger than the other two. How much is that? It's a hundred terabytes per day, a hundred gigabits per second, all the time. Much too big for our data centers. So what are we going to do? We're going to use cloud computing. And it's not just about the infrastructure. When we had the Orbiting Carbon Observatory, we discovered something called the Spot market, and you're all going, duh, I know about the Spot market. For us it was a revolution, because we discovered we could all of a sudden compute at a fraction of the cost, pennies on the dollar. That is now part of our operational way of working. So it's not just the infrastructure, it's how we work. And that's going to help farmers, it's going to help predict floods and droughts, it'll help city planners, it'll help everybody. That's one of the big questions.

Now, how else could we help protect Mother Earth? What if a big asteroid was about to hit us? We've found 1,600 near-Earth asteroids just recently. What if one came close? How would we change its course? How do we move it? We don't know; we're about to figure it out. So NASA and JPL are sending out something called the Asteroid Redirect Mission. The idea is very straightforward: go find an asteroid of about 400 meters, pick up a boulder, a multi-ton boulder, from it, and then bring it into orbit around the moon, where astronauts can come and mine it and learn how it could be redirected. Piece of cake, right? A few challenges. An asteroid has very low gravity, so you're likely to bounce off before you can even pick up the boulder. The boulder might be stuck. And how are you going to get there? We don't have enough propulsion to get there. So that's an interesting one. Solar electric propulsion is an idea the NASA and JPL engineers came up with: they're going to use xenon gas, ionized, to push the spacecraft forward. It uses one tenth the amount of fuel, has three times more power, and is two times more efficient. You can then pair them.
You can then pair these thrusters: pair enough of them, at about 50 kilowatts of power each, to reach 300 kilowatts, and you can get humans to Mars and back. That's really the key. So what has cloud got to do with this? Well, it has everything to do with this. We decided to do it differently and actually use cloud to start the project. It started in the cloud. Now we're partnering much more easily, we're pivoting just like a startup would, and by partnering with industry, who will build a lot of this, it's going to be an amazing change.

Another big question is: where could we find life? And I'm sure all of you know what this is, right? Maybe not. For those who don't, it's Europa. Europa is a moon of Jupiter, and why is it interesting? For life, you need energy and you need water. Europa has both. In fact, it has two times the amount of water that earth does. And if you zoom in on the surface a little bit, you notice something interesting: there are no craters, because its giant ice sheet, about 10 miles thick, flows like the Arctic Ocean does. So we have water and we have energy. Now, 10 miles thick is a bit of a problem, but there are lakes inside that ice sheet that we think are among the more likely places to house life. So we're going to send an orbiter to orbit Europa, about 400 million miles away, and it's going to find the landing spot. Then we're going to land amid all of these ice crevices. Can you imagine how much data and simulation that will take? It has to be completely automated; without cloud computing, there's no way we could do it. We're using model-based engineering and other interesting goodies, and it all started in the cloud. All of these things, by the way, are happening in the next five to ten years.

Now, if we one day needed to export humanity, would we have a place to go? Could we find Earth 2.0? There's something called extrasolar planets, or exoplanets: planets that circle their sun, just like we circle ours. How do you find them? You look through a pinhole at the same spot in the sky for a very long time, and then you notice changes in the energy you receive — perhaps a planet passing in front of its sun — or you notice a wobble between the planet and the star near it. Doing that, we've found 3,000 confirmed exoplanets in just a few years, plus 2,400 unconfirmed. But of those, are any of them earth-like? Could we actually live there? So far we've found 21, just by looking through this pinhole for a few years. The problem is they're a little far away. The closest is four light years away. How far is four light years? It would take a thousand years to get there with current propulsion. But what if you were to send a nano-spacecraft? You could actually get there in 20 years. Sounds like science fiction? So did landing 2,000 pounds on Mars just a few years ago. Innovation is moving much faster. The James Webb telescope is going to go from a pinhole to a much larger aperture, and the science, the simulation, the math is going to require hundreds of thousands of servers. We don't have to own them anymore. So cloud computing is a really, really big deal for us. That image, by the way, was a rendering of an exoplanet.

Now let's finish on Mars. We have not yet had humans on Mars. There's only been one, to my knowledge, Matt Damon, and he came back, so that was good. For us to send more people, we want to prepare for it.
Now, Curiosity landed four years ago, and it was an amazing thing. And like any teenager would do, it took a selfie: it took its giant robot arm, spun it around, and took a selfie to make sure it was healthy. What have we found so far? This is a picture from Mars. So, welcome. What does it look like? Has anybody been to California's Death Valley? Death Valley looks just like this, because in fact both were formed by flowing water, and Earth and Mars had the same conditions for life about four billion years ago. Here's another picture: the Namib dune on Mars. Again, it could be a sand dune in Death Valley. So we've found signs of flowing water, and we've found conditions for life, but we haven't found a smoking gun; we haven't found life. So JPL and NASA are going back. We're going to send a new rover, called Mars 2020 for now, to see if we can find the signs of life and pave the way for humans.

The last time, we had 20 minutes. It's amazing when you say it: it's 150 million miles away, and you only have 20 minutes to figure out what the rover is going to do the next day. Twenty minutes means that sometimes you miss a day and the rover stays parked. That's not a really good use of time. But if we could speed that up to five minutes, we wouldn't miss a day. How do you do that? Cloud computing: immense compute power, very fast, plus some machine learning to augment it. That's what we're going to do. So cloud computing has gone from being a nice-to-have, engaging you all with the pictures, to something mission critical every single day.

Now, what will this rover look like? If you look at it, it's going to have some changes. It will have new wheels, stronger and skinnier, with deeper grooves, so it can climb taller hills. It'll have a microphone, so we can hear the sounds of Mars. It'll be able to drill and store rocks for future astronauts to come and pick up, or for future missions, even unmanned ones. And it's going to be the first test of actually producing oxygen on Mars, to prepare for the next Matt Damon who comes up there and needs some help. It's also going to have some new landing techniques. The other thing we're looking at is creating a scout helicopter on Mars, though maybe not on this mission. How could you fly a helicopter at a hundred thousand feet? We don't have helicopters here that can do that. But we've shown in our lab that we could do it, using counter-rotating blades. So it's possible, and that could be a scout to peek over the hill and see what's coming next.

These look remarkably like toys, and that's not a bad thing. On the contrary, it's speeding up how we work, how we test, how we infuse new ideas. This one is a toy: the little guy there is a 3D-printed rover that costs $3 on a desktop 3D printer. We found that it could climb glass. And if it could climb glass in our lab, could it do it on the space station? So we put it in the vomit comet, in zero gravity, and you can see it's climbing fine. It worked. So that's the idea of going from a crazy little toy to something actually on the space station in just a few months. It's unheard-of speed. And this is ROV-E. ROV-E is an outreach rover, a miniature copy of a Mars exploration rover, and it's built from Arduinos, Raspberry Pis, open-source software, and Amazon's IoT and Lambda services.
Tomorrow, at the IoT State of the Union, we're going to drive it on stage. I hope it works. You can use a joystick to drive it, but that's so yesterday; you're going to be able to drive it with Alexa, so you can talk to it, and you'll see how that works. You can ask it questions about Mars and Alexa will answer, plus some other goodies you'll find out about tomorrow. If you don't have a robot, you can just use Alexa: today we're announcing that, starting tomorrow, you'll be able to use your Amazon Dots and Amazon Echos to ask questions about Mars. This is all about exploring, crowdsourcing, and getting people to understand and care about Mars and ask new questions. By the way, the rover is a blueprint that everybody can have, and you can build your own; it's for schools, universities, museums, or your home. The idea is that we're all going to be the future explorers. The things that you do, the amazing compute power, the hackathon last night — from zero to liftoff in just one day, and it all auto-scaled, it was serverless; it was one of the most amazing things I've ever seen. That's what you're all going to help us do. And your children are the ones who will one day walk on Mars, whether virtually through augmented reality or physically as astronauts. Please engage with us. There are many NASA and JPL people in the audience; find us during the conference or afterwards, and let's answer these big questions for humanity together. Thank you very much for listening.

I'm going to that IoT State of the Union; it sounds pretty interesting. Tom mentioned machine learning. Machine learning is one of those things where there's lots of hype about lots of things, but this one is extremely real. This time, machine learning is going to be absolutely the dominant workload. I believe it's going to make the growth I've shown you so far look like we're just getting started, small stuff. Machine learning is going to be fundamentally important across the industry. It's going to open up new solutions, and it's going to allow new data to be used in very different ways. But the key thing, at least from my perspective, having seen a few of these major transitions, is that it has got to be built on scalable infrastructure, because it's always the case that the problem size gets bigger faster than the systems get bigger. It's always the case. So scalability is number one. And I'm very happy to have Matt up here today. Matt is going to talk about machine learning and what we're doing with it. Give us a bit of a rundown, Matt; please come on up.

Good evening, everybody. Thank you, James. I'm going to talk a little bit tonight about artificial intelligence. We've seen tremendous interest from our customers and tremendous momentum behind artificial intelligence, really over the course of the last five to six years. And it's been driven by a trinity of three different areas. The first is a set of algorithms. These algorithms are actually pretty old; I did my PhD in machine learning 20 years ago, and a lot of the algorithms haven't really changed. But they're starting to evolve and become more sophisticated, and that's driving a huge amount of momentum in artificial intelligence. The second area is the ability to collect and harness increasing amounts of diverse data.
And this is really being enabled by the cloud and by AWS, because the data center walls that used to box in the collection of data just melt away in the cloud, where you can collect and store as much data as you need. And when you start to drive more, and more diverse, collections of data through these algorithms, wonderful things start to happen. The third part of the trinity is the availability of utility computing, specifically the availability of GPUs at the end of an API call, which allows you to run these algorithms with large amounts of data, with higher degrees of sophistication, at extraordinary scale. These three things are really fueling the next wave of artificial intelligence, and one set of technologies in particular has really come to the forefront: deep learning. Deep learning is just a set of statistical machine learning techniques which do feature detection through a series of hierarchical layers, trained with artificial neural networks. It has really found a niche in solving some of the hardest problems in computer science: image analysis — being able to look inside pictures like this beautiful beagle here and tell, is it a cat or a dog, what type of dog is it, what is it doing — voice analysis, natural language processing, personalization and recommendations, and also driving the next wave of autonomous vehicles and autonomous robotics.

Now, the challenge is that these hierarchies of layers get really, really complicated in the real world. This is a real algorithm here, one of the best-performing image analysis algorithms, and you can see just how many layers it has and how much complexity it has. That's a real challenge across three different domains. The first is programmability: you need to be able to go in and define these networks, and they can have thousands of different layers inside them, which is frankly a bit of a pain. The second is that you want these networks to be as portable as possible and to use memory as efficiently as possible, so you can move them between domains: training them in the cloud, but also running them on mobile devices, running them in the car, or doing inference on CPUs. So portability becomes extremely important. And the third piece is performance — as James was saying, performance in training and performance in inference of your machine learning models.

What we've seen is a tremendous amount of interest in running this sort of deep learning training on GPUs on AWS. A couple of weeks ago we made available the next generation of P2 GPU instances, which come prepackaged with an AMI preconfigured with a set of open-source deep learning frameworks. These frameworks — things like MXNet, TensorFlow, Caffe, Theano, Torch, and CNTK — put deep learning into the hands of mere mortals. They take care of a lot of the heavy lifting associated with fiddling around with CUDA calls and let you focus on the algorithm. They allow you to define and build those large, deep networks to run at scale, and customers have found using them on our P2 instances incredibly easy. You can get access to up to 40,000 CUDA cores, and deep learning algorithms such as the ones enabled through these frameworks are custom-built to fly across a large number of CUDA cores.
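To make the "hierarchies of layers" idea concrete, here is a minimal sketch of a small network defined with MXNet's symbolic API. The layer sizes and names are illustrative, not taken from the talk.

```python
# A minimal sketch of a small feed-forward network in MXNet's symbolic API.
# Layer sizes and names are illustrative only; real image models stack far
# more (convolutional) layers, which is exactly the complexity described above.
import mxnet as mx

data = mx.sym.Variable('data')                                    # input placeholder
fc1  = mx.sym.FullyConnected(data=data, num_hidden=128, name='fc1')
act1 = mx.sym.Activation(data=fc1, act_type='relu', name='relu1')
fc2  = mx.sym.FullyConnected(data=act1, num_hidden=10, name='fc2')
net  = mx.sym.SoftmaxOutput(data=fc2, name='softmax')             # classification head

# Each symbol is one layer in the hierarchy; listing the arguments shows the
# learnable parameters the framework will manage for you.
print(net.list_arguments())
```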
So we package these up, we put them inside an AMI, and from the Marketplace today you can, with a single click, spin up a complete environment to run any of these frameworks. In addition, we've made available a CloudFormation template that lets you provision large, elastic, auto-scaling clusters of these deep learning AMIs, so you can take any of these frameworks and run your algorithms at tremendous scale. Customers have been doing that almost from day one of availability.

One really great example is the early detection of diabetic retinopathy at Stanford University. Diabetic retinopathy is one of the leading causes of blindness in the world; 12% of all new cases of blindness here in the U.S. are caused by it. What happens is you get tiny aneurysms in the back of the eye, on the retina, and over time that starts to cause blindness. A paper I was reading the other day discussed the impact of potential early detection. The only way to detect diabetic retinopathy is to take a photo of the fundus, like the one on the slide behind me, and look for these very, very small aneurysms. If you're able to do that early enough and treat the cause, which is the treatment of the diabetes, then you can cure or prevent diabetic retinopathy and blindness in 90% of cases. So what Stanford are doing is using deep learning: they've trained a system to detect these tiny aneurysms, so that when photographs like this are taken, problems can be detected automatically, treated early, and one of the leading causes of blindness here in the U.S. can be prevented.

We see other customers, like Wolfram, building completely new categories of applications and products, such as Wolfram Alpha. Wolfram Alpha is a computational knowledge engine: it takes a web of information and starts overlaying different concepts on top of it, so you can ask more structured questions across really broad, unstructured data. So if you need to know who made one of the greatest albums of all time, Pet Sounds, you can ask it, and it will tell you it was, of course, the Beach Boys.

And we see tremendous application across autonomous vehicles. There's a startup called TuSimple; these are some videos they sent us. They're using deep learning with computer vision systems on cars to drive really sophisticated autonomous driving. Here on the left-hand side you can see object and lane detection, both during the day and at night. You can see semantic segmentation, which is pixel-by-pixel recognition of the road surface. This is obviously taken in China, and if you've ever driven in China, obviously that's a traffic jam. In addition, they have centimeter-accurate positioning of the car in 3D space using stereo vision on the cameras; that's the third heat map you can see over on the right-hand side, which allows them to place the car incredibly accurately using deep learning techniques on flat images. So these deep learning frameworks — MXNet, TensorFlow, and the rest of them — have really found a way of materially accelerating everything from autonomous driving to revolutions in healthcare by making deep learning available to more developers.
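As an aside on the cluster provisioning mentioned above, launching a CloudFormation stack like that deep learning cluster can also be scripted. The sketch below uses boto3; the template URL and parameter names are hypothetical placeholders, not the actual published template.

```python
# Hypothetical sketch: programmatically creating a CloudFormation stack for a
# deep learning cluster. The TemplateURL and ParameterKey names below are
# placeholders -- substitute the real template and its documented parameters.
import boto3

cfn = boto3.client('cloudformation', region_name='us-east-1')

response = cfn.create_stack(
    StackName='deep-learning-cluster',
    TemplateURL='https://example.com/deep-learning-cluster.template',      # placeholder URL
    Parameters=[
        {'ParameterKey': 'KeyName', 'ParameterValue': 'my-ssh-key'},       # assumed parameter
        {'ParameterKey': 'InstanceType', 'ParameterValue': 'p2.16xlarge'}, # assumed parameter
        {'ParameterKey': 'ClusterSize', 'ParameterValue': '16'},           # assumed parameter
    ],
    Capabilities=['CAPABILITY_IAM'],  # cluster templates often create IAM roles
)
print(response['StackId'])
```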
I wanted to focus on one of these libraries today: MXNet. MXNet has a lot of the characteristics that developers like when they're building deep learning. The first is programmability. MXNet supports a really broad set of programming languages. So whether you're used to using Python or Scala, or whether, like me, you're a fan of Julia, or JavaScript, or MATLAB, or Go, you can use the languages you're used to and start running your deep learning straight away, because MXNet has both a front end and a back end. No matter what you use on the front end, it gets compiled down, so you get guaranteed sophistication and guaranteed performance from the back end. Some programming languages have an imperative model. This is great: it lets you write scripts effectively, and, as anyone who has ever written Python or Ruby knows, it's very, very flexible; you get to use loops and the language-native features that make these environments so productive. However, imperative programs are very hard to optimize. It's very hard to get the scale and the performance that you need, because some of that flexibility is built in, and that takes away the opportunity to optimize. At the other end of the spectrum are declarative programming languages; SQL is a declarative language. You get more chances to optimize what's going on, and you get to apply that automatic optimization across different languages, but it's much less flexible. This is the approach that things like TensorFlow and Theano and Caffe take.

With MXNet, one of the core pieces of functionality, in addition to the multiple programming languages, is the ability to mix both imperative and declarative models. In fact, that's where the name comes from: MXNet is a mix of networks. This means you can manipulate the graph of layers declaratively, in this case with symbolic executors, and you're also able to mix that with features of imperative programming, such as iteration loops, parameter updates, or feature extraction. So it's an extremely powerful model, being able to mix the two together.

It's also incredibly portable. MXNet models fit in very, very small amounts of memory; it's incredibly memory-efficient. In fact, a thousand-layer network will fit in less than four gigabytes of memory. That means you can take your models across a much more diverse set of applications: you can embed them in mobile, you can put them inside connected devices without many resources on board, you can use them to power robotics such as drones, and you can even run them with JavaScript directly inside your browser. So there's a huge amount of portability through the memory efficiency of MXNet.

And finally, coming back to the third piece: performance. What performance enables is larger data sets to mine and train on, and it enables you to build more sophisticated models. However, writing parallel programs is extremely painful, particularly when you're dealing with tens of thousands of CUDA cores. This is even worse with complex networks, where the cost of the forward and backward updates grows with the number of layers, and that can mean hundreds or thousands of different tensor computations and calculations; it's very intensive. The more scale and the larger your network, the worse the scaling characteristics. So we're looking for frameworks that help us achieve that scale, so we can build more sophisticated features and drive more data through for training.
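Here is a minimal sketch of the imperative/symbolic mix just described, using MXNet's NDArray and Symbol APIs; shapes and values are illustrative only.

```python
# Minimal sketch of mixing MXNet's imperative NDArray API with its symbolic
# (declarative) graph API. Shapes and values are illustrative.
import mxnet as mx

# Imperative: eager tensor operations, flexible like ordinary Python code.
a = mx.nd.ones((2, 3))
b = a * 2 + 1                      # executed immediately
print(b.asnumpy())

# Declarative/symbolic: describe the computation graph first...
x = mx.sym.Variable('x')
y = mx.sym.Variable('y')
z = mx.sym.broadcast_add(x * 2, y)

# ...then bind it to concrete data and let the engine optimize and execute it.
executor = z.bind(ctx=mx.cpu(),
                  args={'x': mx.nd.ones((2, 3)), 'y': mx.nd.ones((2, 3))})
executor.forward()
print(executor.outputs[0].asnumpy())
```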
MXNet has some really nice features here. It allows you to take serial code and automatically parallelize it, parsing out the data dependencies to do that efficiently. It also applies automatic parallelization of your data across those CUDA cores using a coordinated key-value store. This lets you scale to multiple cores on a single GPU, and it lets you scale across GPUs on a single box, so you can take more data and build more sophisticated models. So we ran some benchmarks, running an MXNet network for image analysis across a set of GPUs. Here, the red line is ideal scalability; that's where you want to be. It means that if you add another GPU, you get double the speedup as you go from one to two. And we're seeing 91% efficiency for image analysis as you move from one GPU to multiple GPUs on a single box. But for large networks with sophisticated models, when you're driving huge amounts of data through those models, you don't just want to scale to 16 GPUs, which is what we have available inside the largest P2 instance. You want to create big clusters of those P2 instances and run across as many of them as you can; you want to scale across nodes and light up as many P2 instances as you can. So we also ran the same benchmarks across a set of 16 P2 instances. This is the original graph we just looked at; that red line, again, is ideal scalability. We moved from a single instance with 16 GPUs to 16 instances, a 16x increase in GPU computing capability, and you can see we're still maintaining close to ideal efficiency: 88% efficiency with MXNet, just a 3% drop when scaling to 16x the capacity on the back end.

If you want to take a look at this yourself — you shouldn't just trust my benchmarks; one of the benefits of the cloud is that you can spin these things up and benchmark your own code — today we're making available a new deep learning benchmark code set. You can use our CloudFormation template to spin up the capacity you need, log into the master node, and run the exact same benchmarks I just used to draw these graphs. This is available today.

So we really like MXNet, and a lot of our customers really like MXNet. We recently announced that MXNet is going to become the deep learning framework of choice for AWS. That means we're going to be providing code contributions back to the MXNet open-source framework, we're going to be investing in programmability and the developer experience, we're going to be building out more documentation and more example code, and we're going to be investing in a series of tools around MXNet. In addition, MXNet is going to be the foundation of our future AI services, and to hear a little bit more about that, you'll have to come back tomorrow. But as much as we like MXNet, we remain committed to building out a platform where all of these tools run as well as possible.
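To illustrate the data parallelism and key-value store described above, here is a minimal training sketch using MXNet's Module API; the synthetic data, GPU count, and hyperparameters are illustrative and are not the benchmark configuration.

```python
# Minimal sketch of data-parallel training in MXNet across several GPUs on one
# box, using the coordinated key-value store described above. Synthetic data,
# GPU count, and hyperparameters are illustrative only.
import numpy as np
import mxnet as mx

# Tiny network (same idea as the earlier sketch).
data = mx.sym.Variable('data')
fc1  = mx.sym.FullyConnected(data=data, num_hidden=64)
act1 = mx.sym.Activation(data=fc1, act_type='relu')
fc2  = mx.sym.FullyConnected(data=act1, num_hidden=10)
net  = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

# Synthetic data set.
X = np.random.rand(1000, 100).astype('float32')
y = np.random.randint(0, 10, size=(1000,))
train_iter = mx.io.NDArrayIter(X, y, batch_size=64, shuffle=True)

# Data parallelism: one context per GPU; gradients are aggregated through the
# key-value store ('device' keeps aggregation on the GPUs of a single machine,
# 'dist_sync' would coordinate across a cluster of instances).
contexts = [mx.gpu(i) for i in range(4)]          # assumes 4 GPUs are available
module = mx.mod.Module(symbol=net, context=contexts)
module.fit(train_iter,
           optimizer='sgd',
           optimizer_params={'learning_rate': 0.1},
           kvstore='device',
           num_epoch=2)
```

Roughly speaking, switching the kvstore from 'device' to 'dist_sync' (and launching worker processes on each instance) is the main change needed to go from the single-box case to the multi-instance scaling shown in the benchmarks.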
This is the exact same strategy we have for our databases, where you can run MySQL, Postgres, MariaDB, Oracle, or SQL Server right there on RDS. We want to provide the same flexibility. So while we love MXNet, we welcome all of your deep learning workloads on this very, very high-performance set of GPUs. And with that, I'll hand it back to James. Thanks a lot.

Thanks. That was Dr. Matt Wood, general manager of product strategy. Good job, Matt. Did you see that multi-server scaling on MXNet? That's the golden one. Love that one.

Let me talk to you about sustainability and renewable energy. We made a commitment, a little more than a year back, to be one hundred percent renewable in AWS. It took a long, long line of thought to get to that point, because for us to make that commitment, we have to at least see a path to get there. We're not afraid of big challenges; we will sign up for difficult tasks, but we have to see a path where we actually can reasonably get there. And we're there. I want to show you what we've done so far, announce a little bit about what we're going to do, and then show you how what's been done so far got done.

So let's jump into that. Last year we were at 25%. And by the way, we know how to hit one hundred percent like that; it can be done. If customers put their workloads in Oregon, our one-hundred-percent-renewable region, then we're at one hundred percent. And many customers do exactly that, and we appreciate it. It's a great thing; Oregon's growing at a phenomenal pace, probably for this reason among many others. However, many customers want to have their data close to their users, many customers want to have their computation close to their data, and there are many, many reasons to put data centers in different places. It's just a fact that some of the places where customers want to put data today aren't the cleanest power-source locations in the world. So we're signing up for a challenge, because we've got to make it right everywhere, and we're not going to do that by removing choice from customers. We're going to do both, because that's the way it is at Amazon.

So, 25% last year, and we signed up for 40% this year. Forty percent is an interesting number, because let's just say we doubled capacity in that period — it's been a little less than double, but let's just say we doubled capacity. The AWS power team, the energy team, signed up to deliver 40%. If they doubled the amount of renewable energy during that period while we doubled the capacity, they'd still be at 25%. So it's a tough one: at the same time that we're growing in the way I showed you, the team has to catch that, beat it, and hit 40%. And we love them; the team is doing an awesome job. Every discussion I have with them, I learn something; there are always active debates. We are going to hit 45% by the end of this year. Not bad. Really happy.

Today we're announcing — we're feeling good, we really think we've got some good systems in place, and we know how to do this — that we're going to hit 50% by the end of next year. So if customers want to be 50% green, just go to the cloud, simple as that. We're tracking toward our goal, and we're happy about the progress. How do you do it? Let me make it real for you: hard work.
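A quick back-of-the-envelope version of that 40% arithmetic, using an arbitrary illustrative baseline (only the ratios matter):

```python
# Back-of-the-envelope sketch of the renewable-energy math above.
# The baseline capacity C is arbitrary; only the ratios matter.
C = 1.0                        # last year's total capacity (arbitrary units)
renewable_last_year = 0.25 * C

capacity_this_year = 2 * C     # assume capacity roughly doubles
# Merely doubling renewable energy only keeps pace with the doubled fleet:
print(2 * renewable_last_year / capacity_this_year)   # 0.25 -> still 25%

# Hitting 40% of the doubled fleet requires:
needed = 0.40 * capacity_this_year                     # 0.8 * C
print(needed / renewable_last_year)                    # 3.2 -> about 3.2x last year's renewables
```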
The team delivered a lot of green power. Right there, you know instantly what we're looking at: we've got 150 megawatts, 80 megawatts, another 208 megawatts, 100 megawatts, 199 megawatts — almost 200 megawatts — and then another 180 megawatts. That's a lot of power. Sum that all together, and it says that AWS is bringing online 907 megawatts of new renewable generation. Jennifer and I took our boat up the Columbia River a couple of years ago, and one of the interesting parts of that trip was going through the Bonneville Dam. The Bonneville Dam is an unbelievable piece of engineering; it is astounding how big a piece of engineering it is, and its capacity, I believe, is about 1,300 megawatts. So we're in that ballpark; it's very close to the Bonneville Dam. Phenomenal.

Okay, if you're like me, the first question you're going to ask is: wait a second, the wind's not always blowing and the sun's not always shining, so what are we actually going to get? If you take into account environmental factors for all of those sites I told you about, we're going to deliver 2.6 million megawatt-hours of renewable energy annually. We're super happy. And the reason we're willing to sign up for the goal I told you about is what's been done so far; the team is doing good work.

Listen, I am super happy to have had the chance to take you through a little more detail on what we're doing. I appreciate your time, and I hope you have a really good rest of your week at re:Invent. I hope you learn a lot, and I hope it's worthwhile. Thank you very much.