Jensen Huang's Latest Podcast: Can NVIDIA's Moat Endure?
Video Title: Jensen Huang: Will Nvidia's Moat Persist?
Video Author: Dwarkesh Patel
Translation: Peggy, BlockBeats
Editor's Note: While the outside world is still debating whether "Nvidia's moat comes from the supply chain," this conversation argues that what is truly difficult to replicate is not the chip itself but the entire system capability of turning electrons into tokens: computing architecture, software systems, and the developer ecosystem operating in concert.
This article is compiled from a conversation between Dwarkesh Patel and Jensen Huang. Dwarkesh Patel is one of Silicon Valley's most closely watched tech podcast hosts; his YouTube channel, the Dwarkesh Podcast, specializes in deeply researched, long-form interviews with AI researchers and core figures in the tech industry.

On the right is Dwarkesh Patel, on the left is Jensen Huang
Around that core argument, the conversation can be understood from three perspectives.
First, there is the change in technology and industry structure.
Nvidia's advantage lies not only in hardware performance, but in the developer ecosystem carried by CUDA, and the path dependency formed around the computing stack. In this system, computing power is no longer the only variable, and algorithms, system engineering, networking, and energy efficiency together determine the pace of AI advancement. This also leads to an important judgment: software will not be simply "commoditized" by AI; on the contrary, with the proliferation of agents, tool invocations will grow exponentially, further amplifying the value of software.
Second, there are the boundaries of business and strategic choices.
Facing the continuously expanding AI industry chain, Nvidia chooses to "do what is necessary, but not do it all." It does not enter cloud computing, nor does it engage in excessive vertical integration, but rather amplifies the overall market size through investment and ecosystem support. This restraint allows it to maintain critical control while avoiding becoming an ecosystem's substitute, thus bringing more participants into its technology system.
Third, there is disagreement over technology diffusion and the future industry landscape.
The most contentious part of the conversation lies not in specific conclusions but in how to understand "risk" itself. One viewpoint emphasizes the first-mover advantage that comes with computing-power leadership; the other focuses on where ecosystems and standards settle over the long run as the technology diffuses. The more critical question may not be the short-term capability gap but which technology stack future AI models and developers will run on.
In other words, the endgame of this competition is not just "who can build a more powerful model first," but "who defines the infrastructure on which the model runs."
In this sense, NVIDIA's role is no longer just that of a chip company but closer to being the "underlying operating system provider" of the AI era—it seeks to ensure that no matter how computational power proliferates, the path to value creation still revolves around itself.
The following is the original content (reorganized for readability):
TL;DR
· NVIDIA's moat lies not in "chips" but in the "full-stack system capability from electrons to Tokens." The core is not hardware performance but the ability to convert computation into value through a full-stack approach (architecture + software + ecosystem).
· The essential advantage of CUDA is not the tool itself but the world's largest AI developer ecosystem. Developers, frameworks, and models are all bound to the same technology stack, forming an irreplaceable path dependence.
· The key to AI competition is not just computing power but the combination of "computational stack × algorithms × system engineering." Improvements in architecture, networking, energy efficiency, and software collaboration far exceed the progress of mere process technology.
· The compute bottleneck is a short-term issue; driven by demand signals, supply will catch up within 2–3 years. The real long-term constraint is not chips but energy and infrastructure.
· AI software will not be commoditized; instead, tool usage will grow exponentially as agents proliferate. The future is not cheaper software but an exponential increase in software invocations.
· NVIDIA's core strategy is not to venture into the cloud: do "everything necessary" but not swallow the entire value chain. Through investment and ecosystem support rather than vertical integration, NVIDIA amplifies the overall market size.
· The real strategic risk is not competitors gaining computing power but the global AI ecosystem no longer being based on the American technology stack. Once models and developers migrate, long-term technical standards and industrial dominance will shift accordingly.
Interview Content
Where Does NVIDIA's Moat Lie: In the Supply Chain or in Controlling "Electrons to Tokens"?
Dwarkesh Patel (Host):
We have seen many software companies' valuations decline because AI is expected to turn software into a standardized commodity. There is another, somewhat naive understanding that goes like this: you hand the design files (GDSII) over to TSMC; TSMC handles wafer fabrication and builds the switching circuits of the logic chip; the chip is then packaged with HBM produced by SK Hynix, Micron, or Samsung, and finally sent to an ODM for assembly into a complete system.
Note: HBM (High Bandwidth Memory) is an advanced memory technology designed specifically for high-performance computing and AI; ODM (Original Design Manufacturer) refers to a contract manufacturer responsible for both production and product design.
So, from this perspective, NVIDIA is essentially doing software, while the manufacturing is done by others. If the software is commoditized, then NVIDIA will also be commoditized.
Jensen Huang (NVIDIA CEO):
But ultimately, there has to be a process to convert electrons into tokens. From electrons to tokens, and making these tokens more valuable over time, I think this transformation is hard to fully commoditize.
The transformation from electrons to tokens is itself a very extraordinary process. And making one token more valuable than another is like making one molecule more valuable than another.
In this process, there is a great deal of art, engineering, science, and invention involved to give value to this token.
Clearly, we are witnessing all of this happening in real time. So, this transformation process, manufacturing process, and the various signals involved have not been fully understood, and this journey is far from over. So I don't think that scenario will happen.
Of course, we will make it more efficient. In fact, the way you just described the issue is actually a mental model I have of NVIDIA: the input is electrons, the output is tokens, and NVIDIA is in between.
Our work is to "do as much of what's necessary and as little of what's unnecessary as possible" to achieve this transformation and give it extremely high capability.
When I say "as little as possible," I mean that for anything we don't need to do ourselves, we will collaborate with others and incorporate it into our ecosystem. If you look at NVIDIA today, we may have one of the largest partner ecosystems in both the upstream and downstream supply chains. From computer manufacturers, application developers, to model developers—you can see AI as a "five-layer cake," and we have an ecosystem layout at these five levels.
Related Reading: "NVIDIA CEO Jensen Huang's Latest Article: The 'Five-Layer Cake' of AI"
So we try to do as little as possible, but the part we must do is actually extremely difficult. And I don't think that part will be commoditized.
In fact, I don't think enterprise software companies see themselves as fundamentally being in the business of "tool-making." But the reality is that most software companies today are indeed tool providers.
Of course, there are exceptions; some are coding and solidifying workflow systems, but many companies are fundamentally tool companies.
For example, Excel is a tool, PowerPoint is a tool, what Cadence does is a tool, and Synopsys is also a tool.
And the trend I see is actually contrary to the views of many people. I believe the number of agents will grow exponentially, and the number of tool users will also grow exponentially.
The number of instances calling various tools is also likely to surge. For example, the usage instances of Synopsys Design Compiler may significantly increase.
There will be a large number of agents using floor planners, layout tools, and design rule check tools.
Today, we are limited by the number of engineers; but tomorrow, these engineers will be supported by a large number of agents, and we will explore the design space in unprecedented ways. When you start using these tools today, this change will be very apparent.
The use of tools will drive these software companies to achieve explosive growth. This explosive growth hasn't happened yet because the current agents are not yet adept at using the tools.
So, either these companies build agents themselves, or the agents themselves become strong enough to use these tools. I believe the ultimate outcome will be a combination of both.
Dwarkesh Patel:
I remember that in your most recent disclosure, you had close to $100 billion in purchase commitments for foundry capacity, memory, packaging, and so on. A SemiAnalysis report suggests that this figure could reach $250 billion.
One interpretation is that NVIDIA's moat lies in having locked up the supply of these scarce components for the coming years. In other words: others can build accelerators too, but can they get enough memory? Can they get enough logic chips?
Is this NVIDIA's core advantage in the coming years?
Jensen Huang:
This is something we can do but is very difficult for others to do. The reason we can make such massive commitments upstream is partly explicit, as in the procurement commitments you mentioned; and partly implicit.
For example, a lot of the upstream investment is actually done by our supply chain partners, because I would say to their CEO: Let me tell you how big this industry is going to be, let me explain why, let me deduce with you, let me tell you what I see.
Through this process—transmitting information, inspiring a vision, building consensus—I align with CEOs from different industries upstream, and only then are they willing to make these investments.
So why are they willing to invest in me and not others? Because they know I have the ability to buy their capacity and digest it through my downstream. It is precisely because of NVIDIA's downstream demand and supply chain scale that they are willing to invest upstream.
Look at GTC, the scale of the conference has amazed many people. It is essentially a 360-degree AI universe that brings the entire industry together. Everyone gathers because they need to see each other. I bring them together to let the upstream see the downstream, the downstream see the upstream, and at the same time let everyone see the progress of AI.
More importantly, they can engage with AI-native companies and startups, see various innovations happening firsthand, and thus validate those judgments I have made.
So I have spent a lot of time, directly or indirectly, explaining the current opportunities to our supply chain and ecosystem partners. Many people would say that my keynote is not like a traditional product announcement one after another at a conference, but has a part that sounds like "teaching." And this is actually my purpose.
I need to ensure that the entire supply chain—whether upstream or downstream—understands: what is going to happen next, why it will happen, when it will happen, how big the scale will be, and be able to systematically reason through these questions like I do.
So the "moat" you just mentioned does indeed exist. If this market reaches a trillion-dollar scale in the coming years, we have the ability to build the supply chain to support it. Like cash flow, the supply chain also has flow and turnover. If a business architecture's turnover is not fast enough, no one will build a supply chain for it. The reason we can sustain this scale is that the downstream demand is extremely strong, and everyone can see that.
It is precisely this point that allows us to do these things at the scale we are at now.
Dwarkesh Patel:
I still want to better understand whether the upstream can keep up. Over the past several years, your revenue has basically doubled year over year, and the compute capacity you provide to the world has even tripled.
Jensen Huang:
And it continues to double at this scale.
Dwarkesh Patel:
Exactly. So if you look at logic chips: you are one of TSMC's biggest customers on the N3 process and a major customer on N2.
According to some analysis, this year AI may account for 60% of N3 capacity, and next year it may even reach 86%.
Note: N3 refers to TSMC's 3-nanometer (3nm) process node, which can be understood as one of TSMC's most advanced chip manufacturing processes
So, given that you already occupy such a large share, how can you continue to double? And double every year at that? Have we entered a phase where the growth of AI computing power must slow down due to upstream constraints? Is there a way to bypass these limitations? How can we possibly build two wafer fabs every year?
Jensen Huang:
At certain times, instantaneous demand does indeed exceed the entire industry's supply, both upstream and downstream. And in certain cases, we may even be constrained by the number of plumbers—this has actually happened.
Dwarkesh Patel:
So, next year's GTC should invite plumbers.
Jensen Huang:
Yes, it's actually a good phenomenon. You want to be in a market like this: where instantaneous demand is greater than the total industry supply. Conversely, of course, it's not so good.
If the gap between the two grows too large and a specific link or component becomes a clear bottleneck, the entire industry rushes to solve it. For example, I've noticed that people are not talking much about CoWoS anymore. The reason is that over the past two years we have invested in it heavily and expanded capacity several times over.
Note: CoWoS (Chip-on-Wafer-on-Substrate) is TSMC's advanced packaging technology, used to integrate logic chips and HBM in a single package.
Now I think the overall situation is quite good. TSMC has also realized that the supply of CoWoS must keep pace with the growing demand for logic chips and memory. So they are expanding CoWoS while also expanding future advanced packaging technologies, and they are expanding at the same pace as logic chips.
This is very important because in the past, CoWoS and HBM memory were more like "special capabilities," but not anymore. Now everyone has realized that they are part of mainstream computing technology.
At the same time, we now have the ability to influence a broader supply chain. When the AI revolution was just beginning, I was already saying five years ago what I am saying now.
Some people believed and invested at that time, such as Micron's Sanjay team. I still remember that meeting vividly, where I clearly explained what would happen in the future, why it would happen, and predicted the results we see today. At that time, they chose to significantly increase their investment, and we also established a partnership with them. They made investments in various directions such as LPDDR and HBM, which obviously brought them significant returns. Some companies followed later, but now everyone has entered this stage.
So I believe that each generation of technology, each bottleneck, will receive a lot of attention. And now, we have been "prefetching" these bottlenecks several years in advance. For example, our collaboration with Lumentum, Coherent, and the entire silicon photonics ecosystem. Over the past few years, we have actually reshaped the entire ecosystem and supply chain.
In the field of silicon photonics, we have built a complete supply chain around TSMC, collaborated with them to develop technology, invented many new technologies, and licensed these patents to the supply chain, maintaining the openness of the ecosystem. We prepared the supply chain by inventing new technologies, new workflows, new testing equipment (including double-sided detection), investing in related companies, and helping them scale up.
So you can see that we are actively shaping this ecosystem to enable the supply chain to support future scale.
Dwarkesh Patel:
It sounds like some bottlenecks are easier to solve than others. Compared to expanding CoWoS, for example, some are more difficult.
Jensen Huang:
Actually, what I just mentioned is the hardest one.
Dwarkesh Patel:
Which one?
Jensen Huang:
Plumbers. Yes, really. What I mentioned earlier is the hardest one: plumbers and electricians. This is also why I'm a bit concerned about the "doomsayers" who are always talking about jobs disappearing and positions being replaced. If we advise people not to become software engineers because of this, then we really will lack software engineers in the future.
Similar predictions were made ten years ago. At that time, some said, "Whatever you do, do not become a radiologist." You can still find those videos online, saying that radiology would be the first profession to be eliminated, and the world would no longer need radiologists. But the reality is, we now lack radiologists.
Dwarkesh Patel:
Okay, back to the earlier question: some links can be expanded, some cannot. So, specifically, how can logic-chip capacity double? After all, that's where the real bottleneck is: both memory and logic are limiting factors. And what about EUV lithography machines? How do you manage to double their number every year?
Jensen Huang:
None of this is undoable. Rapid scaling isn't easy, but accomplishing these things in two to three years is actually not that difficult. The key is a clear demand signal. Once you can make one, you can make ten; once you can make ten, you can make a million. Fundamentally, these things are not hard to replicate.
Dwarkesh Patel:
Would you then push this judgment deep into the supply chain? For example, would you go to ASML and say: looking three years out, to support NVIDIA's annual revenue reaching $2 trillion, will we need more EUV lithography machines?
Jensen Huang:
Some of this I do directly; some happens indirectly. If I can convince TSMC, ASML will naturally be convinced as well. So we must identify the critical bottlenecks. As long as TSMC believes in this trend, then in a few years you will have enough EUV equipment.
What I mean is, no bottleneck will last more than two to three years, none.
At the same time, we are also increasing computational efficiency. From Hopper to Blackwell, we have roughly achieved a 10x, 20x improvement, and in some cases, even 30x to 50x. We are also constantly introducing new algorithms. Because CUDA is flexible enough, we can develop various new methods to expand capacity while improving efficiency.
So, these things do not worry me. What truly worries me are external factors beyond our downstream, such as energy policy. Without energy, you cannot expand; without energy, you cannot establish an industry; without energy, you cannot build an entirely new manufacturing ecosystem.
Now, we want to drive reindustrialization in the United States, bring back chip manufacturing, computer manufacturing, and packaging, while establishing new industries like electric vehicles and robotics. When we are building an AI factory, all of these rely on energy, and the construction related to energy has a long cycle. In contrast, increasing chip capacity is a two to three-year issue; increasing CoWoS capacity is also a two to three-year issue.
Dwarkesh Patel:
Quite interesting. Some of the guests I've interviewed have given exactly the opposite judgment. On this issue, I simply don't have the technical background to adjudicate.
Jensen Huang:
However, the good thing is, you are now talking to experts.
Will Google's TPU Shake NVIDIA's Position?
Dwarkesh Patel:
Yes, indeed. I wanted to ask about your competitors. When we look at TPUs, it can be said that currently, two out of the top three global large models—Claude and Gemini—have been trained using TPUs. What does this mean for NVIDIA's future?
Note: TPU (Tensor Processing Unit) is a type of specialized chip designed by Google specifically for artificial intelligence, especially deep learning
Jensen Huang:
What we do is completely different. NVIDIA is building "accelerated computing," not Tensor Processing Units (TPUs).
Accelerated computing can be used for a variety of tasks, such as molecular dynamics, quantum chromodynamics, data processing, data frameworks, structured data, unstructured data, fluid dynamics, particle physics, and of course, AI. Therefore, the application scope of accelerated computing is much broader.
Although the current discussion is centered around AI, which is indeed very important and has a significant impact, the scope of "computing" itself is much broader than AI. What NVIDIA does is reinvent the computing approach from general-purpose computing to accelerated computing. Our market coverage is far wider than what any TPU or other specialized accelerator can achieve.
If you look at our positioning, we are the only company that can accelerate various types of applications. We have a vast ecosystem where various frameworks and algorithms can run on the NVIDIA platform. Moreover, our computer systems are designed to be "operated by others." Any operator can purchase our systems to use.
Most self-developed systems are not designed for use by others; you basically have to operate them yourself because they were not initially designed to be flexible enough for others to use. Because anyone can operate our systems, we have entered all major platforms, including Google, Amazon, Azure, OCI, and others.
Whether you want to operate systems and rent out the computing power, or operate systems for your own use: if you want to run a leasing business, you need a large customer ecosystem spanning many industries to absorb that capacity. And if you're operating systems for your own use, we can certainly help you do that too. Elon's xAI, for example.
Because we enable operators from any industry or company to use our systems, you can use it to build supercomputers for companies like Lilly, for scientific research and drug discovery. We can help them operate their own supercomputers and apply them to various applications in drug research and the biological sciences, all of which are areas we can accelerate.
So we can cover a wide range of applications, which a TPU cannot do. CUDA can also serve as an outstanding tensor-processing platform, but it is not just that; it covers the entire lifecycle of data processing, computing, AI, and more. Our market opportunity is therefore much larger and much broader in scope. And because we now support virtually every type of application globally, you can deploy NVIDIA systems anywhere and rest assured there will be customers for them.
So this is fundamentally a completely different thing.
Dwarkesh Patel:
This question will be a bit longer.
Your current revenue is staggering, and it mainly comes not from pharmaceuticals or quantum computing. You are not earning your tens of billions of dollars a quarter from those businesses, but because AI is an unprecedented technology advancing at an unprecedented rate.
So the question is: if we look only at AI, what is the optimal solution? I'm not a practitioner myself, but I've talked to AI researchers, and they say: when I use a TPU, it's one big systolic array, very well suited to matrix multiplication; GPUs are more flexible, suited to heavy branching and irregular memory access.
But if you look at AI, isn't it essentially just repetitive, highly predictable matrix multiplication? Then you don't actually need to spend die area on features like warp scheduling, thread switching, and memory banks. So TPUs are highly optimized for the current wave of compute demand and revenue growth, focused on the dominant application scenario.
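Note: to make the contrast above concrete, here is a minimal, hypothetical PyTorch sketch (an editorial illustration, not from the conversation). The first half is the regular matrix multiplication a TPU-style systolic array targets; the second is the irregular, data-dependent work where a GPU's scheduling and memory flexibility matter:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Regular workload: dense matmul, the predictable pattern a TPU-style
# systolic array is built around.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b

# Irregular workload: data-dependent gathers and branching, the kind of
# pattern that exercises a GPU's thread scheduling and flexible memory
# system rather than a fixed matrix pipeline.
idx = torch.randint(0, 1024, (4096,), device=device)
rows = a[idx]                              # scattered memory access
mask = rows.sum(dim=1) > 0                 # data-dependent predicate
out = torch.where(mask[:, None], rows.relu(), rows.tanh())
```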
How do you see this viewpoint?
Jensen Huang:
Matrix multiplication is indeed an important part of AI, but it is not all of AI.
If you want to propose a new attention mechanism, or do calculations in a different way; if you want to design a completely new architecture, like a hybrid SSM; if you want to build a model that combines diffusion and autoregressive—you need a general-purpose programmable architecture, and we can run anything you can think of.
This is our advantage: it makes inventing new algorithms much easier. The system is programmable, and the constant invention of new algorithms is precisely why AI has been able to progress so rapidly.
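Note: as a toy editorial illustration of this programmability (not anything NVIDIA ships), here is a hypothetical "new attention variant" in a few lines of PyTorch. On a programmable platform, adding a made-up distance-decay bias to standard attention is an afternoon's experiment rather than a silicon change:

```python
import torch
import torch.nn.functional as F

def decayed_attention(q, k, v, decay: float = 0.05):
    # Standard scaled dot-product scores...
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    # ...plus a hypothetical distance-based decay bias (the "new algorithm").
    t = torch.arange(q.shape[-2], device=q.device)
    dist = (t[:, None] - t[None, :]).abs().float()
    scores = scores - decay * dist
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 4, 128, 64)    # (batch, heads, seq, head_dim)
out = decayed_attention(q, k, v)          # -> (1, 4, 128, 64)
```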
TPU, like any other hardware, is also subject to Moore's Law. We know Moore's Law brings about a 25% improvement each year. So if you want to achieve a 10x, 100x leap, the only way is to fundamentally change the algorithm and its computation every year.
This is exactly NVIDIA's core strength.
Take the improvement from Hopper to Blackwell. When I first announced that Blackwell's energy efficiency would be 35 times that of Hopper, no one believed it.
Later, Dylan wrote an article saying I was actually being conservative and the real improvement is closer to 50 times. That kind of improvement cannot come from Moore's Law alone. Our way of getting there is to introduce new model structures, such as MoE, and to parallelize, decouple, and distribute computation across the entire computing system. Without the ability to go deep into the hardware and develop new compute kernels with CUDA, that would be very difficult to achieve.
Note: Referring to Dylan Patel, a well-known analyst in the semiconductor and AI infrastructure field, and founder of the research firm SemiAnalysis
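Note: a quick compounding check on the numbers Huang cites (an editorial illustration): at roughly 25% per year from process scaling alone, a single 10x step takes about a decade, which is why a 35x–50x generational jump has to come from algorithms and system design as well.

```python
# At ~25% improvement per year from process scaling alone, how long
# does one 10x step take?
perf, years = 1.0, 0
while perf < 10:
    perf *= 1.25
    years += 1
print(years, round(perf, 2))   # -> 11 years, ~11.64x
```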
So, our advantage lies in: the programmability of the architecture, and NVIDIA as a highly co-designed company. We can even offload some computation to the interconnect architecture, such as NVLink, or the networking layer, such as Spectrum-X. In other words, we can drive change simultaneously across the processor, system, interconnect, software libraries, and algorithms. All of this is happening at once. Without CUDA to support all of this, I wouldn't even know where to start.
Dwarkesh Patel:
This also raises a question about NVIDIA's customer base. Sixty percent of your revenue comes from five hyperscalers. In an earlier era, a different type of customer, say a professor running experiments, relied heavily on CUDA: they couldn't use other accelerators, only PyTorch + CUDA, and they needed everything to be well optimized out of the box.
But these large hyperscalers have the capability to write their own kernels. In fact, they must, to squeeze out the last 5% of performance. Companies like Anthropic and Google often use custom accelerators or TPUs for training. Even OpenAI, when using GPUs, uses Triton. They say: we need our own kernels. So they write CUDA C++ directly instead of using libraries like cuBLAS and NCCL, build their own software stacks, and can even compile them for other accelerators.
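Note: for readers unfamiliar with Triton, the sketch below (modeled on Triton's public vector-add tutorial, not code from any lab; it requires an NVIDIA GPU) shows what "writing your own kernel" means in practice: a Python function compiled for the GPU and launched over a grid, used where one might otherwise call a vendor library.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # which block am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard the tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                   # one program per block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```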
So, for the customers who make up the bulk of your revenue, they can and do replace CUDA. How much, then, is CUDA still the key reason cutting-edge AI must rely on NVIDIA?
Jensen Huang:
First of all, CUDA is a very rich ecosystem. If you're developing on any computer, starting with CUDA is a very wise choice. Because this ecosystem is very rich, we support all mainstream frameworks.
If you need to write custom kernels, with Triton for example: we have contributed a lot of NVIDIA technology to Triton's backend, and we are very willing to help all of these frameworks get better. There are many frameworks now: Triton, vLLM, SGLang, and more.
With the advancement of post-training and reinforcement learning, this field is expanding rapidly; you have veRL, NeMo RL, and a range of new frameworks. If you want to develop on a given architecture, starting with CUDA is the most reasonable choice, because you know the ecosystem is mature. When issues arise, it's more likely to be a problem in your own code than in the underlying stack.
Don't forget, the codebase behind these systems is very large. When the system has issues, you want to know if the issue is in your code or in the computing platform itself.
You certainly hope the issue is in your own code and not in the computing platform. Of course, we have a lot of bugs ourselves, but our system is very mature, and you can continue to build on a reliable foundation at least.
The second point is the scale of the installed base. If you're a developer, no matter what you're doing, the installed base is what matters most. You want your software to run on as many computers as possible. You aren't writing software just for yourself; you're writing it for your whole cluster, even for the whole industry, because you're a framework developer.
NVIDIA's CUDA ecosystem is essentially our most important asset. There are now hundreds of millions of GPUs worldwide. All cloud providers have them, from V100, A100, H100, H200, to L series, P series, various specifications.
And they exist in various forms. If you are a robotics company, you would want CUDA to run directly on the robot body. We are virtually everywhere.
This means that once you have developed software or a model, it can be used anywhere. So the value of this installed base is itself enormous.
The last point is the flexibility of deployment location. We exist in all cloud platforms, which gives us uniqueness. As an AI company or developer, you are not sure which cloud provider you will ultimately collaborate with, nor where your system will run. However, we can run everywhere, including on-premises deployment.
Therefore, the richness of the ecosystem, the scale of the installed base, and the flexibility of deployment location, taken together, are extremely valuable.
Dwarkesh Patel:
That makes sense. But what I'm curious about is whether these advantages really matter that much to your key customers. Many people do benefit from them, but the customers who can build their own software stacks are the group contributing the majority of your revenue, especially in a world where AI keeps getting better at tasks with verifiable feedback loops. Kernel optimization, for attention or an MLP say, is exactly such an easily verifiable loop, well suited to reinforcement learning.
So can these large-scale cloud providers write these kernels themselves? Of course, they may still choose NVIDIA for cost-effectiveness. But the question is, will this ultimately become a simple comparison: who can provide better specifications? For example, in terms of unit cost, who can provide higher computing power (FLOPs) and higher memory bandwidth? Because in the past, NVIDIA has had a very high profit margin (over 70%) at both the hardware and software levels, largely due to the CUDA moat.
So the question is, if most customers can build their software stacks themselves without relying on CUDA, can this profit margin be sustained?
Jensen Huang:
We have put a truly amazing number of engineers into these AI labs, working alongside them and helping them optimize the entire technology stack. The reason is that nobody knows our architecture better than we do. And these architectures are not as general-purpose as CPUs.
A CPU is a bit like a family car: it doesn't drive especially fast, but anyone can drive it well; with cruise control, everything is straightforward. NVIDIA's GPU is more like an F1 car. Anyone could probably drive it at 100 miles per hour, but truly pushing it to the limit takes real expertise.
And we use a lot of AI to generate these kernels. I am very certain that for quite some time, we are still indispensable. Our expertise can help partners in these AI labs easily double their performance. Many times, after we optimize their tech stack or a certain kernel, their models can accelerate by 3 times, 2 times, or even 50%. This is a significant improvement, especially when you consider they have large Hopper and Blackwell clusters.
If you double the performance, it means your revenue doubles directly. This is directly correlated to revenue. NVIDIA's compute stack has the best global Total Cost of Ownership (TCO) performance, unmatched by any competitor. No company can prove to me which platform offers a better performance/TCO ratio than ours. Not a single one. And these benchmark tests are publicly available.
Dylan is right: InferenceMAX is public; anyone can run it. But no TPU team is willing to use it to demonstrate an inference-cost advantage. It's hard to do; no one is willing to come out and prove it.
The same goes for MLPerf. I welcome them to demonstrate the 40% advantage they have always claimed. I would love to see them prove the TPU's cost advantage. To me, it doesn't make sense, it just doesn't add up. Not at all.
So I believe the fundamental reason for our success is that our TCO is excellent.
Another point: you mentioned that 60% of our revenue comes from the top five hyperscalers, but most of that business is actually aimed at external customers. On AWS, for example, NVIDIA computing power is mostly provided to external customers, not used by AWS itself. On Azure, our customers are also mostly external; the same goes for OCI. They choose us because our reach is very broad.
We can bring the world's best customers to them, and these customers themselves are built on the NVIDIA platform. And these companies are built on NVIDIA because our coverage and flexibility are very strong.
So I think this flywheel is working: the installed base, the programmability of the architecture, and the ongoing accumulation of the ecosystem. Plus, now there are thousands of AI companies worldwide. If you are one of the AI startups, what architecture would you choose? You would choose the most popular, the one with the largest installed base, and the richest ecosystem. That's the logic of this flywheel.
So the reasons are:
· First, our performance per dollar is very high, hence the lowest token cost;
· Second, our performance per watt is the highest in the world. If a partner builds a 1GW data center, it wants to output the most tokens, in other words, the most revenue, and our architecture produces the most tokens per unit of power;
· Third, if your goal is to rent out computing power, we have the most customers globally.
That's why this flywheel was created.
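Note: a back-of-the-envelope sketch of the performance-per-watt point (all numbers are editorial assumptions, not NVIDIA figures). In a power-limited factory, tokens per joule converts directly into revenue, so doubling performance per watt doubles output:

```python
SITE_POWER_W = 1e9        # the 1GW data center from the example above
PRICE_PER_TOKEN = 5e-6    # assumed: $5 per million tokens
SECONDS_PER_DAY = 86_400

# Two hypothetical architectures; B has twice the performance per watt.
for arch, tokens_per_joule in [("arch A", 0.5), ("arch B", 1.0)]:
    tokens_per_second = SITE_POWER_W * tokens_per_joule   # W x tok/J = tok/s
    revenue_per_day = tokens_per_second * SECONDS_PER_DAY * PRICE_PER_TOKEN
    print(f"{arch}: ${revenue_per_day:,.0f} per day")
# arch A: $216,000,000 per day; arch B: $432,000,000 per day (2x)
```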
Dwarkesh Patel:
Very interesting. I think the crux of the issue is what the market structure really looks like. One scenario is that there are thousands of AI companies, all roughly sharing the computing power.
But if the reality is that through these hyperscale cloud providers, the ones actually utilizing computing power are foundational model companies like Anthropic and OpenAI, and they have the ability to get different accelerators running.
Jensen Huang:
I think your premise is wrong.
Dwarkesh Patel:
Maybe. Let me rephrase that. If these claims about performance and cost are true, why did a company like Anthropic just announce, a few days ago, a multi-gigawatt TPU commitment with Broadcom and Google? And most of their computing power comes from those systems. For Google itself, TPUs are the primary source of computing power. So when you look at the large AI companies, they used to be all NVIDIA, but that's no longer the case.
If these advantages theoretically hold, why would they still choose other accelerators?
Jensen Huang:
Anthropic is quite a special case. Without Anthropic, TPU growth would barely exist; it comes almost entirely from Anthropic. Likewise, without Anthropic, that growth in training demand would barely exist.
That's a very clear fact. There aren't numerous similar opportunities; in reality, there is only one Anthropic.
Dwarkesh Patel:
But OpenAI also collaborates with AMD, and they are developing their own Titan accelerator.
Note: AMD (Advanced Micro Devices) is a U.S. semiconductor company that mainly designs computing chips and is a key competitor to NVIDIA and Intel
Jensen Huang:
But the vast majority of them still use NVIDIA. We will continue to collaborate extensively. I don’t get upset when others try other solutions. If they don't try other solutions, how would they know how good our solution is?
Sometimes it is necessary to reaffirm this through comparison. And we must also constantly prove that we deserve our current position.
There have always been various claims in the market. You can see how many ASIC projects have been canceled. Just because you start doing ASICs doesn't mean you can create something better than NVIDIA.
In fact, it is not easy. It can even be said that rationally, it does not hold up well. Unless NVIDIA has really made a serious mistake in some aspects. But considering our scale, our pace — we are the only company globally that achieves significant leaps every year.
Dwarkesh Patel:
Their logic is: you don't need to be better than NVIDIA; you just need to be no more than 70% worse, because they think your profit margin is 70%.
Jensen Huang:
But don't forget, even with ASICs, the profit margin is actually very high. NVIDIA's profit margin is about 60%–70%, and ASICs' profit margin could also be around 65%. So how much have you really saved?
You always have to pay someone. So from what I have seen, the profit margin of these foundational (ASIC) businesses is actually very high, and they also believe so themselves and are quite proud of it.
In the past, we actually didn't have the capability to do this. And to be honest, at the time I didn't really understand deeply how difficult it is to build a foundational model lab like OpenAI or Anthropic. Nor did I fully realize that they actually need massive investment support from the supply side.
At that time, we didn't have the capability to make billion-dollar investments, like investing in Anthropic to have them use our computing power. But Google and AWS could, they put in huge sums of money from the beginning, and in return, Anthropic uses their computing power.
We didn't have the ability to do that back then, and I have to say, it was my mistake: I didn't fully realize that they actually had no other choice. Venture capital firms cannot invest $5 billion or $10 billion to support an AI lab and expect it to grow into Anthropic.
That was my misjudgment. But even if I had realized it back then, I don't think we had the ability at that stage to do it.
However, I won't make the same mistake again. I am happy to invest in OpenAI, and I am also happy to help them expand, I think it is necessary. When Anthropic later approached us, I was also happy to become an investor and help them grow.
It was just at that time, we really couldn't do it. If we could start over, if Nvidia was already as powerful as it is now back then, I would be very willing to do those things.
Why Doesn't Nvidia Do "Cloud"?
Dwarkesh Patel:
This is very interesting. Over the years, Nvidia has always been a company that "sells shovels to make money" in the AI field, and has made a lot of money. And now you are starting to invest this money. There are reports that you have invested $30 billion in OpenAI, $10 billion in Anthropic. And the valuations of these companies continue to rise.
So, looking back over the past few years: you gave them computing power, you saw the trends, and their valuations were a tenth of what they are now, or even far lower, as recently as a year ago. And you had plenty of cash at the time.
There is actually a possibility: Nvidia could have become a fundamental model company itself, or invested on a large scale earlier at a lower valuation, similar to what you are doing now.
So I am really curious, why didn't you do this earlier?
Jensen Huang:
We did it the moment we "could." If we could have done it back then, I would have done it earlier. When Anthropic needed our support at the beginning, I would have done it. But at that time, we really didn't have the capability.
It was beyond our capabilities and beyond our decision-making habits.
Dwarkesh Patel:
Was it a funding issue, or?
Jensen Huang:
Yes, it was a matter of investment scale. We had almost no tradition of external investment at that time, let alone investment of that scale. And we didn't realize it was necessary.
My thought at the time was, they could go find venture capital, just like any other company. But what they wanted to do was actually beyond what venture capital could support. What OpenAI wanted to do was also something that venture capital couldn't support.
That was something I later realized. But that's where they were smart. They realized at that time that they had to go down that path. I'm glad they did. Even though we couldn't participate at the time, which led Anthropic to turn to other partners, I still think it's a good thing. The existence of Anthropic is a good thing for the whole world, and I'm happy about that. Some regrets are acceptable.
Dwarkesh Patel:
So the question will still come back to one point: Now that you have so much cash on hand and it continues to grow, how should NVIDIA use this funding?
One idea is that there is now an intermediary ecosystem helping these AI labs convert capital expenditure (capex) into operational expenditure (opex) so they can lease computing power.
Because GPUs are expensive, but as models advance, they can continuously generate higher-value tokens throughout their lifecycle. And NVIDIA itself has the ability to bear these upfront capital expenditures. For example, there are reports that you have provided up to $6.3 billion in support to CoreWeave and invested $2 billion.
So why doesn't NVIDIA become a cloud provider itself? Why not become a hyperscaler, build its own cloud, and rent out computing power? After all, you have the cash capability.
Jensen Huang:
It's a philosophical question for the company, and I think it's a wise philosophy: we should do "as much as necessary and as little as possible."
This means that when it comes to building a computing platform, if we don't do it, I truly believe it wouldn't get done.
If we don't take on these risks, build NVLink, build the entire software stack, create this ecosystem, and invest 20 years in CUDA (for most of which it lost money), then no one else will. If we don't build the domain-specific libraries of CUDA-X, whether for ray tracing, image generation, early AI models, data processing, structured data, or vector data, these things won't exist.
I am completely convinced of this. We even developed a library for computational lithography called cuLitho; if we hadn't done it, no one else would have.
So, the reason accelerated computing has developed to the extent it has today is because we did these things. That's the part we should be fully committed to doing.
But at the same time, there are already many cloud providers in the world; even if we don't do it, someone else will. So we follow the principle of doing what is necessary and as little else as possible. That idea has always been present in the company, and every decision I make is viewed from this perspective.
In the cloud space, if we hadn't supported CoreWeave at the start, these new AI clouds (neoclouds) might not exist, and they certainly wouldn't have reached today's scale. The same goes for Nscale and Nebius: without our support, they wouldn't have come this far. And now they have all developed quite well.
But is this a business that we should personally involve ourselves in? No. We still adhere to that principle: do what is necessary, and do as little as possible beyond that. So we will invest in the ecosystem because I want the entire ecosystem to thrive. I want our architecture to connect as many industries and as many countries as possible, enabling AI to be developed globally and built on a tech stack based in the United States.
This is the vision we are advancing.
At the same time, as you mentioned, there are many excellent foundational model companies now, and we will try to invest in them as much as possible.
Another point is that we will not "pick winners." We want to support everyone. This is both a business necessity and something we are willing to do. So when I invest in one company, I will also invest in others.
Dwarkesh Patel:
So why do you purposely avoid picking winners?
Jensen Huang:
Because that is not our responsibility. That's the first point.
Second, when NVIDIA was first founded, there were about 60 graphics companies, 60 companies doing 3D graphics. In the end, only we survived. If you were to pick one of those 60 companies back then to succeed, NVIDIA was likely the least likely to succeed.
That was before your time, but at that time, NVIDIA's graphics architecture was completely wrong. Not a little off, but fundamentally wrong. We designed an architecture that developers could hardly support, which was doomed to fail. We deduced it from very reasonable first principles, but ended up with the wrong solution.
Everyone thought we couldn't succeed, but we still survived in the end. So I have enough humility to admit this and not pick winners. Either let them develop on their own, or support everyone.
Dwarkesh Patel:
There's one point I didn't quite understand. You say you don't pick winners, but you also just said that without NVIDIA these new cloud vendors might not exist. How do those two things coexist?
Jensen Huang:
First, they must want to exist on their own and actively seek our help. When they have a clear intention, a business plan, genuine capability, and passion, and they need some investment support in the early stage, we will be there.
But the key is for them to quickly establish their own flywheel. Your question just now was, do we want to get into the financing business? The answer is no. We don't want to become a financial institution. There are already many people in the market doing financing, and we prefer to cooperate with these financial institutions rather than do financing ourselves.
So our goal is to focus on our own business, keep the business model as simple as possible, and at the same time, support the entire ecosystem.
When a company like OpenAI needs a $30 billion investment before IPO, and we believe in them very much—I personally believe they are already an extraordinary company and will become an even more remarkable company. The world needs them to exist, everyone hopes they exist, and I hope they exist too. They have all the elements to be successful, so we support them and help them expand.
Therefore, we will make this type of investment because they do need us to do so. But our principle is not "do as much as possible," but "do as little as possible."
Dwarkesh Patel:
This question may seem a bit obvious, but for many years, we have been in a state of GPU shortage, and as models become more powerful, this situation becomes more pronounced.
Jensen Huang:
Yes, we do have a GPU shortage.
Dwarkesh Patel:
And NVIDIA is considered not to simply distribute these scarce resources to the highest bidder, but to weigh things like ensuring these new cloud providers survive: giving some to CoreWeave, some to Crusoe, some to Lambda.
First, do you agree with this view? Second, what benefits does this bring to NVIDIA?
Jensen Huang:
I think your premise is wrong. Of course, we will very carefully consider these matters.
First, if you don't have a Purchase Order (PO), no amount of communication matters. So first and foremost, we will work hard with all customers to forecast demand because the production cycle of these products is very long, and the data center construction period is also very long. We align supply and demand through forecasting, which is the first thing.
Second, we will forecast with as many customers as possible. But in the end, you still have to place an order. If you don’t place an order, then I can't do anything. So at some point, it's "first come, first served."
However, apart from that, if your data center is not ready yet, or if certain key components are not ready, causing you to be temporarily unable to deploy the system, we may prioritize serving other customers. This is just to maximize the overall throughput efficiency of our factories.
In addition to this scenario, the priority rule is "first come, first served." You must place an order. If you don't place an order, there is simply no way.
Of course, there are many stories out there. For example, some say that at a dinner with Larry, Elon, and me, they requested GPUs. We did have dinner together, and it was a very enjoyable evening, but they never "requested" GPUs; they only needed to place an order. Once an order is placed, we do our utmost to provide capacity. It's not as complicated as some make it out to be.
Dwarkesh Patel:
So it sounds like a queuing mechanism, depending on when you place the order and whether the data center is ready. But this still isn't simply "highest bidder wins," right?
Jensen Huang:
We never do that.
Dwarkesh Patel:
Never allocate based on the highest bid?
Jensen Huang:
Never. Because that is a terrible business practice.
You set the price, and the customer decides whether to buy. I know some companies in the industry raise prices when demand surges, but we don't. It has never been our practice. Customers can rely on us. I prefer to be a reliable presence, a cornerstone of the industry. You don't need to worry about price changes.
If I give you a quote, that is the final price. Even if demand skyrockets, it won't change.
Dwarkesh Patel:
So is this also one of the reasons for your stable relationship with TSMC?
Jensen Huang:
NVIDIA and TSMC have been collaborating for nearly 30 years. There isn't even a formal legal contract between NVIDIA and TSMC; it's more of a rough understanding. Sometimes I am right, sometimes I am wrong; sometimes I get better terms, sometimes not so good terms. But overall, this relationship is remarkable. I can fully trust them and rely on them.
Moreover, with NVIDIA there is one thing you can count on: this year Rubin will be outstanding; next year Vera Rubin Ultra will launch; the year after that, Feynman; and the year after that, one whose name I haven't disclosed yet. In other words, you can trust us every single year. Go find another ASIC team anywhere in the world and see if any of them can make you say: "I can bet the whole company on you, trusting that you will deliver for me every year."
My token cost will decrease by an order of magnitude each year, and I can trust this like trusting a clock. I just said something similar about TSMC. No wafer fab in history has ever let you say this.
But today, you can say this about NVIDIA. You can trust us year after year.
If you want to buy $1 billion of AI factory compute, no problem; if you want to buy $100 million, also no problem; if you want to buy $10 million, or even just one rack, no problem; even if you only want to buy one GPU, no problem. If you want to place a $1 trillion order for an AI factory next, also no problem.
Today, we are the only company in the world that can say this. And I can also say this to TSMC: I want to buy $1 billion, no problem. We just need to plan together, go through the process, do those things that a mature company would do.
So, I believe that NVIDIA can become the foundation of the global AI industry, a position we have spent decades reaching. There is a huge investment and focus in this, and the stability and consistency of the company are very important.
Why NVIDIA Rejects the "Multi-Roadmap Bet"
Dwarkesh Patel:
This actually leads to a very interesting question. We previously talked about TSMC, memory bottlenecks, and so on. Now, if we enter a world like this: you have taken up most of the N3 capacity, and in the future, you may also take up most of the N2 capacity. Would you consider going back to use idle capacity of older process nodes like 7nm?
For example, if demand for AI is too high and the ramp-up of the most advanced node can't keep up, you could leverage everything you now know about numerical optimization and system design to create a new version of Hopper or Ampere. Do you think this will happen before 2030?
Jensen Huang:
There is no need for that. Each architectural generation is not just a change in transistor size; there is also a great deal of engineering in packaging, stacking, numerical formats, and system architecture. By the time you get to this point, going back to build an old-node version would require an R&D investment no one can afford. We can afford to keep moving forward, but I don't think we can afford to go back.
Of course, if we do a thought experiment: suppose one day everyone says that advanced capacity can never increase again. Would I immediately go back to using 7nm? Of course, without a doubt.
Dwarkesh Patel:
I previously had a discussion with someone about a question: why doesn't NVIDIA simultaneously drive multiple completely different chip projects? For example, you could do one like Cerebras' wafer-scale architecture, one like Dojo's large packaging, and one that doesn't rely on CUDA.
You have the resources and engineering talent to do these things in parallel. Since no one knows for sure where AI or architecture is headed in the future, why put all your eggs in one basket?
Jensen Huang:
We certainly could. We just haven't seen a better solution. We have simulated all of these approaches, and in our emulators they come out worse. So we don't pursue them. What we are doing now is what we truly want to do and what we believe is most correct.
Of course, if the future workload itself undergoes a radical change—I'm not talking about algorithm changes, but if the workload really changes—then we might add other types of accelerators.
For example, we recently added Groq, and we will integrate Groq into the CUDA ecosystem. We are doing this now because the value of tokens has become very high, so the same model may command different price tiers depending on response speed.
A few years ago, tokens were almost free, or so cheap they might as well have been. But now different customers have different requirements for tokens, and those customers can make real money from them. Take software engineers: if faster-responding tokens make an engineer more efficient than they are today, that is well worth paying for.
But this kind of market has only emerged recently. So I believe that now, for the first time, we truly have the ability to have the same model form different market tiers based on response time.
That's also why we decided to extend this Pareto frontier to create a "faster-response, but lower throughput" inference branch. Because in the past, high throughput was always the most important. But now we believe that in the future, there may be a type of high ASP (high unit price) token. Even if the throughput in the factory is lower, the unit price is enough to make up for it.
This is the reason we are doing this. But if we just talk about the architecture itself, I would say, if I had more money, I would invest more money into the existing architecture.
Dwarkesh Patel:
I find the idea of this "ultra-premium token" and the stratification of the inference market very interesting.
One last question. Assuming the deep learning revolution never happened, what would NVIDIA be doing today?
Jensen Huang:
Well, of course, gaming would still be a focus, but in addition, we would continue with accelerated computing. This has always been our path.
The fundamental premise of our company is that Moore's Law would slow down. General-purpose computing is great for many things, but not ideal for many computational tasks. So we combine the GPU architecture with the CPU to accelerate the CPU's workloads. Different code kernels, different algorithms can be offloaded to run on the GPU. This way, an application can be accelerated by 100 or 200 times.
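Note: a minimal editorial sketch of the offload model described above, assuming an NVIDIA GPU with the CuPy library installed. The same element-wise math runs on the CPU via NumPy and on the GPU via CuPy's NumPy-compatible API:

```python
import numpy as np
import cupy as cp  # assumes an NVIDIA GPU and a CuPy installation

# CPU (general-purpose) version of a simple element-wise workload.
x = np.random.rand(10_000_000).astype(np.float32)
y_cpu = np.sqrt(x) * np.sin(x)

# The same work offloaded to the GPU: CuPy mirrors NumPy's API, so the
# math is dispatched across thousands of GPU threads unchanged.
x_gpu = cp.asarray(x)               # host -> device copy
y_gpu = cp.sqrt(x_gpu) * cp.sin(x_gpu)
y_back = cp.asnumpy(y_gpu)          # device -> host copy
```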
Where would this be used? Well, in engineering, science, physics, data processing, computer graphics, image generation, and various other areas.
So even if AI did not exist today, NVIDIA would still be a very large company. The reason is quite fundamental: the ability to continue expanding general-purpose computing has essentially reached its limits. And one way to improve performance—a crucial way, not the only way—is to do domain-specific acceleration.
We initially entered into computer graphics, but there are many other areas. Such as various scientific computations, particle physics, fluid simulations, structured data processing, and so on—various types of algorithms that can benefit from CUDA.
So our mission has always been to bring accelerated computing to the world, to drive applications that general-purpose computing cannot achieve, or cannot scale to the necessary level of performance, forward to help breakthroughs in the scientific field. Some of our earliest applications were in molecular dynamics, seismic processing for energy exploration, and of course, image processing.
In all these areas, general-purpose computing was too inefficient by itself. So yes, if there was no AI, I would be sad. But precisely because of our progress in computing, we democratized deep learning. We enabled any researcher, any scientist, any student, anywhere to use a PC, or a GeForce GPU, to make remarkable scientific discoveries. And this fundamental commitment has never wavered, not one bit.
So if you look at GTC, you will find that a significant portion of the content is actually not related to AI at all. Whether it's computational lithography, quantum chemistry, or data processing, these are all important but unrelated to AI. I know AI is fascinating and very exciting.
However, there are still many people doing very important work that is unrelated to AI. Tensor is not their only mode of computation. And we want to help all these people.
Dwarkesh Patel:
Jensen, thank you very much.
Jensen Huang:
You're welcome, I really enjoyed this conversation.