Interconnects

Nathan Lambert
149 episodes

  • Interconnects

    Some ideas for what comes next, May 2026

    26/05/2026 | 9 mins.
    As the years of AI progress go by, they’ve been accompanied by a slowly rising tide of consequence. Models are getting more capable, how we work is changing quickly, and the economics of AI are becoming real, just as real-world risks come to the forefront. 2026 is the first year where I don’t think there’ll be any breaks from this. The hard part to prepare for is that there’s a good chance things just continue to ratchet up from here: more disruption, more surprises, more stakes.
    On my end, there’s been a growing list of topics that are very fateful to how I see the current state of AI, but I haven’t even gotten to write about them (at least not from all the angles I want to)! All of these are closely related to the implications of different models reaching new capability levels and how I use that to infer what may come next.
    1. Open models haven’t had their true agent moment like Opus 4.5
    The time gap between open and closed models is very often discussed, but the reality is that we have a nice time-gating that’s independent of debatable benchmarks: whether or not open-weight models become super useful in agentic harnesses. The Opus 4.5 in Claude Code moment of December 2025 was so loud and obvious that if open models hit this performance level at price points as low as $5/month, there will be an explosion in usage.
    Right now we are about 5-6 months in with no equivalent open model. I suspect the robustness of the best closed frontier models that I write about could make this moment take a good amount longer, say closer to 12+ months. In this time, Claude Code and Codex may seem like different categories of products. In the standard flurry of new, state-of-the-art open models from a variety of labs, benchmarks will definitely keep climbing, but the open-closed gap should become more interpretable as real-world use becomes the real litmus test.
    2. Gemini still doesn’t have a meaningful competitor for Claude Code and Codex
    The best exclamation point I can offer to reinforce my prediction that open models are further behind than the benchmarks claim is that even the mighty Google doesn’t have a clear competitor for Claude Code and Codex. I’m sure the Gemini team is pushing very hard on this.
    I still need to do a lot more testing on Gemini 3.5 Flash, but reading reviews makes it clear that it’s not a substitute for how I’m working today. It’s maybe not the Gemini team explicitly specializing for Google’s existing products (search, YouTube, etc.), but the model seems to suit them. If Google doesn’t have a powerful tool here soon, I don’t expect the open model labs to either. The open models are going to be used more for automated, enterprise agents and low-cost domains, rather than being the driving tool of modern knowledge work. This will feed directly into the economic engine of funding future models, where the agents like Claude Code and Codex are the current best path to massive AI revenue growth.
    I discussed how the current environment is quietly driving labs in China to specialize on AI Proem with Grace Shao and this is central to my expectations of open models specializing over the next few years instead of competing with OpenAI, Anthropic, and Google.
    Interconnects AI is a reader-supported publication. Consider becoming a subscriber.

    3. I don’t expect an open-weights Mythos this year
    While I don’t think Mythos is a general “god model” that will crush the competition in every domain, I do think it’s a remarkable technical achievement in software engineering and cybersecurity. Mythos is obviously a watershed moment for those fields. Having spoken to most of the Chinese labs – particularly those with the most prominent, large, open MoE models like Kimi, Z.ai, DeepSeek, and Qwen – I think they’re heavily resource limited and don’t have an immediate path to scaling up training processes like the big labs in the U.S. For the labs which are more corporate, which comes with more resources, such as Alibaba and Bytedance, they also have more conservative stances on safety and security.
    Mythos is a bellwether of the massive acceleration in training and research compute available to the largest American companies.
    Epoch AI recently had a nice piece on the compute available to various labs (~Google 25%, Meta 11%, OpenAI 11%, Anthropic 6%). All of these numbers are vastly higher than any Chinese lab.
    4. American open models are slowly gaining steam
    Nvidia with Nemotron, Google with Gemma, Arcee AI and others are slowly stabilizing the open model ecosystem in the U.S. There’s a lot that’s hard to measure here, especially in the rise of local agents like OpenClaw and Hermes, but there are adoption numbers of American models that we haven’t seen since Llama 3.
    Gemma 4’s models are all tying or outperforming the equivalently sized Qwen 3.5/3.6 models, where Qwen has for years now been the default open model at these sizes. These Qwen 3.5/3.6 models have been tricky to get working in a lot of post-training research, partially due to architecture/tooling and partially, most likely, due to modeling (i.e. some training decision makes the model hard to finetune). I’ve heard few complaints about Gemma, but that could also be because Gemma is not yet the researcher default.
    There's a simple reality that we've seen recently with models like GPT-OSS, Nemotron 3, and now Gemma 4, that if a model is in the right range of benchmarks and released by an American lab with a truly permissive license, it'll get a large amount of adoption (in this cycle, recall that Gemma 4 adopted the Apache 2.0 License, changing from one with use-case restrictions on earlier Gemmas). This early phase of American growth in open models is establishing key brands directly with developers. The consensus is that more neolabs like Reflection and Thinking Machines are likely to participate in this space, but being too patient will lose the time when new agentic workflows and enterprise relationships are built.
    5. Anthropic and OpenAI are just getting up to speed in model iterations
    I expect the rest of this year to be a ruthless competition between these two flagship companies. I’m at an interesting balance where I think GPT 5.5 is a bit smarter of a model and I love the Codex App, so I’m structuring much of my work to be possible there. At the same time, for a lot of writing-related and broader surface area tasks I really still love Claude. These models are rapidly changing how we work: I run Codex from my phone while doing other things, am setting up automated open model analysis jobs on the back of agents, and expect to be able to scale the research side of Interconnects widely.
    AI is beginning to drive companies to the two extremes in the scaling era. The biggest companies will be way bigger than ever, using resources and mass talent to have sustained progress at the frontier of raw AI capabilities. On the other side, tiny businesses like Interconnects thrive by using agents to refine, present, and sell niche expertise. The mass social job displacement that’ll come is going to reduce employability for various knowledge workers that don’t fit into either of these extremes for the raw technical side (big or small companies), while sustaining and maybe even amplifying careers that interface directly with humans (e.g. doctors) or other power structures with means to sustain themselves (law/government).
    6. More existing power structures will assert themselves on AI
    Just in the last few days while writing this, we had the Pope release an over 40,000 word document on where AI is going and China expand personnel movement restrictions on top AI researchers across industry. At the same time, the U.S. has designated Anthropic a supply chain risk and continues to use its models for national security. The list of news like this is only going to grow. Existing power structures are realizing there’s a finite time window for them to exert themselves in the AI dynamic — an intuition that could be mapped to influence going down as AI models get more powerful. This intuition is potentially dangerous, as it sets up meaningful conflict in who controls the technology (as I discussed with Dean Ball after the Anthropic-DoW spat).
    Next: Where technical becomes social
    These largely technical and power trends accelerating are going to put more pressure on the social and political anti-AI sentiments within the U.S. This is currently the most obvious barrier to continued AI development and beneficial diffusion. Reflecting on this, many people in the tech discourse get too focused on the details, where yes a lot of data-center-detractors are making genuinely wrong factual claims in defense of their position.
    The real position that a large swath of Americans has is that they have a voice in saying no to the current trend — by not granting permission to build data centers. This is a voice that they haven’t been granted by the tech industry that changed the face of the global economy and power structures in the last few decades.
    This is setting us up for a challenging year ahead for the industry. The labs are aggregating and concentrating talent to peak levels. There are few neutral messengers to communicate the reality of AI to the public. The frontier labs leadership is largely gearing up to IPO and stay ahead in the capabilities race. With the status quo, there are few actions to unwind this path toward social conflict.
    It takes individuals in the AI ecosystem to zag and go against the groupthink of needing to make your wealth today, of needing to be at a lab to do impactful work, and so on. I’m personally continuing to bet on this, by trying to make a vibrant and diverse open model ecosystem supported by clear, unbiased information. If you agree with this and have been watching from the sidelines, it’s a good time to get involved, before the situation spirals into something uncontrollable.


    This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
  • Interconnects

    Notes from inside China's AI labs

    07/05/2026 | 16 mins.
    Staring out the window on a new, high-speed train from Hangzhou to Shanghai I’m gifted with views of dramatic ridgelines speckled with wind turbines that are silhouetted against the setting sun. The mountains cast a backdrop to a mix of spanning fields and clustered skyscrapers. I’m returning from China with great humility. It’s a very warming, human experience to go somewhere so foreign and be so welcomed. I had the honor of meeting so many people in the AI ecosystem who I knew from afar, and they greeted me with big smiles and cheer, reminding me how global my work and the AI ecosystem is.
    The mentality of Chinese researchers
    The Chinese companies building language models are set up as the perfect fast-followers for the technology, building on long-standing cultural traditions in education and work, along with subtly different approaches to building technology companies. When you look at the outputs, the latest, biggest models enabling agentic workflows, and the ingredients, excellent scientists, large-scale data, and accelerated computing, the Chinese and American labs look largely similar. The lasting differences emerge in how these are organized and conditioned.
    I’ve long thought that a reason that the Chinese labs are so good at catching up and keeping up with the frontier is that they’re culturally aligned for this task, but without talking to people directly I felt like it wasn’t my place to attribute substantial influence to this hunch. Speaking with many wonderful, humble, and open scientists at the leading Chinese labs has crystallized a lot of my beliefs.
    So much of building the best LLMs today comes down to meticulous work across the entire stack, from data to architecture details and RL algorithm implementations. All points of the model can give some improvements, and fitting them in together is a complex process where the work of some brilliant individuals needs to get shelved in favor of the overall model maximizing a multi-objective optimization.
    While American researchers are obviously also brilliant at solving the individual components, there’s more of a culture of speaking up for yourself in the U.S. As a scientist, you’re more successful when you speak up for your work, and modern culture is pushing the new path to fame of “leading AI scientists”. This results in direct conflict. The Llama organization is heavily rumored to have collapsed under the political weight of these interests embedding themselves in a hierarchical organization. I’ve heard of other labs saying that it can be necessary to pay off a top researcher to get them to stop complaining about their idea not making it into the final model. Whether or not that’s exactly true, the idea is clear. Ego and desires for career advancement do get in the way of making the best models. A small, directional shift in this sort of culture between the U.S. and China can have a meaningful impact on the final outputs.
    Some of this has to do with who is building the models in China. There’s an immediate reality at all of the labs that a large proportion of the core contributors are active students. The labs are quite young, and it reminds me of our setup at Ai2, where students are seen as peers and directly integrated in the LLM team. This is incredibly different from the top labs in the US, where the likes of OpenAI, Anthropic, Cursor, etc. simply don’t offer internships. Other companies like Google nominally have internships related to Gemini, but there’s a lot of concern about whether your internship will be siloed and away from anything real.
    To summarize how the slight change in culture can improve the ability to build models:
    * More willingness to do non-flashy work in order to improve the final model,
    * People new to building AI can be free of prior phases of AI hype cycles, allowing them to adapt to the new modern techniques faster (in fact, one of the Chinese scientists I talked to really actively attached to this strength),
    * Less ego enabling org charts to scale slightly, as there’s less gamifying the system, and
    * Abundant talent well-suited to solving problems with a proof of concept elsewhere, etc.
    This slight inclination towards skills that complement building today’s language models stands in contrast to a known stereotype that Chinese researchers tend to produce less creative, field-spawning, 0-to-1 academic style research. Among the more academic lab visits on our trip, many leaders talk about cultivating this more ambitious research culture. At the same time, some technical leaders we talked to were skeptical about whether such a rewiring in the approach to science is likely in the near term, because it’ll take a redesign of the education and incentive systems that is too big to happen within the current economic equilibrium. This culture seems to be training students and engineers that are excellent at the LLM building game. They also, of course, have an extremely abundant quantity.
    These students told me about a similar brain drain happening in China as in the U.S., where many who previously considered academic paths now intend to stay in industry. The funniest quote was from a researcher who was interested in being a professor to be close to the education system, but remarked that education is solved with LLMs – “why would a student talk to me!”
    The students have a benefit of coming at LLMs with fresh eyes. Over the last few years we’ve seen the key paradigm of LLMs shift from scaling MoE’s, to scaling RL, to enabling agents. Doing any of these well involves absorbing an insane amount of context quickly, both from the broader literature and the technical stack at your company. Students are used to doing this and excited to humbly drop all presumptions about what should work. They dive in head first and dedicate their life to getting the chance to improve the models.
    These students are also so magically direct and free of some of the philosophical chatter that can distract scientists. When asking questions on how they feel about the economics or long-term social risks of models, far fewer Chinese researchers have sophisticated opinions and a drive to influence this. Their role is to build the best model.
    This difference is subtle, and easy to deny, but it is best felt when having long conversations with an elegant, brilliant researcher who can clearly communicate well in English: basic questions on more philosophical aspects of AI hang in the air with a simple confusion. It’s a category error to them. One researcher even quoted the famous Dan Wang premise of China being run by engineers, relative to the lawyers of the U.S., when I probed in these areas, to emphasize their desire to build. There’s no track in China that systematically enables the growth of star power for Chinese scientists, akin to mega mainstream podcasts like Dwarkesh or Lex.
    Trying to get Chinese scientists to comment on the coming economic uncertainty fueled by AI, questions beyond the capabilities of simple AGI, or moral debates on how models should behave all served to capture the upbringing and education of these scientists. They are extremely dedicated to their work, but have grown up in a system where debates and opinions on how society should be structured and changed are not encouraged.
    Zooming out — Beijing especially felt much like the Bay Area, where a competitive lab is a short walk or Uber away. I got off a flight and stopped by Alibaba’s Beijing campus on the way to the hotel. Then, in 36 hours we went to all of Z.ai, Moonshot AI, Tsinghua University, Meituan, Xiaomi, and 01.ai. Travel by Didi is easy, and if you select an XL in China you’re often paired with electric mini vans that have massage chairs. We asked the researchers about the talent wars, and they said it’s very similar to what we’re experiencing in the U.S. It’s normal for researchers to bounce around, and much of where people choose to go is based on the best current vibes.
    In China, the LLM community feels far more like an ecosystem than battling tribes. Across many off the record conversations, it’s nothing but respect for peers. All of the Chinese labs fear Bytedance with their popular Doubao model, which is the only frontier closed lab in China. At the same time, all of the labs have massive respect for DeepSeek as the lab with the best research taste in execution. When you meet with lab members off the record in the States, sparks fly quickly.
    The most striking part of the humility of Chinese researchers is how they also often shrug on the business side, saying it’s not their problem, where everyone in the U.S. seems to be obsessed with various ecosystem-level industrial trends, from data sellers to compute or fundraising.
    Where China’s AI industry differs (and matches) the Western labs
    The thing that makes building an AI model today so interesting is that it’s not just about getting a group of great researchers in one building together to produce an engineering marvel. It used to be this, but to sustain AI businesses, the LLMs are becoming a mix of building, deploying, funding, and getting adoption for this creation. The leading AI companies exist in complex ecosystems that supply money, compute, data and more in order to keep pushing the frontier.
    The integration of these various inputs to creating and sustaining LLMs is fairly well conceptualized and mapped for the Western ecosystem, as typified by Anthropic and OpenAI, so finding big differences in how the Chinese labs think about it points at where the different companies can be making meaningfully different bets on the future. Of course, these futures can be heavily dictated by the constraints on funding and/or compute.
    I’ve documented the biggest “AI Industry” level take-aways from talking to these labs:
    * Early signs of domestic AI demand. There’s a much-touted hypothesis that the Chinese AI market will be smaller because Chinese companies don’t tend to pay for software, thus never unlocking a giant inference market supporting labs. This is only true for software spend that maps to the SaaS ecosystem, which is historically tiny in China; on the other hand, there is obviously still a large cloud market in China. A crucial unanswered question, one which the Chinese labs themselves debate, is whether spending for AI in the enterprise tracks the SaaS market (small) or the cloud market (fundamental). On net, it feels like AI is trending closer to the cloud, and no one was actively worried about a market growing around the new tools.
    * Most developers are Claude-pilled. Most of the AI developers in China are obsessed with Claude and how it’s changed how they build software, despite Claude nominally being banned in China. Just because China has historically been hesitant to buy software does not give me the impression that there won’t be a massive surge in inference demand. Chinese technical staff are so practical, humble, and motivated – a fact that seems stronger than any commitment to previous habits in not spending. Some Chinese researchers mention building with their own tools, such as the Kimi or GLM CLIs, but all of them mention building with Claude. There were also surprisingly few mentions of Codex, which is definitely surging in popularity in the Bay Area.
    * Chinese companies have a technology ownership mentality. The Chinese culture is combining with a roaring economic engine to create unpredictable outcomes. I’m left with a lasting feeling that the numerous AI models reflect a practical, current equilibrium of the many technology businesses here. There’s no master plan. The industry is defined by a respect for ByteDance and Alibaba, the incumbents expected to win large portions of all markets with their substantial resources. DeepSeek is the respected technical leader, but far from a market leader. They set the direction, but aren’t set up to win economically. This leaves companies like Meituan or Ant Group, where people in the West can be surprised they’re building these models. In reality, they see LLMs obviously as being central to future technology products, so they need a strong base. When they fine-tune the strong, general purpose model it hardens their stack from getting the open community to provide feedback on it, and they can keep internal, fine-tuned versions of the model for their products. The “open-first” mentality in the industry is largely defined by practicality: it helps make their models get strong feedback, it gives back to the open-source community, and empowers their mission.
    * Government aid is real, but unclear how big. It’s often asserted that the Chinese government is actively helping with the open LLM race. This is a government that’s decentralized across many levels, each of which doesn’t have a clear playbook for what exactly they do. Neighborhoods in Beijing compete for tech companies to house their offices there. The “help” offered to these companies almost certainly involved removing bureaucratic red tape like permits, but how far does it go? Can levels of the government help attract talent? Can they help smuggle chips? Across the visit, there were many mentions of government interest or help, but far too little to report the details as assertive or have a confident worldview of how government can bend the trajectory of AI in China. There were certainly no hints of the top levels of the Chinese government influencing any technical decisions in the models.
    * The data industry is far less developed. Having heard so much about the likes of Anthropic or OpenAI spending $10M+ for single environments, with cumulative spend on the order of hundreds of millions per year to push the frontier of RL, we were eager to know if Chinese labs are either buying the same environments from companies in the U.S. or supported by a mirrored domestic ecosystem. The answer was not quite that there’s no data industry, but rather that, in their experience, the data industry is relatively poor quality and it is often better to build the environments or data in-house. Researchers themselves spend meaningful time making the RL training environments, and some of the bigger companies like ByteDance and Alibaba can have in-house data labelling teams to support this. This all mirrors the build-not-buy mentality from the previous bullet.
    * Desperation for more Nvidia chips. Nvidia compute is the gold-standard for training and everyone is limited in progress by not having more of it. If supply was there, it is obvious that they would buy it. Other accelerators, including but not limited to Huawei, were spoken positively of for inference. Countless labs have access to Huawei chips.
    These points paint a very different picture of an AI ecosystem, where quickly mapping how Western labs operate to their Chinese counterparts will often result in a category error. The crucial question is if these different ecosystems will produce meaningfully different types of models, or if the Chinese models will always be explained by being similar to the U.S. frontier models of 3-9 months ago.
    Conclusion: The global equilibrium
    I knew so little about China going into the trip and came out with the feeling of just starting to learn. China isn’t a place that can be expressed by rules or recipes, but one with very different dynamics and chemistry. The culture is so old, so deep, and still completely intertwined with how domestic technology is built. I have much more learning ahead.
    So much of the current power structures in the US use their current worldviews of China as crucial mental devices for decision making. Having talked, in person, either formally or informally to pretty much every leading AI lab in China, there are a lot of qualities and instincts in China that’ll be very hard to model with Western decision making. Even after asking directly about why these labs release their top models openly, the intersection between ownership mentality and genuine ecosystem support is hard for me to connect the dots on.
    The labs here are practical and not necessarily absolutists around open-source, where every model they build would be released openly, but there’s a deep intentionality in supporting developers, the ecosystem, and using it as a way to learn more about their models.
    Almost every major Chinese technology company is building their own general purpose LLMs, as we see with the likes of Meituan (delivery service) and Xiaomi (broad consumer technology company) releasing open weight models. The equivalent companies in the U.S. would just buy services. These companies aren’t building LLMs out of a race to be relevant with the hot new thing, but a deep fundamental yearning to control their own stack and develop the most important technologies of the day. When I look up from my laptop and always see bunches of cranes on the horizon, it obviously fits in with the broader culture and energy around building in China.
    The humanity, charm, and genuine warmth of Chinese researchers is extremely humanizing. At a personal level, the cut-throat geopolitical conversation we’re used to in the U.S. hasn’t permeated them at all. The world can use more of this simple positivity. As a citizen of the AI community, I currently worry more about the fissures appearing within members and groups around labels of nationality.
    I’d be lying if I said I didn’t want US labs to be clear leaders in every part of the AI stack — especially with open models where I spend my time — I’m American, and that’s an honest preference. With this, I want the open ecosystem itself to thrive globally, as this can create safer, more accessible, and more useful AI for the world, and right now the question is whether American labs will take the steps to own that leadership position.
    As of finishing this piece, more rumors are swirling of executive orders influencing open models, which can further complicate this synergy between American leadership and the global ecosystem — it doesn’t fill me with confidence.
    Thank you to all the wonderful people I got to talk to at Moonshot, Zhipu, Meituan, Xiaomi, Qwen, Ant Ling, 01.ai, and others. Everyone has been so welcoming and gracious with their time. I’ll keep sharing my thoughts on China as they crystallize, across culture generally and AI specifically. It is obvious that this knowledge will be directly relevant to the story unfolding at the frontier of AI development.


  • Interconnects

    The distillation panic

    04/05/2026 | 8 mins.
    ‘Distillation attacks’ is a horrible term for what is happening right now. Yes, some Chinese labs are hacking or jailbreaking APIs to attempt to extract more signal from model APIs, and stopping this is important to maintain the U.S.’s lead in AI capabilities. But referring to this as a distillation attack is going to irrevocably associate all distillation with this behavior, and distillation generally is a core technique needed to diffuse AI capabilities broadly through academic and economic activities.
    We went through this sort of language transition with the open source vs open weight debate. All the terms just reduced to open models – very few people in the large AI community know exactly how open-source differs from open-weights. And terminology matters, as the less informed people who still care about — and influence — the technology are bound by different terms they use. If we’re not careful with the discourse around distillation, many people could associate this broad technique used for research and development of new models as an act at the boundary of corporate manipulation and crime.
    I’ve recently written a more technical piece on estimating how impactful state-of-the-art distillation methods are on leading Chinese models, and this piece follows to push for caution in any hasty actions to target the methods with policy. To set the stage, recall Anthropic’s recent blog post where they detailed “distillation attacks” made by 3 Chinese labs.
    These labs used a technique called “distillation,” which involves training a less capable model on the outputs of a stronger one. Distillation is a widely used and legitimate training method. For example, frontier AI labs routinely distill their own models to create smaller, cheaper versions for their customers. But distillation can also be used for illicit purposes: competitors can use it to acquire powerful capabilities from other labs in a fraction of the time, and at a fraction of the cost, that it would take to develop them independently.
    This is a clever paragraph, where they normalize distillation generally and explain how a few people can use it illicitly, without detailing how illicit use often involves other more explicit behavior like jailbreaking, hacking, or identity spoofing of the API.
    Distillation itself is an industry standard. It’s used extensively, primarily in post-training, by smaller players to create specialized or smaller models. In my book coming this summer, I describe it as follows:
    The term distillation has been the most powerful form of discussion around the role of synthetic data in language models. Distillation as a term comes from a technical definition of teacher-student knowledge distillation from the deep learning literature.
    Distillation colloquially refers to using the outputs from a stronger model to train a smaller model.
    In post-training, this general notion of distillation takes two common forms:
    * As a data engine to use across wide swaths of the post-training process: Completions for instructions, preference data (or Constitutional AI), or verification for RL.
    * To transfer specific skills from a stronger model to a weaker model, which is often done for specific skills such as mathematical reasoning or coding.
    With this definition, it’s easy to see how distillation takes many forms. Of course, if you just take the outputs from GPT-5.5 and train a recent open-weight base model with them to host a competitive product, that’s one thing. But, a lot of the things that fall under the bucket of distillation are complex, multi-stage processes that muddle the exact impact of the model you distilled from.
    Modern LLM processes could look like using a GPT API to build an initial batch of synthetic data to build a specialized small data-processing model. A good example is a model like olmOCR (or many other models in this category), which is trained to convert PDFs to clean text. This specialized model would be used to create large amounts of data. Finally, you train another model (often from scratch) with the new data you created. Is this final model distilled from GPT?
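    One way to make the question concrete is to write the pipeline down as a lineage graph and count how many steps separate the final model from the closed API. The sketch below is a minimal illustration, not a description of any real training stack: the artifact names and the `hops_to_ancestors` helper are hypothetical, chosen only to mirror the stages described above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """A dataset or model, plus the artifacts it was directly built from."""
    name: str
    parents: list["Artifact"] = field(default_factory=list)


def hops_to_ancestors(artifact: Artifact) -> dict[str, int]:
    """Breadth-first walk up the lineage: ancestor name -> fewest hops away."""
    distances: dict[str, int] = {}
    queue = deque((parent, 1) for parent in artifact.parents)
    while queue:
        node, hops = queue.popleft()
        if node.name in distances:
            continue  # already reached by a path that is no longer
        distances[node.name] = hops
        queue.extend((parent, hops + 1) for parent in node.parents)
    return distances


# Hypothetical stages (all names are assumptions for illustration).
closed_api = Artifact("closed-api")
seed_data = Artifact("seed-synthetic-data", [closed_api])
ocr_model = Artifact("specialized-ocr-model", [seed_data])
corpus = Artifact("converted-text-corpus", [ocr_model])
final_model = Artifact("final-model", [corpus])

lineage = hops_to_ancestors(final_model)
for name, hops in lineage.items():
    print(f"{name}: {hops} hop(s) from final-model")
```

    In this toy graph the closed API is four hops upstream of the final model, which is the ambiguity in question: any rule keyed on "was distilled from" has to pick a hop count at which lineage stops counting.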
    When done via a closed, API-based model, distillation sits in the grey area of the terms of service that you agree to when signing up to the Claude or GPT platform. They generally forbid the use of the API to create competing language model products, but this term has largely gone unenforced. The open-source community used to worry deeply about being cut off from these cutting-edge APIs for doing research or creating public datasets, but to date only one prominent case of corporate accounts being restricted exists (at least until the recent Chinese companies).
    This is all to say that distillation is an industry standard technique, and the use of closed APIs to perform distillation has always been a grey area. Nvidia’s latest Nemotron models, which are among the only models with open post-training datasets, are technically in large part distilled from Chinese, open-weight models. The Olmo models we’ve built at Ai2 are distilled from a mix of open and closed models. This grey area was brought to the forefront again when it turned out that xAI has been distilling from OpenAI. Quoting from the recent trial proceedings between Elon and OpenAI:
    OpenAI’s counsel asked Musk whether xAI has ever “distilled” technology from OpenAI.
    Musk: “Generally AI companies distill other AI companies.”
    “Is that a yes?” Savitt asked.
    Musk: “Partly.”
    xAI is likely the largest and most successful AI company willing to tread the grey area that is distillation from its competitors. On the other side, the majority of startups and research groups with fewer resources have very likely engaged in distillation in some capacity from Claude, GPT, or Gemini models.

    In the above Anthropic blog post, the problem with the distillation attacks by a few Chinese labs is less the distillation and more the means of attack. It is documented that Chinese labs are actively working to get around the intended use of the API, e.g. to extract additional reasoning data that is very useful for training.
    Of course no one should be able to access information from a model that a developer didn’t intend to reveal in their APIs (e.g., reasoning traces which would be helpful for training). But associating all of distillation, which is to date an industry standard for post-training from open and closed models alike, with these attacks would be a massive own goal.
    What these few labs are doing should be referred to as jailbreaking or abuse, rather than distillation.
    The discourse around these actions is marching towards a mix of regulatory capture and regulatory exuberance that is likely to harm the U.S.’s ecosystem more than China’s. Even if we ban this type of API abuse, most likely through legal action and other penalties, the Chinese companies will likely still do it. We’ve seen this playbook with Chinese multimedia models taking a flexible view of copyrighted content that no U.S. player is willing to take the risk on.
    This distillation discussion has quickly snowballed, with a bill moving out of a committee in Congress, an executive order pushing for action, and congressional oversight targeting U.S. companies building on Chinese models (which are downstream of distillation). This multi-pronged regulatory environment could yield truly horrible outcomes – such as figuring out a way to effectively ban open-weight models in the U.S. that are built in China by groups abusing closed LLM APIs.
    It is obvious that no bill will literally ban open models, but bills can create grey areas that expose entities to unwanted risk, or require provisions that are bureaucratically very challenging to fulfill, squashing small open-source contributors.
    In that scenario, the groups who lose are Western academics and smaller companies building models for the long tail of AI uses. The ecosystem here could be made permanently irrelevant with the removal of nearly all Chinese open-weight models. There is no immediate substitute, and building new models with meaningful community adoption has a lead time of 6+ months. In the time it takes to build a new domestic open-source ecosystem, countless researchers would’ve moved on to closed training platforms or into new areas.
    Altogether, I’m hoping this flurry of discussion around distillation becomes a nothing-burger and not a hasty, multi-pronged policy push. We need to avoid two things:
    * A wholesale negative connotation of the word distillation, which is used extensively across the AI ecosystem.
    * A domestic ban of the open-weight models built by organizations engaged in some portion of distillation.
    In addition to this, I want the leading U.S. AI companies to be able to provide their APIs without having their IP leak. They should share more information on why it is hard for them to secure their APIs, but that’s an issue out of scope for my expertise.
    I’ll conclude with a proposal from my friend Kevin Xu at Interconnected Capital (and great Substack) on why this current distillation dynamic may actually be good for the leading labs.
    If all the Chinese companies are addicted to distillation as a way of getting close to the frontier, then they’ll never actually learn the techniques needed to take an outright lead. If we cut off the Chinese labs’ obvious crutch in model building, we’ll gain a short-term lead in AI, but in the long term that may be what they need to get on a more competitive trajectory.
    This is the same debate we’re having with other technologies where the U.S. currently has a lead, e.g. with advanced semiconductor technologies. So I understand the trade-offs, but we should not crack down on all of distillation.


    This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
  • Interconnects

    My bets on open models, mid-2026

    15/04/2026 | 6 mins.
    We’re living through the period of time when we’ll learn if open models can keep up with closed labs. The obvious answer is that no, they won’t. This answer is a form of saying they won’t keep up in every area. This framing closes off a popular prediction where the open models completely catch up, as in all models saturate and open and closed models only become increasingly similar. In living through this, it’s evidently very unclear when the longer-term stable balance of capabilities will solidify.
    This is a very complex dynamic, where the core point we monitor is a capability gap between models. At the same time, this gap is intertwined with evolving dynamics in the funding of open models, who builds open models, how techniques like distillation that enable fast-following translate through new application domains, potential regulation hampering the open-source AI ecosystem, and of course who actually uses open models.
    The capabilities gap is one signal in a complex sea of forces, pushing supply and demand into different shapes. In many cases the demand — where obviously tons of individuals, organizations, and sovereigns want, or need, open models — is largely separated from supply. Supply is fully dictated by economics. The question of “which business strategies support releasing open models” is still at stake.

    With this complexity, I wanted to distill my key beliefs down into a clear list. These are downstream of 10+ pieces I’ve written or recorded on open models this spring (which are linked throughout).
    * It’s surprising that the top closed models did not show a growing capability margin over open models, based on compute differences for training and research, especially in the second half of 2025 and through today.
    * Open model labs are technically very strong at keeping pace on well-established benchmarks. This will continue and reflects a balance of abundant talent and sufficient computing power.
    * Chinese open-weight labs focus slightly more on benchmark scores than comparable closed labs in the U.S. Distillation helps the Chinese LLM companies do so, but it’s not a panacea. Changes in the distillation dynamic (e.g. regulation) will not be a determining factor on the balance of capabilities. This increase in focus is a natural evolution of their incentives in keeping the narrative on keeping up with the frontier alive, which is crucial to fundraising and adoption.
    * To date, closed models tend to be more robust and generally useful than similarly scoring open models. Closed models have certain hard-to-measure qualities that are not well captured in current or past benchmarks. This will be key to enabling closed models to dominate in markets where an individual user constantly presents new challenges, i.e. supporting knowledge workers as a direct assistant.
    * The open vs. closed model race, as monitored through benchmarks, will largely be a game of economic staying power and fast-following, until the market structure constricts. I expect Chinese open-weight labs to face funding difficulties first, as soon as later this year. Funding difficulties will be seen in different capability trajectories 3-9 months later.
    * The RL-dominated training era has increased the relevance of distribution to real-world use-cases as a key factor in continued capabilities improvements. These are tasks where users directly use tools like Claude Code or Codex to solve problems in their job with agents. This is the first clear technical area where closed labs can dominate open-weight models on capabilities, potentially leveraging online RL directly based on user feedback.
    * Open models will be increasingly adopted for repetitive automation tasks across the ecosystem, as measured by their relative share of the API market. This takes the form of many new AI-native applications, business backend automation, etc. The success of this will drive more investment in domain-specific, efficient open models.
    This is a complex picture, where the long-term trajectory is more an economics question than an ability one. Many other outlets can paint a far more simplistic narrative that “China will assuredly catch us in AI” and get more distribution because it is a simple story. The reality is complex. Only real AI revenue begets more investment, and eventually that’ll be linked to the ability to keep improving models at a rapid rate. Economic realities have not yet impacted scaling open models, as a general category.
    This economic-focused angle relates to my positions on the open model ecosystem more broadly.
    * Recurring calls to ban certain types of open models will continue to come but are in practice impossible to implement. Training strong AI models (i.e. near but not at the frontier) is a relatively small cost compared to large-scale deployments. E.g. if the U.S. bans open models over a certain compute threshold, another sovereign entity will eventually train them and release them publicly, with the models entering the U.S. market with less oversight.
    * The second derivative of influence on open models has shifted, and the U.S. will slowly regain ground in adoption metrics of open models starting in early 2027 (it takes a long time for China’s velocity to slow, then flip). Examples include Google’s Gemma 4 (a wild success), Nvidia’s Nemotron, and Arcee AI.
    * As ever-stronger closed models are built, previewed, and released, there will be more safety shocks arguing that open-weight versions of the strongest AI models can never be allowed to exist, similar to reactions to Claude Mythos. These can spur burdensome regulation on open models.
    * With the above, there will also be increased long-term interest in open models, as sovereign entities and existing power structures realize the coming, super powerful AI tools cannot land in the hands of only one or a few companies. These entities will see open models as a different governance paradigm.
    * New funding structures for open models will emerge, as many stakeholders realize dependencies on single, for-profit companies for access to intelligence are unreliable.
    * Local agents, OpenClaw, and other personal agents represent a large, to date, mostly ignored market for open model usage. It is a sort of dark matter, with pervasive, massive potential for influence on the balance of open-to-closed models.
    A single word governs this post and is intentionally repeated — complex.
    This complex reality has been driving me to think more deeply about how to clearly describe the open model gap, and why I can hold it in my head that I expect American closed labs to clearly draw ahead, despite the fairly unequivocal evidence in support of the capabilities of recent open-weight models. More on the nuance in the open-closed gap in another piece coming soon, so please subscribe!
    Let me know any positions that I missed.


  • Interconnects

    The inevitable need for an open model consortium

    11/04/2026 | 5 mins.
    Recently, I was talking with Percy Liang, Stanford professor and lead of the Marin project (another fully-open model lab), and it set in on me that there will eventually be a consortium of companies funding a foundational set of open models used across industry. It’s not clear when this’ll emerge, and Nemotron (Coalition) is Nvidia’s attempt to bankroll and bootstrap this approach within a single wealthy company, but a consortium is the only long-term stable path to well-funded, near-frontier open models.
    In recent months, we’ve seen a lot of turnover in open model labs, with high-profile departures at Qwen and Ai2 (my comment). This shouldn’t be super surprising to followers of the ecosystem — it’s happened before with Meta shifting its focus away from Llama, and it’ll only happen more as the cost of trying to keep pace at the frontier of AI only increases. The other leading labs with models available today include Chinese startups such as Moonshot AI, MiniMax, and Z.ai — all of which look precarious on their ability to fund continued growth in the cost of training or R&D. Releasing one’s strongest models openly today is in active tension with the option of spending focus and resources on AI products that can currently generate meaningful revenue (and profits).
    We’re going to see business models emerge around releasing some, or even many, models openly, but these will largely be smaller models that enable a long-tail of functionality, rather than models at the absolute frontier. This class of companies that’ll release many, strong fine-tunable models will include the likes of Arcee AI, Thinking Machines, OpenAI, Google with Gemma, and more in that class. The cost and relative advantage of keeping the best models closed in a business environment with many opportunities for revenue are too high. To summarize — there will be an ever increasing number of companies releasing models that are good for creating a lively niche of smaller, custom models, but an ever decreasing number of companies willing to release fully open, near-frontier models.
    This is the core thesis of why I’m pushing hard for more people to do more research on how these smaller models can complement the best closed agents, the science of finetunability, etc. See my post on creating a sustainable open model ecosystem, whether or not the frontier of open keeps pace with closed.
    It’ll take years for this equilibrium to become more obvious, seen through the lens of more open model families coming and going. This year, it seems likely we’ll see Nvidia’s Nemotron reach new heights, Reflection AI challenge some of the Chinese models with a strong, large MoE, maybe Meta releases a new open-weight model, and so on. True pressure to change strategy will only come when the capital environment punishes the less efficient spend on resources (e.g. giving away your competitive advantage, in having an in-house model). This pressure will likely hit Chinese startups training these models first.
    All of Moonshot AI, MiniMax, and Z.ai (Zhipu AI) will show signs of financial challenge in the coming years if they retain their strategy, on top of their models falling further behind the best open models in terms of generality. This is inevitable pressure to evolve open models toward areas that are profitable and complementary to the frontier of AI.
    Nvidia, which is best positioned to support the open ecosystem in the near term as a way to support its core GPU business, could face many pressures to pull back its open model efforts. It could:
    * Realize it’s too competitive with its biggest customers as it succeeds too much with Nemotron.
    * Fall to competition in its core business and lose the free cash flow buffer needed to fund this (e.g. it’s 2031 and OpenAI, Anthropic, Google, and the other frontier labs are worth so much they build their own chips).
    * Start succeeding beyond its initial goals and keep the chips for itself to build ASI, as a closed-weight model.
    The pressures for new funding mechanisms for open models are based on the assumptions of continued, substantive progress on the capabilities of frontier models. Mechanisms such as self-improvement and scaling all stages of the training pipeline are underway. This progress of capabilities will only increase the potential profit in selling models as and in products, not giving them away. The scale of investment required has already begun to push away non-profits from the game of making truly frontier-scale models. Capitalism is designed to make companies ruthless and chase down leads on profitability, not donate technology as charity.
    As the economic environment shifts companies away from releasing the strongest models openly, more companies that rely on these models will look for a way to secure model access into the future. This is going to be compounded by a growing group of companies who come to rely on open-weight models for their workflows.
    These points loop back into how model training is getting more expensive, so while the desire to have the models will go up, the ability to procure them will go down for many players. There are x-factors that could multiply the demand for institutions to ensure the existence of open models, such as the best frontier models not even being available via API (such as if Claude Mythos never goes general access).
    As training relevant models is shifting to cost billions of dollars, rather than millions, few companies will be able to afford it. Many companies will bite at the cost of paying 1/10th of the cost to train a frontier model, or if the consortium works, 1/50th. The upside for companies will be some mechanism to steer development (e.g. model sizes) or getting early access to develop internal and open-source tooling for the model.
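    To put rough numbers on those fractions, here is a back-of-the-envelope sketch. The $2B training cost is a placeholder I picked for illustration, not an estimate of any real model, and the even split across members is an assumption (a real consortium would likely have tiers).

```python
def member_share(total_cost_usd: float, members: int) -> float:
    """Cost per member when a training run is split evenly across a consortium."""
    if members < 1:
        raise ValueError("a consortium needs at least one member")
    return total_cost_usd / members


# Hypothetical frontier-scale training budget (assumption, for illustration only).
TRAINING_COST_USD = 2_000_000_000

for members in (1, 10, 50):
    share = member_share(TRAINING_COST_USD, members)
    print(f"{members:>2} member(s): ${share / 1e6:,.0f}M each")
```

    At 1/50th, a billions-scale run lands in the tens of millions per member, which is closer to the budgets many companies already spend on AI.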
    It is in my nature to, by default, say this idea will fail, as training models is inherently a complex and high-focus endeavor, one that requires integration of every part of the stack and focusing specifically on your own vision and needs, rather than trying to serve every possible user. Eventually the need for open intelligence — and economic pressure to build it — will make a model consortium inevitable.


About Interconnects
Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories. www.interconnects.ai