The Benchmark Convergence Nobody Is Talking About
This week, enterprise AI adoption crossed a threshold that deserves more careful attention than it is receiving. Open source AI models now match or outperform proprietary systems on most major benchmarks, while costing enterprises up to 87% less to deploy. Alibaba's Qwen family has surpassed 700 million downloads. Meta's Llama 4 is outperforming GPT-4o on coding and reasoning tasks. The price gap between open and closed AI infrastructure has collapsed so dramatically that the commercial argument for proprietary models is no longer self-evident. This is not a gradual shift. It is a structural reorganization of the enterprise AI landscape, and it raises a question that benchmark comparisons cannot answer: when the cost of access to capable AI drops toward zero, what actually explains differences in organizational outcomes?
Access Parity Does Not Produce Outcome Parity
The logic driving enterprise excitement about open source AI is straightforward: if Llama 4 performs comparably to GPT-4o at a fraction of the cost, then organizations that deploy it gain the same capability advantage at lower overhead. This reasoning is structurally correct but empirically incomplete. It assumes that capability is located in the model itself, and that equal access to the model produces equal outcomes across organizations. Neither assumption survives contact with the evidence on how algorithmically mediated environments actually work.
Kellogg, Valentine, and Christin (2020) documented systematic variance in worker outcomes across platform environments where access was held constant. That variance could not be attributed to differences in natural ability alone. Their analysis pointed toward a different explanation: competence in algorithmically mediated contexts develops endogenously through participation, not through access. Organizations deploying open source models at scale are not simply acquiring a tool. They are entering a coordination environment that requires a specific kind of structural literacy to navigate effectively. The 87% cost reduction changes the economics of entry, not the difficulty of competent use.
The Folk Theory Trap in Enterprise AI Deployment
Here is where the organizational risk concentrates. When proprietary AI was expensive and access-constrained, organizations typically built deliberate implementation processes; the cost of entry forced a certain procedural seriousness. As open source models become freely available, the friction that enforced deliberate adoption disappears. Organizations will deploy rapidly, generate uneven results, and construct post-hoc explanations for why their deployment did or did not work. Gagrčin, Naab, and Grub (2024) identified this pattern in individual media users: awareness of algorithmic systems is widespread, but accurate structural understanding is rare. What people develop instead are folk theories: plausible-sounding but structurally inaccurate mental models of how the system actually operates.
The enterprise equivalent is an organizational folk theory: a shared but informal account of what the AI system is doing and why certain outputs occur. These theories are not randomly distributed. They tend to reflect the concerns and vocabularies of whoever controls the deployment narrative inside the organization, which is often the team that championed adoption rather than the team that understands the system's actual constraints. Hatano and Inagaki (1986) drew a clean distinction between routine expertise, which is procedural and context-bound, and adaptive expertise, which is principled and transfers across novel situations. An organization that has deployed one open source model successfully and built procedures around that deployment has developed routine expertise. That expertise will not transfer when the model is updated, swapped, or extended into a new use case.
What the Cost Collapse Actually Changes
The open source surge does not reduce the competence problem. It scales it. When access was expensive, the population of organizations attempting enterprise AI deployment was self-selected toward those with resources to absorb implementation risk. As costs drop, the deployment population expands to include organizations with far less capacity for structured implementation. The result is a larger variance in outcomes, not a compression of it. Power-law distributions, the kind Schor et al. (2020) documented in platform labor markets, do not flatten as access democratizes. They often steepen, because algorithmic amplification of early differences operates independently of the access cost that preceded them.
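To make the amplification claim concrete, here is a minimal simulation sketch of a rich-get-richer (preferential attachment) allocation process. It is illustrative only: the population sizes, round counts, and amplification exponent are assumptions chosen for the sketch, not parameters estimated from Schor et al. (2020).

```python
# Illustrative rich-get-richer model: outcomes are allocated one unit at a
# time, with probability proportional to each agent's accumulated total
# raised to an amplification exponent. An exponent above 1 means early,
# essentially random leads compound over time.
import numpy as np

def gini(x):
    """Gini coefficient of a nonnegative array (0 = perfectly equal)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

def simulate(n_agents, rounds_per_agent=50, amplification=1.2, seed=0):
    """Run the allocation process for a population of n_agents.
    All parameter values here are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    totals = rng.uniform(0.9, 1.1, n_agents)  # near-identical starting points
    for _ in range(rounds_per_agent * n_agents):
        p = totals ** amplification
        p /= p.sum()
        winner = rng.choice(n_agents, p=p)
        totals[winner] += 1.0
    return totals

# Expanding the entry population (the analogue of the cost collapse) while
# holding the amplification dynamic and per-capita allocations constant:
for n in (50, 500):
    print(f"population={n:4d}  gini={gini(simulate(n)):.2f}")
```

Under these assumptions, the larger population typically produces the higher Gini coefficient: more entrants competing under the same amplification dynamic yields wider dispersion, not compression, which is the steepening the paragraph above describes.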
The 700 million downloads of the Qwen family represent 700 million entry points into this dynamic. Some fraction of those deployments will produce genuinely differentiated organizational outcomes. Most will not. The difference will not be explained by which model was chosen or what it cost. It will be explained by whether the organization developed accurate structural understanding of what the system can and cannot do, and whether that understanding was institutionalized in a form that survives personnel turnover and model updates. That is a harder problem than the cost comparison makes it appear, and the current coverage of the open source AI story is not engaging with it seriously.
References
Gagrčin, E., Naab, T. K., & Grub, J. (2024). Algorithmic media use and algorithm literacy. New Media & Society.
Hatano, G., & Inagaki, K. (1986). Two courses of expertise. In H. Stevenson, H. Azuma, & K. Hakuta (Eds.), Child development and education in Japan (pp. 262-272). Freeman.
Kellogg, K. C., Valentine, M. A., & Christin, A. (2020). Algorithms at work: The new contested terrain of control. Academy of Management Annals, 14(1), 366-410.
Schor, J. B., Attwood-Charles, W., Cansoy, M., Ladegaard, I., & Wengronowitz, R. (2020). Dependence and precarity in the platform economy. Theory and Society, 49(5-6), 833-861.
Roger Hunt