Amazon's Token-Wasting Problem Is a Measurement Crisis, Not a Compliance Crisis
Amazon employees are reportedly "tokenmaxxing" - deliberately wasting AI tokens to hit arbitrary usage targets set by management. Reporting this week indicates that workers are gaming AI utilization metrics rather than using AI tools productively, because the organizational incentive is to appear compliant rather than to be effective. This is not a story about lazy workers or bad actors. It is a story about what happens when organizations treat AI adoption as a procedural compliance problem rather than a competence development problem.
The Metric Is Not the Thing It Measures
Amazon's management appears to have reasoned as follows: AI tools create productivity gains, productivity gains require AI usage, therefore measuring AI usage will produce productivity gains. This reasoning collapses the distinction between an indicator and the underlying construct it is meant to track. When organizations set token usage targets, they are measuring a behavioral proxy for competence. Workers who understand this - and workers generally do understand this quickly - respond rationally by optimizing the proxy rather than the underlying competence. The result is a system where measured AI adoption increases while actual capability development stagnates or declines.
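To make the dynamic concrete, here is a minimal sketch with entirely invented numbers and no claim about Amazon's actual systems: when the evaluation function sees only the proxy, the proxy-maximizing strategy wins even though it produces nothing of value.

```python
# Toy illustration with invented numbers, not a model of any real system:
# when the reward depends only on a proxy (tokens logged), a rational
# agent maximizes the proxy, and the underlying construct never moves.

def tokens_logged(strategy: str) -> int:
    """Proxy metric: token volume per week (invented numbers)."""
    return {"paste_filler": 50_000, "skilled_use": 8_000}[strategy]

def output_quality(strategy: str) -> float:
    """Underlying construct: usefulness of the work (invented numbers)."""
    return {"paste_filler": 0.0, "skilled_use": 0.7}[strategy]

strategies = ["paste_filler", "skilled_use"]

# A worker evaluated on the proxy picks whatever maximizes it.
chosen = max(strategies, key=tokens_logged)

print(chosen)                  # paste_filler
print(tokens_logged(chosen))   # 50000: measured adoption looks high
print(output_quality(chosen))  # 0.0: actual capability is unchanged
```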
This pattern is well documented in the literature on algorithmic management. Kellogg, Valentine, and Christin (2020) show how algorithmic management systems routinely produce behavioral compliance without capability transfer. Workers learn to satisfy the measurement architecture, not the productive logic behind it. The Amazon case is a corporate-side version of the same dynamic: management designs a measurement architecture, and workers respond to that architecture rather than to the intended goal. The organization ends up with high token counts and low adaptive competence.
Why Procedural Mandates Fail in Novel Environments
The deeper problem is that Amazon is treating AI tool adoption as a procedure to be mandated rather than a competence to be developed. Hatano and Inagaki (1986) draw a foundational distinction between routine expertise - the ability to execute a known procedure in familiar contexts - and adaptive expertise - the ability to modify approaches when contexts shift. Procedural mandates, by definition, target routine expertise. They specify what to do without building the structural understanding needed to know when to do it differently.
AI tools are not static. The environments in which they operate shift continuously. A worker who hits token targets by pasting arbitrary text into a prompt window has learned a procedure. A worker who develops genuine understanding of how AI tools handle different task structures, and what kinds of queries produce useful versus useless outputs, has developed adaptive expertise. The first worker satisfies the metric. The second worker can actually transfer capability when the tool changes, when the task changes, or when the organization moves to a different platform entirely. Amazon's current measurement regime selects for the first type and provides no developmental pathway toward the second.
The Awareness-Capability Gap at the Organizational Level
Research on algorithmic literacy consistently identifies what I refer to in my own work as the awareness-capability gap: knowing that an algorithm exists, or knowing that a tool is available, does not translate into knowing how to use it effectively (Gagrain, Naab, and Grub, 2024). Amazon's tokenmaxxing problem is a corporate-level instantiation of this gap. Management is aware that AI tools exist and that they should produce value. Employees are aware that usage targets exist and that they should be hit. Neither awareness translates into the structural understanding needed to actually develop productive human-AI workflows.
Sundar (2020) argues that as machine agency increases, human users must develop increasingly sophisticated mental models of system behavior to interact productively with AI systems. Token-wasting behavior is evidence that no such mental models are being developed. Workers are not building schemas for how these tools work. They are building folk theories about how to satisfy the measurement system, which is a categorically different cognitive product.
What a Competence-First Design Would Look Like
The corrective is not to remove metrics but to redesign what gets measured. Output quality, task completion improvement, and cross-context transfer are harder to measure than token counts, but they track the underlying construct that Amazon actually cares about. More importantly, an organization serious about AI capability development would invest in schema induction rather than procedural mandates: helping workers develop structural understanding of what these tools do and do not do well, rather than specifying how many interactions to log per week.
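As a rough illustration of the difference, the sketch below contrasts a token-count proxy with a hypothetical competence score. Every field name, weight, and number is invented for the example; the only substantive point is that token volume does not appear in the second function at all.

```python
# Hedged sketch: score the construct, not the proxy. All field names,
# weights, and values are hypothetical, chosen for illustration only.
from dataclasses import dataclass

@dataclass
class WorkerRecord:
    tokens_used: int        # the proxy reportedly being targeted
    quality_delta: float    # reviewed output quality vs. baseline, 0-1
    cycle_time_gain: float  # fractional reduction in completion time, 0-1
    transfer_score: float   # performance on an unfamiliar task type, 0-1

def proxy_metric(w: WorkerRecord) -> float:
    # What tokenmaxxing optimizes: volume, regardless of value.
    return float(w.tokens_used)

def competence_metric(w: WorkerRecord) -> float:
    # Arbitrary weights; the point is that tokens_used is absent.
    return 0.4 * w.quality_delta + 0.3 * w.cycle_time_gain + 0.3 * w.transfer_score

maxxer = WorkerRecord(tokens_used=50_000, quality_delta=0.0,
                      cycle_time_gain=0.0, transfer_score=0.1)
adept = WorkerRecord(tokens_used=8_000, quality_delta=0.6,
                     cycle_time_gain=0.5, transfer_score=0.7)

print(proxy_metric(maxxer) > proxy_metric(adept))            # True
print(competence_metric(maxxer) > competence_metric(adept))  # False
```

The real versions of these measurements are expensive to collect, which is precisely why token counts are tempting; the sketch only shows what the two regimes reward, not how cheaply they can be administered.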
Most coverage will likely frame the tokenmaxxing story as a worker compliance problem. That framing is incorrect. This is an organizational design problem, and the design flaw is the assumption that measuring behavioral proxies for competence produces competence. It does not. It produces workers who are very good at satisfying behavioral proxies, which is a distinct and considerably less useful skill.
References
Gagrain, A., Naab, T., and Grub, J. (2024). Algorithmic media use and algorithm literacy. New Media and Society.
Hatano, G., and Inagaki, K. (1986). Two courses of expertise. In H. Stevenson, H. Azuma, and K. Hakuta (Eds.), Child development and education in Japan. Freeman.
Kellogg, K. C., Valentine, M. A., and Christin, A. (2020). Algorithms at work: The new contested terrain of control. Academy of Management Annals, 14(1), 366-410.
Sundar, S. S. (2020). Rise of machine agency: A framework for studying the psychology of human-AI interaction. Journal of Computer-Mediated Communication, 25(1), 74-88.
Roger Hunt