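Right now `ResultMessage.usage` folds thinking tokens into `output_tokens`, even though thinking tokens are billed differently from regular output tokens. Because the two counts are merged, there is no way to recover how many of the output tokens were thinking tokens, so costs cannot be tracked accurately from the usage data.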
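To illustrate why a separate count matters, here is a minimal sketch of cost estimation assuming `usage` is a plain dict and assuming a hypothetical `thinking_tokens` key that is reported separately and excluded from `output_tokens`. The `Rates` class, `estimate_cost` function, and all prices are hypothetical placeholders, not real SDK names or published rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Hypothetical per-million-token prices in USD (placeholders, not real rates).
    input_per_mtok: float
    output_per_mtok: float
    thinking_per_mtok: float


def estimate_cost(usage: dict, rates: Rates) -> float:
    """Estimate cost from a usage dict.

    Assumes a hypothetical 'thinking_tokens' key that is NOT included in
    'output_tokens'. Missing keys are treated as zero.
    """
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    thinking_tokens = usage.get("thinking_tokens", 0)
    return (
        input_tokens * rates.input_per_mtok
        + output_tokens * rates.output_per_mtok
        + thinking_tokens * rates.thinking_per_mtok
    ) / 1_000_000


rates = Rates(input_per_mtok=3.0, output_per_mtok=15.0, thinking_per_mtok=10.0)

# With a separate thinking count, each token type gets its own rate.
split_usage = {"input_tokens": 1000, "output_tokens": 2000, "thinking_tokens": 4000}

# With thinking folded into output_tokens (the current behavior described above),
# all 6000 tokens are priced at the output rate and the split is unrecoverable.
lumped_usage = {"input_tokens": 1000, "output_tokens": 6000}

print(estimate_cost(split_usage, rates))
print(estimate_cost(lumped_usage, rates))
```

With these placeholder rates, the split and lumped estimates differ for the same request, which is exactly the discrepancy a separate thinking-token field would remove.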