On SWE-Bench Verified, the model scored 70.6%. That is competitive with significantly larger models: it edges out DeepSeek-V3.2, which scores 70.2%, ...
While you're in meetings or grabbing coffee, it analyzes problems, writes solutions, and delivers working code ready for review.
This article was created by StackCommerce. Postmedia may earn an affiliate commission from purchases made through our links on this page.