According to Gate.io on February 17, Microsoft released OmniParser V2.0, the latest version of its vision-based Agent parsing framework, on its official website. OmniParser can turn models such as DeepSeek-R1, GPT-4o, and Qwen2.5-VL into AI agents capable of operating a computer. Compared with V1, V2 detects small interactive UI elements more accurately and runs inference faster, cutting latency by 60%. On ScreenSpot Pro, a high-resolution benchmark for GUI agents, V2 paired with GPT-4o reached 39.6% accuracy, compared with only 0.8% for GPT-4o on its own. Alongside V2, Microsoft also open-sourced OmniTool, a Docker-based Windows environment that covers screen understanding, grounding, action planning, and execution, and serves as a key tool for turning large models into agents.
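The article does not describe OmniParser's interfaces, but the general idea of a screen-parsing agent loop can be sketched: a parser turns a screenshot into a list of labeled interactive elements, a language model picks the element that fits the task, and an executor clicks its location. The sketch below is a minimal illustration under those assumptions only. The names `UIElement`, `elements_to_prompt`, `choose_element`, and `click_point` are hypothetical and are not OmniParser's or OmniTool's API, and the keyword-matching planner is a stand-in for an actual LLM call such as GPT-4o or DeepSeek-R1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UIElement:
    """One detected screen element (hypothetical schema, not OmniParser's output format)."""
    element_id: int
    bbox: tuple[int, int, int, int]  # (x1, y1, x2, y2) in screen pixels
    caption: str                     # short description of what the element does
    interactable: bool               # whether the agent may click it


def elements_to_prompt(elements: list[UIElement]) -> str:
    """Serialize interactable elements into a text list a language model can read."""
    return "\n".join(
        f"[{e.element_id}] {e.caption} (bbox={e.bbox})"
        for e in elements
        if e.interactable
    )


def choose_element(elements: list[UIElement], task: str) -> UIElement | None:
    """Stand-in planner (assumption): a real agent would send the prompt to an LLM.

    Here we simply return the first interactable element whose caption
    shares at least one word with the task description.
    """
    task_words = set(task.lower().split())
    for e in elements:
        if e.interactable and task_words & set(e.caption.lower().split()):
            return e
    return None


def click_point(e: UIElement) -> tuple[int, int]:
    """Grounding step: convert a bounding box into a click coordinate (its center)."""
    x1, y1, x2, y2 = e.bbox
    return ((x1 + x2) // 2, (y1 + y2) // 2)


# Usage with a made-up parsed screen.
screen = [
    UIElement(0, (10, 10, 110, 40), "File menu", True),
    UIElement(1, (500, 300, 580, 330), "Save button", True),
    UIElement(2, (0, 0, 1920, 1080), "Desktop background", False),
]
print(elements_to_prompt(screen))
target = choose_element(screen, "click the save button")
if target is not None:
    print(target.element_id, click_point(target))  # prints: 1 (540, 315)
```

Separating parsing from planning in this way lets the same structured element list be handed to any model, which matches the article's point that OmniParser works with several different LLMs. Small elements matter because a missed or mislocated bounding box leaves the planner with nothing correct to choose.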
Microsoft Open-Sources an Innovative Framework That Can Turn DeepSeek into an AI Agent