Empowering AI to perceive websites analogously to humans enhances its capabilities significantly.

Artificial Intelligence Developed by Tencent Now Capable of Performing Most Tasks on Google, Amazon, and Wikipedia

By Administrator

2025 July 7, 11:24 PM

Letting AI View Web Pages as Humans Do Improved Its Performance

Researchers at Tencent AI Lab have unveiled WebVoyager, an AI agent designed for autonomous web browsing. The system uses both textual and visual inputs to operate a web browser and complete complex user tasks on popular sites such as Google, Amazon, and Wikipedia.

WebVoyager's observation space is structured: interactive elements on the page are enclosed in borders and tagged with numbers. This lets the agent refer to a specific element by its number when it navigates and interacts with a web page.
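
The numbering idea can be illustrated with a minimal sketch. This is not WebVoyager's own code: the Element class, the INTERACTIVE_TAGS set, and the reading-order rule are all assumptions made for illustration. It filters a list of page elements down to the visible interactive ones and gives each a numeric label that an agent could cite in its actions.

```python
from dataclasses import dataclass

# Assumption for this sketch: which HTML tags count as "interactive".
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass
class Element:
    """Hypothetical, simplified stand-in for an element found on a page."""
    tag: str
    text: str
    box: tuple  # (x, y, width, height) in pixels


def label_interactive(elements):
    """Map numeric labels to interactive elements that have a non-zero area.

    Labels follow reading order: top to bottom, then left to right.
    """
    visible = [
        e for e in elements
        if e.tag in INTERACTIVE_TAGS and e.box[2] > 0 and e.box[3] > 0
    ]
    visible.sort(key=lambda e: (e.box[1], e.box[0]))
    return {label: e for label, e in enumerate(visible)}


def describe(labelled):
    """Render the labels as text that could accompany a screenshot."""
    return "\n".join(f"[{i}] <{e.tag}> {e.text}" for i, e in labelled.items())


page = [
    Element("div", "Banner", (0, 0, 800, 100)),        # not interactive
    Element("button", "Search", (300, 120, 80, 30)),
    Element("input", "", (100, 120, 200, 30)),
    Element("a", "Hidden link", (0, 0, 0, 0)),         # zero area, skipped
]
print(describe(label_interactive(page)))
```

In a real browser the borders and numbers would be drawn on the page before the screenshot is taken; the sketch covers only the bookkeeping that ties each number to an element.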

The agent's primary visual input is a screenshot of the current web page, from which it interprets visual elements such as images, buttons, and other graphical user interface (GUI) components. It must also locate specific elements on the page, such as input fields or buttons, which is crucial for completing tasks.

For textual input, WebVoyager applies natural language processing (NLP) to user queries and to the content of web pages. This lets the agent identify relevant information and decide what to do next, including which actions a task requires, such as filling out a form or clicking a button.

By combining textual and visual inputs, WebVoyager can better understand the context and requirements of a task. For example, it might use visual cues to identify where to click and textual analysis to determine what text to enter. Once tasks are identified and understood, the agent can automate the process of completing them, such as filling out forms, submitting data, or clicking on specific elements.

In experiments, WebVoyager completed over 55% of complex web tasks, with navigation problems the most common cause of failure. The agent uses Selenium to interact with real, dynamic websites, which lets it click links and buttons, enter text, scroll, go back, and jump to a search engine.
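
To make the action set concrete, here is a minimal sketch of how an agent's text output might be turned into structured actions before they are passed to a browser driver. The text format ("Click [3]", "Type [5]; some text", and so on), the Action class, and parse_action are hypothetical and chosen for this example; they are not claimed to be WebVoyager's actual format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """Hypothetical structured form of one browsing action."""
    kind: str                      # "click", "type", "scroll", "back", "search"
    label: Optional[int] = None    # numeric tag of the target element, if any
    content: Optional[str] = None  # text to type, or scroll direction


# Assumed text format for this sketch, matched case-insensitively.
_PATTERNS = [
    ("click", re.compile(r"^click \[(\d+)\]$", re.I)),
    ("type", re.compile(r"^type \[(\d+)\];\s*(.+)$", re.I)),
    ("scroll", re.compile(r"^scroll (up|down)$", re.I)),
    ("back", re.compile(r"^goback$", re.I)),
    ("search", re.compile(r"^search$", re.I)),
]


def parse_action(text):
    """Parse one line of agent output into an Action, or raise ValueError."""
    text = text.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(text)
        if not match:
            continue
        if kind == "click":
            return Action(kind, label=int(match.group(1)))
        if kind == "type":
            return Action(kind, label=int(match.group(1)), content=match.group(2))
        if kind == "scroll":
            return Action(kind, content=match.group(1).lower())
        return Action(kind)  # "back" and "search" carry no arguments
    raise ValueError(f"unrecognised action: {text!r}")


print(parse_action("Type [5]; wireless mouse"))
```

With Selenium, a dispatcher could then map each kind onto a driver call: a click onto the element's click() method, typing onto send_keys(), and going back onto the driver's back() method. Rejecting unrecognised output with an error, rather than guessing, keeps a malformed model response from triggering an unintended action on a live site.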

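An automatic evaluator built on the GPT-4V model reached 85.3% agreement with human judgements of task success. This suggests that the automatic evaluator is a reasonably reliable stand-in for human evaluation; it measures how well the evaluator matches human judges, not how accurate WebVoyager itself is.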
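
An agreement figure of this kind is simply the share of tasks on which the automatic evaluator and the human judge give the same verdict. The sketch below shows the arithmetic. The function name and the eight example verdicts are made up for illustration and are not data from the study.

```python
def agreement_rate(evaluator_verdicts, human_verdicts):
    """Fraction of tasks where the two success/failure verdicts match."""
    if len(evaluator_verdicts) != len(human_verdicts):
        raise ValueError("both lists must cover the same tasks")
    if not evaluator_verdicts:
        raise ValueError("at least one task is required")
    matches = sum(e == h for e, h in zip(evaluator_verdicts, human_verdicts))
    return matches / len(evaluator_verdicts)


# Invented verdicts for eight tasks (True = task judged successful).
auto = [True, True, False, True, False, False, True, True]
human = [True, True, False, True, True, False, True, True]

print(f"{agreement_rate(auto, human):.1%}")  # 7 of 8 match: 87.5%
```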

Such agents could look up answers online on their own, rather than responding only from limited built-in knowledge. Equipping AI systems with more human-like web browsing abilities could enable the next generation of capable, useful virtual assistants. The study is an important step toward AI agents that browse the web autonomously, much as humans do.

While this article focuses on the WebVoyager system, it is worth noting that prior work on web-capable AI agents has been limited, with many existing approaches only handling simplified simulated websites or small subsets of HTML. The creation of WebVoyager marks a significant leap forward in the field of AI web browsing.

Further research into AI navigation and web browsing is expected to bring more advances in the near future. As AI agents become more adept at navigating the complex, ever-changing internet, they are likely to become useful tools for a wide range of applications.

In areas such as medical diagnosis, agents like WebVoyager could in principle look up current information online instead of relying solely on what they learned in training, and might eventually help manage and predict medical conditions more efficiently. These uses remain speculative.
