Getting structured data
This tutorial explains how to create a tool that returns structured data. This is particularly useful when the page contains a lot of data, which is hard for an agent to process if it’s provided in a raw format.
Example
Let’s create a tool that retrieves emails from the Gmail page as structured data in the following form:
[
  {
    "date": "pon., 20 sty 2025, 12:05",
    "from": "Coworking",
    "subject": "Tests of electrical installation",
    "checkbox": "<input type=\"checkbox\" id=\"...\" />"
  },
  ...
]
For this, create a new task with the following parameters:
- Main
  - Name: “Google Mail: Get emails”
  - Description: “Get Google Mail emails list.”
  - Tool: get_page_data
  - Inherit parameters: tab, url (we want to be able to provide tab or url to extract the data from)
- Arguments:
  - Schema:
    {
      "type": "array",
      "selector": "tbody > tr",
      "schema": {
        "type": "object",
        "schema": {
          "from": "td:nth-child(5)",
          "subject": "td:nth-child(6)",
          "date": "td:nth-child(9) > span:attr('title')",
          "checkbox": "td:nth-child(2) [role=\"checkbox\"]:html()"
        }
      }
    }
  - Make sure to test the task using the Test button and check that the emails appear in the output. The result should look like:
    [
      {
        "date": "pon., 20 sty 2025, 12:05",
        "from": "Coworking",
        "subject": "Tests of electrical installation",
        "checkbox": "<input type=\"checkbox\" id=\"...\" />"
      },
      ...
    ]
- Options:
  - Url: https://mail.google.com
  - Tool name: get_google_mail_emails - expose the task as a tool for AIs and humans
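The schema contains nested double quotes (in the `[role="checkbox"]` selector), which are easy to mis-escape when typing JSON by hand. A minimal sketch, assuming you prefer to generate the text for the Schema field from a script; the variable names `email_schema` and `schema_json` are illustrative and not part of the tool:

```python
import json

# The schema from the Arguments section, written as a Python dict so that
# json.dumps takes care of escaping the inner double quotes.
email_schema = {
    "type": "array",
    "selector": "tbody > tr",
    "schema": {
        "type": "object",
        "schema": {
            "from": "td:nth-child(5)",
            "subject": "td:nth-child(6)",
            "date": "td:nth-child(9) > span:attr('title')",
            "checkbox": 'td:nth-child(2) [role="checkbox"]:html()',
        },
    },
}

# Serialize to JSON text that can be pasted into the Schema field.
schema_json = json.dumps(email_schema, indent=2)
print(schema_json)
```

Round-tripping the printed text through `json.loads` is a quick way to confirm it is valid JSON before pasting it into the task.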
Schema explanation:
- "type": "array" - the root element is a list of items
- "selector": "tbody > tr" - emails are located in the “tr” elements in the table body on the page
- "schema": ... - schema for each email
  - "type": "object" - each email is an object
  - "schema": ... - email schema
    - "from": "td:nth-child(5)" - sender information is contained in the 5th column
    - "subject": "td:nth-child(6)" - the subject is contained in the 6th column
    - "date": "td:nth-child(9) > span:attr('title')" - the date is in the 9th column, with the full date contained in the “title” attribute of the “span” tag that is a direct child of “td”
    - "checkbox": "td:nth-child(2) [role=\"checkbox\"]:html()" - the HTML code of the interactable element that can be used to select the email with the [interact_with_page](/tools/interact_with_page) tool
After the tool is created, test that the agent can access it using the prompt:
Call the tool `get_google_mail_emails` and show the first 5 emails that are returned by it.
▸ Calling tool: execute_agent_query
▸ Calling tool: get_google_mail_emails
▸ Calling tool: get_page_data
Example output:
Here are some of the recent emails:
1. From: Coworking Subject: Tests of electrical installation Date: pon., 20 sty 2025, 12:05
2. ...
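Because the tool returns a list of objects with fixed keys, the same summary can also be produced without an agent. A minimal sketch, assuming the result is available as a JSON string with the structure shown in the Example section; `format_emails` is a hypothetical helper and the sample record is the one from this tutorial:

```python
import json


def format_emails(raw_json: str, limit: int = 5) -> str:
    """Render the first `limit` emails as a numbered list."""
    emails = json.loads(raw_json)
    lines = []
    for index, email in enumerate(emails[:limit], start=1):
        lines.append(
            f"{index}. From: {email['from']} "
            f"Subject: {email['subject']} "
            f"Date: {email['date']}"
        )
    return "\n".join(lines)


# Sample data copied from the example result above (the id stays elided).
sample = json.dumps([
    {
        "date": "pon., 20 sty 2025, 12:05",
        "from": "Coworking",
        "subject": "Tests of electrical installation",
        "checkbox": '<input type="checkbox" id="..." />',
    }
])

print(format_emails(sample))
```

The "checkbox" value is deliberately left out of the summary; it is only needed when a follow-up step has to select the email on the page.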