Getting structured data

This tutorial explains how to create a tool that returns structured data. This is particularly useful when a page contains a lot of data that would be hard for an agent to process in raw form.

Example

Let’s create a tool that retrieves emails from the Gmail page as a list of objects with the following structure:

[
  {
    "date": "pon., 20 sty 2025, 12:05",
    "from": "Coworking",
    "subject": "Tests of electrical installation",
    "checkbox": "<input type=\"checkbox\" id=\"...\" />"
  },
  ...
]

To build this tool, create a new task with the following parameters:

  • Main

    • Name: “Google Mail: Get emails”
    • Description: “Get Google Mail emails list.”
    • Tool: get_page_data
    • Inherit parameters: tab, url (so that either a tab or a URL can be provided as the source of the data)
  • Arguments:

    • Schema:
    {
      "type": "array",
      "selector": "tbody > tr",
      "schema": {
        "type": "object",
        "schema": {
          "from": "td:nth-child(5)",
          "subject": "td:nth-child(6)",
          "date": "td:nth-child(9) > span:attr('title')",
          "checkbox": "td:nth-child(2) [role=\"checkbox\"]:html()"
        }
      }
    }
    • Use the Test button to run the task and check that the emails appear in the output. The result should look like:
    [
      {
        "date": "pon., 20 sty 2025, 12:05",
        "from": "Coworking",
        "subject": "Tests of electrical installation",
        "checkbox": "<input type=\"checkbox\" id=\"...\" />"
      },
      ...
    ]
  • Options:

Schema explanation:

  • "type": "array" - the root element is a list of items
  • "selector": "tbody > tr" - each email is a “tr” row in the table body on the page
  • "schema": ... (outer) - the schema applied to each matched row, that is, to each email
  • "type": "object" - each email is an object
  • "schema": ... (inner) - maps each field of the email object to a selector, evaluated relative to the row
  • "from": "td:nth-child(5)" - the sender is contained in the 5th column
  • "subject": "td:nth-child(6)" - the subject is contained in the 6th column
  • "date": "td:nth-child(9) > span:attr('title')" - the date is in the 9th column; the full date is stored in the “title” attribute of the “span” tag that is a direct child of the “td”
  • "checkbox": "td:nth-child(2) [role=\"checkbox\"]:html()" - the HTML of the interactive checkbox element in the 2nd column, which the [interact_with_page](/tools/interact_with_page) tool can use to select the email
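
The field strings above combine an ordinary CSS selector with an optional trailing modifier (`:attr('title')` or `:html()`). As a rough illustration of how such a string could be split into its two parts, here is a minimal Python sketch. It is not the parser used by `get_page_data`; the names `FieldSpec` and `parse_field` are hypothetical, and the default "text" mode for fields without a modifier is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Modifier forms taken from the schema above: :attr('name') and :html().
# Any other modifier the real tool may support is not covered here.
_SUFFIX = re.compile(r":(attr)\(\s*['\"]([^'\"]+)['\"]\s*\)$|:(html)\(\)$")


class FieldSpec(NamedTuple):
    """Hypothetical container for one parsed schema field."""
    selector: str                    # CSS selector with the modifier removed
    mode: str                        # "text", "attr", or "html"
    attribute: Optional[str] = None  # attribute name when mode == "attr"


def parse_field(spec: str) -> FieldSpec:
    """Split a schema field string into a CSS selector and an extraction mode."""
    match = _SUFFIX.search(spec)
    if match is None:
        # Assumption: without a modifier, the element's text is extracted.
        return FieldSpec(spec.strip(), "text")
    selector = spec[: match.start()].strip()
    if match.group(1) == "attr":
        return FieldSpec(selector, "attr", match.group(2))
    return FieldSpec(selector, "html")


# The inner (per-email) schema from this tutorial.
email_schema = {
    "from": "td:nth-child(5)",
    "subject": "td:nth-child(6)",
    "date": "td:nth-child(9) > span:attr('title')",
    "checkbox": 'td:nth-child(2) [role="checkbox"]:html()',
}

parsed = {name: parse_field(spec) for name, spec in email_schema.items()}
for name, field in parsed.items():
    print(name, field)
```

Separating the selector from the modifier makes it clear that `td:nth-child(9) > span` locates the element, while `:attr('title')` only decides which part of that element ends up in the output.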

After the tool is created, verify that the agent can access it by using the prompt:

Call the tool `get_google_mail_emails` and show the first 5 emails that are returned by it.
▸ Calling tool: execute_agent_query
▸ Calling tool: get_google_mail_emails
▸ Calling tool: get_page_data

Example output:

Here are some of the recent emails:
1. From: Coworking
Subject: Tests of electrical installation
Date: pon., 20 sty 2025, 12:05
2. ...
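
Because the tool returns a list of objects rather than raw page content, a summary like the one above is easy to produce. The following Python sketch imitates that step outside the agent, using the sample record from this tutorial; the function name `format_emails` is hypothetical and the sketch makes no claim about how the agent builds its answer internally.

```python
from typing import Dict, List


def format_emails(emails: List[Dict[str, str]], limit: int = 5) -> str:
    """Render the first `limit` structured email records as a numbered summary."""
    lines = ["Here are some of the recent emails:"]
    for index, email in enumerate(emails[:limit], start=1):
        lines.append(f"{index}. From: {email['from']}")
        lines.append(f"Subject: {email['subject']}")
        lines.append(f"Date: {email['date']}")
    return "\n".join(lines)


# Sample record copied from the expected tool output in this tutorial.
emails = [
    {
        "date": "pon., 20 sty 2025, 12:05",
        "from": "Coworking",
        "subject": "Tests of electrical installation",
        "checkbox": '<input type="checkbox" id="..." />',
    }
]

summary = format_emails(emails)
print(summary)
```

The `checkbox` field is deliberately left out of the summary: it exists so that a follow-up tool call can act on the email, not for display.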