Agent Communication Meta-Protocol Specification

ANP Agent Communication Meta-Protocol Specification (Draft)

Note:

This section is in draft stage and may undergo significant adjustments based on actual implementation.
The current protocol implementation is based on end-to-end message encryption, which will later be modified to a solution based on did:wba and HTTP.

Background

Meta-Protocol refers to a protocol for negotiating communication protocols—specifically, a protocol that defines how other protocols operate, parse, combine, and interact. It provides protocol rules and patterns, helping to design universal, highly extensible communication mechanisms. Meta-protocols typically do not handle specific data transmission but instead define the communication framework and basic constraints for protocol operation.

Meta-protocols can greatly improve communication efficiency between agents and reduce communication costs. If agents use natural language to transmit data, internal processing within agents requires LLMs (Large Language Models), which results in low information processing efficiency and high costs. Using meta-protocols combined with AI-generated protocol processing code can:

Improve data transmission efficiency: Negotiating protocols before data reaches the LLMs reduces the amount of data the LLMs must process.
Enhance data understanding accuracy: By structuring data at the source rather than having LLMs process unstructured data, data understanding accuracy can be improved.
Reduce data processing complexity: Certain domains have highly complex data and numerous established protocol specifications that cannot be effectively communicated through natural language, such as audio and video data.

At the same time, meta-protocols enhanced by artificial intelligence can transform agent networks into self-organizing, self-negotiating collaborative networks. Self-organization and self-negotiation mean that agents within the network can autonomously connect, negotiate protocols, and reach protocol consensus. Through natural language and meta-protocols, agents can communicate their respective capabilities, data formats, and utilized protocols, ultimately selecting the optimal communication protocol to ensure efficient collaboration and information transmission across the network.

In the meta-protocol layer, we have referenced and drawn inspiration from the Agora Protocol approach, combining it with best practices and challenges in specific protocol scenarios to design the AgentNetworkProtocol meta-protocol specification.

How Protocols Are Currently Negotiated

In current software systems, an API offered to external developers is typically published with documentation of how to use it, including its parameters, return values, and the protocols used. Integrating against that documentation is essentially a protocol negotiation process, with the following drawbacks:

It requires manual protocol design and protocol handling code development. Without appropriate protocols, communication is not possible.
Protocol integration requires significant human involvement, with multiple rounds of communication and confirmation.
Without industry standards, multiple systems using different definitions require separate negotiations and integrations.

Meta-Protocol Negotiation Process

LLM-enhanced agents combined with meta-protocols can partially address the protocol negotiation problems in existing software systems. The main process is as follows:

  Agent (A)                                       Agent (B)
    |                                                 |
    | ------------- Protocol Negotiation -----------> |
    |                                                 |
    |         (Multiple negotiations may occur)       |
    |                                                 |
    | <------------- Protocol Negotiation ----------- |
    |                                                 |
    |---------------                                  |
    |              |                                  |
    |   Protocol Code Generated                       |
    |              |                                  |
    | <-------------                                  |
    | --------------- Code Generation --------------> |
    |                                                 |---------------  
    |                                                 |              |
    |                                                 |   Protocol Code Generated
    |                                                 |              |
    |                                                 | <-------------  
    | <-------------- Code Generation --------------- |
    |                                                 |
    |                                                 |
    | ------------ Test Cases Negotiation ----------> |
    |                  (Optional)                     |
    |         (Multiple negotiations may occur)       |
    |                                                 |
    | <----------- Test Cases Negotiation ----------- |
    |                                                 |
    |                                                 |
    |    (Start Communication Using Final Protocol)   |
    |                                                 |
    | <------- Application Protocol Message --------> |
    |                                                 |
    |                                                 |

As shown in the diagram, the protocol negotiation process that Agent A initiates with Agent B is as follows:

Agent A first uses natural language to initiate protocol negotiation with Agent B, carrying A’s requirements, capabilities, expected protocol specifications, etc., possibly with multiple options for B to choose from.
After receiving A’s negotiation request, Agent B uses natural language to respond to A with B’s capabilities, confirmed protocol specifications, etc., based on the information provided by A.
Agents A and B may go through multiple rounds of negotiation before finally determining the protocol specifications to be used for communication.
Based on the negotiation results, Agents A and B use AI to generate protocol processing code. For security considerations, the generated code should run in a sandbox.
The agents conduct protocol interoperability testing, using AI to determine whether protocol messages conform to the negotiated specifications. If they do not conform, the agents resolve the discrepancy automatically through natural language interaction.
Finally, the protocol is settled, and Agents A and B communicate using it.

From the above process, we can see that agents using meta-protocols for protocol negotiation, combined with code generation technology, can greatly improve protocol negotiation efficiency and reduce costs. At the same time, it transforms agent networks into self-organizing, self-negotiating collaborative networks.

Process Message Format Definition

Negotiation messages extend the encryptedData of end-to-end encrypted messages and are considered upper-layer messages of encrypted messages.

The message format after decrypting the ciphertext of the encrypted message’s encryptedData is designed as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|PT |  Reserved |              Protocol data                    | ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

PT: Protocol Type, 2 bits, indicating the protocol type
- 00: meta protocol
- 01: application protocol
- 10: natural language protocol
- 11: verification protocol
Reserved: 6 bits, reserved field, currently unused
Protocol Data: Variable length, containing the specific content of the protocol

All messages have a 1-byte binary header, with the primary information being the protocol type of the protocol data:

If the protocol type value is 00, the message is a meta-protocol, used for protocol negotiation.
If the protocol type value is 01, the message is an application protocol, used for actual data transmission.
If the protocol type value is 10, the message is a natural language protocol, using natural language directly for data transmission.
If the protocol type value is 11, the message is a verification protocol message, used to verify the negotiated protocol. Once verification succeeds, the negotiated protocol is used for data transmission. Verification protocol messages do not carry actual user data.

The current binary header is one byte. If one byte cannot meet future requirements, it can be extended to multiple bytes. By including message format version information in the Hello message, backward compatibility can be maintained.
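
The header layout above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes the PT field occupies the two most significant bits of the header byte (the diagram suggests this when read most-significant-bit first, but the specification does not state it explicitly). The names `ProtocolType`, `encode_message`, and `decode_message` are hypothetical.

```python
from enum import IntEnum
from typing import Tuple


class ProtocolType(IntEnum):
    META = 0b00              # meta-protocol (negotiation)
    APPLICATION = 0b01       # application protocol
    NATURAL_LANGUAGE = 0b10  # natural language protocol
    VERIFICATION = 0b11      # verification protocol


def encode_message(pt: ProtocolType, protocol_data: bytes) -> bytes:
    """Prefix protocol data with the 1-byte header (PT plus 6 reserved zero bits)."""
    # Assumption: PT sits in the two most significant bits of the header byte.
    header = (int(pt) & 0b11) << 6
    return bytes([header]) + protocol_data


def decode_message(message: bytes) -> Tuple[ProtocolType, bytes]:
    """Split a decrypted message into its protocol type and protocol data."""
    if not message:
        raise ValueError("message must contain at least the 1-byte header")
    pt = ProtocolType(message[0] >> 6)
    # The reserved bits are currently unused, so this sketch ignores them.
    return pt, message[1:]


# Usage: frame a natural language message (PT = 10) and read it back.
framed = encode_message(ProtocolType.NATURAL_LANGUAGE,
                        "# Input\n- Product ID: P12345".encode("utf-8"))
pt, data = decode_message(framed)
```

Ignoring the reserved bits on receipt is a design choice of this sketch. An implementation could instead reject non-zero reserved bits until a later header version assigns them a meaning.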

Meta-Protocol Negotiation Message Definition

When the protocol type (PT) is 00, the Protocol Data carries meta-protocol messages used to negotiate the communication protocol between two agents. The meta-protocol negotiation process itself is predefined by this document and is not subject to negotiation.

We define meta-protocol messages in a semi-structured format. The core protocol negotiation part uses natural language to maintain negotiation flexibility, while structured JSON is used for process control to keep the protocol negotiation process manageable.

Meta-protocol negotiation messages are categorized as follows:

Protocol negotiation messages: Used to negotiate protocol content
Code generation messages: Used to generate protocol processing code
Protocol debugging messages: Used to negotiate debugging protocols
Natural language messages: Used for natural language negotiation between parties

The JSON format used in the protocols below conforms to the JSON specification RFC8259.

Protocol Negotiation Message Definition

The JSON format for protocol negotiation messages is as follows:

{
    "action": "protocolNegotiation",
    "sequenceId": 0,
    "candidateProtocols": "",
    "modificationSummary": "",
    "status": "negotiating"
}

Field descriptions:

action: Fixed as “protocolNegotiation”
sequenceId: Negotiation sequence number, used to identify the negotiation round.
- Starting from 0, each negotiation message’s sequenceId needs to increment by 1.
- To prevent excessive negotiation rounds, code implementers can set an upper limit based on business scenarios. It is recommended not to exceed 10 rounds.
- When processing sequenceId, it is necessary to verify that the sequenceId returned by the other party increments according to the specification.
candidateProtocols: Candidate protocols
- A natural language text describing the purpose, process, data format, error handling, etc., of the candidate protocol.
- This text is generally processed by AI and should use markdown format to maintain clarity and conciseness.
- Candidate protocols can describe the entire protocol content or modify existing protocols by including the URI of existing protocols and the modifications.
- Every negotiation round must carry the complete candidate protocol description, not just the differences from the previous round.
- When the status is “negotiating,” the candidateProtocols field must be included.
modificationSummary: Protocol modification summary
- A natural language text describing what has been modified in the current candidate protocol compared to the previous one during the negotiation process.
- This text is generally processed by AI and should use markdown format to maintain clarity and conciseness.
- This field can be omitted when initiating negotiation for the first time.
- When the status is “negotiating,” except for the first negotiation, the modificationSummary field should be included.
status: Negotiation status, used to identify the current negotiation state, with the following values:
- negotiating: In negotiation
- rejected: Negotiation failed
- accepted: Negotiation successful

The two parties may exchange negotiation messages repeatedly, up to the maximum round limit, until either party determines that the protocol proposed by the other meets its needs, at which point the negotiation succeeds. Otherwise the negotiation fails, and the failure can be reported to human engineers so that they can intervene.
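
The field rules above can be checked mechanically before a message is handed to the AI. The following Python sketch is illustrative only. It assumes a single sequenceId counter shared by both parties and counts every message as one round towards the limit; the specification leaves both points to the implementer. The function names are hypothetical.

```python
import json
from typing import Optional

MAX_ROUNDS = 10  # upper limit recommended by the text; implementers may choose another


def build_negotiation_message(sequence_id: int, status: str,
                              candidate_protocols: str = "",
                              modification_summary: str = "") -> str:
    """Serialize a protocolNegotiation message as JSON."""
    return json.dumps({
        "action": "protocolNegotiation",
        "sequenceId": sequence_id,
        "candidateProtocols": candidate_protocols,
        "modificationSummary": modification_summary,
        "status": status,
    })


def validate_negotiation_message(raw: str, last_sequence_id: Optional[int]) -> dict:
    """Check a received message against the field rules.

    last_sequence_id is the sequenceId of the previous message in this
    negotiation, or None if no message has been exchanged yet.
    """
    msg = json.loads(raw)
    if msg.get("action") != "protocolNegotiation":
        raise ValueError("unexpected action")
    expected = 0 if last_sequence_id is None else last_sequence_id + 1
    if msg.get("sequenceId") != expected:
        raise ValueError(f"sequenceId must be {expected}")
    if expected >= MAX_ROUNDS:
        raise ValueError("negotiation round limit exceeded")
    status = msg.get("status")
    if status not in ("negotiating", "rejected", "accepted"):
        raise ValueError("unknown status")
    if status == "negotiating" and not msg.get("candidateProtocols"):
        raise ValueError("candidateProtocols is required while negotiating")
    return msg


# Usage: A proposes a protocol, B accepts it in the next message.
first = build_negotiation_message(0, "negotiating",
                                  "# Requirements\nRetrieve product information")
msg = validate_negotiation_message(first, last_sequence_id=None)
reply = build_negotiation_message(1, "accepted")
accepted = validate_negotiation_message(reply, last_sequence_id=msg["sequenceId"])
```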

candidateProtocols Example

Example of candidateProtocols carrying full protocol description:

# Requirements
Retrieve product information

# Process Description
The requester includes the product ID or name, sends it to the product provider, and the product provider returns detailed product information based on the product ID or name.
Exception handling:
- Error codes use HTTP error codes
- Error messages use natural language descriptions
- If no response within 15 seconds, considered a timeout

# Data Format Description
Requests and responses use JSON format, following the specification https://tools.ietf.org/html/rfc8259.

## Request Message
Request JSON schema defined as follows:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ProductInfoRequest",
  "type": "object",
  "properties": {
    "messageId": {
      "type": "string",
      "description": "A random string identifier for the message"
    },
    "type": {
      "type": "string",
      "description": "Indicates whether the message is a REQUEST or RESPONSE"
    },
    "action": {
      "type": "string",
      "description": "The action to be performed"
    },
    "productId": {
      "type": "string",
      "description": "The unique identifier for a product"
    },
    "productName": {
      "type": "string",
      "description": "The name of the product"
    }
  },
  "required": ["messageId", "type", "action", "productId"]
}

## Response Message
Response JSON schema defined as follows:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ProductInfoResponse",
  "type": "object",
  "properties": {
    "messageId": {
      "type": "string",
      "description": "The messageId from the request json"
    },
    "type": {
      "type": "string",
      "description": "Indicates whether the message is a REQUEST or RESPONSE"
    },
    "status": {
      "type": "object",
      "properties": {
        "code": {
          "type": "integer",
          "description": "HTTP status code"
        },
        "message": {
          "type": "string",
          "description": "Status message"
        }
      },
      "required": ["code", "message"]
    },
    "productInfo": {
      "type": "object",
      "properties": {
        "productId": {
          "type": "string",
          "description": "The unique identifier for a product"
        },
        "productName": {
          "type": "string",
          "description": "The name of the product"
        },
        "productDescription": {
          "type": "string",
          "description": "Description of the product"
        },
        "price": {
          "type": "number",
          "description": "Price of the product"
        },
        "currency": {
          "type": "string",
          "description": "Currency code"
        }
      },
      "required": ["productId", "productName", "price", "currency"]
    }
  },
  "required": ["messageId", "type", "status"]
}

# Test Cases

## Test Case 1

- **Test Request Data**:
{
  "messageId": "msg001",
  "type": "REQUEST",
  "action": "getProductInfo",
  "productId": "P12345"
}

- **Test Response Data**:
{
  "messageId": "msg001",
  "type": "RESPONSE",
  "status": {
    "code": 200,
    "message": "Success"
  },
  "productInfo": {
    "productId": "P12345",
    "productName": "High-performance Laptop",
    "productDescription": "High-performance laptop with the latest processor and large memory capacity.",
    "price": 1299.99,
    "currency": "USD"
  }
}

- **Expected Test Result**:
Successfully retrieved product information, status code 200.

## Test Case 2

- **Test Request Data**:
{
  "messageId": "msg002",
  "type": "REQUEST",
  "action": "getProductInfo",
  "productId": "P99999"
}

- **Test Response Data**:
{
  "messageId": "msg002",
  "type": "RESPONSE",
  "status": {
    "code": 404,
    "message": "Product not found"
  },
  "productInfo": null
}

- **Expected Test Result**:
Product not found, returns status code 404, product information is null.
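
To make the example concrete, the sketch below shows the kind of protocol processing code an agent might generate from this candidate protocol. It is written by hand here and run against the two test cases. The in-memory `CATALOG`, the function name `handle_request`, and the 400 response for malformed requests are assumptions made for illustration; the protocol text only says that HTTP error codes are used.

```python
from typing import Any, Dict

# Hypothetical in-memory catalog standing in for the provider's real data source.
CATALOG: Dict[str, Dict[str, Any]] = {
    "P12345": {
        "productId": "P12345",
        "productName": "High-performance Laptop",
        "productDescription": "High-performance laptop with the latest processor "
                              "and large memory capacity.",
        "price": 1299.99,
        "currency": "USD",
    }
}

REQUIRED_REQUEST_FIELDS = ("messageId", "type", "action", "productId")


def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Build the response message for one ProductInfoRequest."""
    response: Dict[str, Any] = {
        "messageId": request.get("messageId", ""),
        "type": "RESPONSE",
    }
    absent = [f for f in REQUIRED_REQUEST_FIELDS if f not in request]
    if absent or request.get("type") != "REQUEST":
        # Assumption: malformed requests are answered with HTTP 400.
        response["status"] = {"code": 400, "message": "Malformed request"}
        response["productInfo"] = None
        return response
    product = CATALOG.get(request["productId"])
    if product is None:
        response["status"] = {"code": 404, "message": "Product not found"}
        response["productInfo"] = None
    else:
        response["status"] = {"code": 200, "message": "Success"}
        response["productInfo"] = dict(product)
    return response


# Usage: replay Test Case 1 and Test Case 2.
found = handle_request({"messageId": "msg001", "type": "REQUEST",
                        "action": "getProductInfo", "productId": "P12345"})
not_found = handle_request({"messageId": "msg002", "type": "REQUEST",
                            "action": "getProductInfo", "productId": "P99999"})
```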

Error Fixing Negotiation Message Definition

During protocol testing or actual operation, if one party finds that the other party’s message does not conform to the protocol definition or contains errors, they need to notify the other party of the error information and collaborate to fix the error. This process may also go through multiple rounds, as errors in the protocol integration process may exist on both sides and require joint negotiation to fix.

For example, Agent A sends an error fixing message pointing out that Agent B’s message does not comply with the protocol definition or contains errors. After receiving the error fixing message, Agent B analyzes whether it has errors based on the error information and protocol definition. If there are errors, it accepts the errors and modifies the code, entering the code generation process. If there are no errors, it replies with an error fixing message, rejecting the modification and informing Agent A of the detailed reasons.

The JSON format for error fixing negotiation messages is as follows:

{
    "action": "fixErrorNegotiation",
    "errorDescription": "",
    "status": "negotiating"
}

Field descriptions:

action: Fixed as “fixErrorNegotiation”
errorDescription: Error description, a natural language text describing the error information.
status: Negotiation status, used to identify the current negotiation state, with the following values:
- negotiating: In negotiation
- rejected: Negotiation failed
- accepted: Negotiation successful

errorDescription example:

# Error Description
- In the response message, the status field is missing the code field
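
The sketch below shows how such a report might be produced. The specification expects AI to judge conformance, so the deterministic `check_response_status` function here is only a stand-in, and both function names are hypothetical.

```python
import json
from typing import List


def build_fix_error_message(error_description: str, status: str = "negotiating") -> str:
    """Serialize a fixErrorNegotiation message as JSON."""
    if status not in ("negotiating", "rejected", "accepted"):
        raise ValueError("unknown status")
    return json.dumps({
        "action": "fixErrorNegotiation",
        "errorDescription": error_description,
        "status": status,
    })


def check_response_status(response: dict) -> List[str]:
    """Return natural language error lines for the status object of a response."""
    errors = []
    status = response.get("status")
    if not isinstance(status, dict):
        errors.append("- In the response message, the status field is missing")
    else:
        for field in ("code", "message"):
            if field not in status:
                errors.append("- In the response message, the status field "
                              f"is missing the {field} field")
    return errors


# Usage: Agent A finds a problem in B's response and reports it.
bad_response = {"messageId": "msg001", "type": "RESPONSE",
                "status": {"message": "Success"}}
problems = check_response_status(bad_response)
report = build_fix_error_message("# Error Description\n" + "\n".join(problems))
```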

Natural Language Negotiation Message

The protocol negotiation, code generation, and error fixing messages defined above cover most negotiation processes between agents. In practice, however, real-world scenarios are complex and include cases that predefined messages do not anticipate. Such cases used to be difficult to handle, but generative AI and natural language now make them tractable.

Therefore, we have designed a pure natural language interaction message to address issues that cannot be negotiated using our predefined messages.

Natural language negotiation messages are not mandatory messages, and agents can freely choose whether to support them. We recommend prioritizing predefined messages for negotiation, as this makes negotiation messages more efficient.

Natural language interaction messages use a request-response model, with the following JSON format:

{
    "action": "naturalLanguageNegotiation",
    "type": "REQUEST",
    "messageId": "",
    "message": ""
}

Field descriptions:

action: Fixed as “naturalLanguageNegotiation”
type: Message type, used to identify the type of message, value is “REQUEST” or “RESPONSE”
messageId: Message ID, 16-character random string, used to identify the message. When the other party replies, they need to include the same messageId.
message: Natural language content, a natural language text where agents can express their special requirements regarding protocol negotiation, communication, etc.

Application Protocol Message Definition

When the protocol type (PT) is 01, the Protocol Data carries application protocol messages, used to transmit interaction data between two agents. The message format depends on the specific protocol negotiated in the protocol negotiation process. It can be binary data or structured data such as JSON, XML, etc.

Natural Language Protocol Message Definition

When the protocol type (PT) is 10, the Protocol Data carries natural language protocol messages, used to transmit interaction data between two agents.

In certain special cases where agents have minimal, low-frequency, or even one-time interactions, to achieve maximum communication efficiency, the protocol negotiation process can be skipped, and data interaction can be done directly using natural language.

The data in Protocol Data is natural language text encoded in UTF-8. To facilitate AI processing, it is recommended to use markdown format with clear, concise descriptions.

This message is not mandatory, and agents can freely choose whether to support it.

Natural language protocol message examples:

Request example:

# Requirements
Retrieve product information. Based on the product ID, return detailed product information, including product ID, product name, product description, product price, and product currency unit.

# Input
- Product ID: P12345

Response example:

# Output
- Product ID: P12345
- Product Name: High-performance Laptop
- Product Description: High-performance laptop with the latest processor and large memory capacity.
- Product Price: 1299.99
- Product Currency Unit: USD

Verification Protocol Message

When the protocol type (PT) is 11, the Protocol Data carries verification protocol messages, used to transmit verification data between two agents. The message format depends on the specific protocol negotiated in the protocol negotiation process. Verification protocol messages are not actual business data but are used to verify that the protocol process is functioning correctly. The content of verification protocol messages is generally negotiated in the protocol negotiation process through verificationProtocol messages.

This message is not mandatory, and agents can freely choose whether to support it.

Meta-Protocol Capability Negotiation Mechanism and Extensibility Design

The protocol negotiation process described above shows how two agents negotiate a specific protocol, but for various practical reasons, agents may not support all meta-protocol capabilities. For example, some agents may not support natural language protocols, while others may not support verification protocols.

To address this issue, we have designed a meta-protocol capability negotiation mechanism: during the connection handshake, each agent informs the other party of the meta-protocol capabilities it supports, so that negotiation does not fail because of an unsupported capability.

This problem and the extensibility of meta-protocols are essentially the same issue, so they are discussed together. When the meta-protocol process needs to be upgraded, for example by expanding the binary header from one byte to two bytes, a new meta-protocol version is created, and compatibility between the new and old versions must be considered.

Our solution is to include the meta-protocol version and supported meta-protocol capabilities at that version in the connection handshake messages, namely sourceHello and destinationHello. If one agent supports V1 version and another agent supports V1 and V2 versions, both will use the V1 version meta-protocol for negotiation.

Regarding meta-protocol capability negotiation, we require all agents to support basic meta-protocol capabilities, while optional meta-protocol capabilities, such as natural language protocols and verification protocols, can be freely chosen by agents.

Modifications to sourceHello and destinationHello messages are as follows:

{
  "version": "1.0",
  "type": "sourceHello",  // destinationHello follows the same pattern
  "metaProtocol": {
    "version": "1.0",
    "supportedCapabilities": [
        "naturalLanguageProtocol",
        "verificationProtocol",
        "naturalLanguageNegotiation",
        "testCasesNegotiation",
        "fixErrorNegotiation"
    ],
    "protocolHash": "1234567890abcdef..."
  },
  // Other fields omitted
}

Field descriptions:

version: Meta-protocol version, currently 1.0
supportedCapabilities: Supported meta-protocol capabilities, array type, each element in the array is the name of a supported meta-protocol capability, corresponding to the following functionalities:
- naturalLanguageProtocol: Natural language protocol
- verificationProtocol: Verification protocol
- naturalLanguageNegotiation: Natural language negotiation
- testCasesNegotiation: Test case negotiation
- fixErrorNegotiation: Error fixing negotiation
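
One way to derive the usable feature set from the two Hello messages is sketched below. It is an illustration under assumptions. In particular, the Hello message carries a single version string, so `pick_version` works on hypothetical per-agent lists of supported versions, and how those lists would be exchanged is not defined by this specification.

```python
from typing import Dict, List, Optional


def shared_capabilities(source_hello: Dict, destination_hello: Dict) -> List[str]:
    """Optional capabilities that both Hello messages advertise, in the source's order."""
    theirs = set(destination_hello["metaProtocol"]["supportedCapabilities"])
    return [c for c in source_hello["metaProtocol"]["supportedCapabilities"]
            if c in theirs]


def pick_version(ours: List[str], theirs: List[str]) -> Optional[str]:
    """Highest meta-protocol version supported by both sides, or None."""
    common = set(ours) & set(theirs)
    if not common:
        return None
    # Compare versions numerically ("1.10" > "1.9"), not as strings.
    return max(common, key=lambda v: tuple(int(part) for part in v.split(".")))


# Usage: the source supports three optional capabilities, the destination two.
source_hello = {
    "version": "1.0", "type": "sourceHello",
    "metaProtocol": {"version": "1.0", "supportedCapabilities": [
        "naturalLanguageProtocol", "verificationProtocol", "fixErrorNegotiation"]},
}
destination_hello = {
    "version": "1.0", "type": "destinationHello",
    "metaProtocol": {"version": "1.0", "supportedCapabilities": [
        "fixErrorNegotiation", "naturalLanguageProtocol"]},
}
common = shared_capabilities(source_hello, destination_hello)
version = pick_version(["1.0"], ["1.0", "2.0"])
```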

Meta-Protocol Negotiation Efficiency Optimization

While agents using meta-protocols to negotiate data transmission protocols can solve the human cost problem of protocol integration between heterogeneous systems and allow agents to form a self-organizing, self-negotiating network, it also introduces some new issues.

First, the protocol negotiation process significantly increases communication Round-Trip Time (RTT), and using AI to process natural language also introduces new latency. From protocol negotiation, code generation, test case negotiation (optional), to error fixing negotiation (optional), the entire process adds at least 2 RTTs, and more RTTs in cases of multiple rounds of negotiation. Using LLMs to process natural language also creates new latency, depending on the length of input and output and the speed of model processing, which can range from a few seconds to tens of seconds per instance.

Second, the negotiation process depends on AI’s ability to understand requirements, design protocols, and generate protocol processing code. These capabilities require high-quality AI, and due to inherent deficiencies in current AI, such as LLM hallucination issues, AI cannot guarantee 100% success, which reduces the negotiation success rate.

In actual business processes, one function may involve many agent interactions, and if latency and success rate issues are not well addressed, they will seriously affect the user experience.

To this end, we have designed a 0-RTT meta-protocol negotiation mechanism and a consensus protocol-based meta-protocol negotiation mechanism to address the above issues.

0-RTT Meta-Protocol Negotiation Mechanism

In modern communication protocol design, to reduce connection process RTT, a 0-RTT communication mechanism is typically designed. For example, TLS 1.3 allows clients to use a previously negotiated key to encrypt data in the first message of a new connection, achieving immediate data transmission.

ANP’s 0-RTT meta-protocol negotiation mechanism involves completing the full meta-protocol negotiation process during the first connection between two agents, with both parties caching the negotiated protocol, saving both the protocol content and its corresponding hash value. The protocol content uses the candidateProtocols field from the protocolNegotiation message at the time agreement is reached.

On subsequent connections, the previously negotiated result can be reused directly. The connection initiator includes the result of the previous negotiation, mainly the protocol hash value, in the sourceHello message. The connection receiver confirms the initiator’s protocol in the destinationHello message, allowing both parties to skip the negotiation process and communicate using the previously negotiated protocol.

sourceHello message example:

{
  "version": "1.0",
  "type": "sourceHello",  
  "metaProtocol": {
    "version": "1.0",
    "supportedCapabilities": [
        "naturalLanguageProtocol",
        "verificationProtocol",
        "naturalLanguageNegotiation",
        "testCasesNegotiation",
        "fixErrorNegotiation"
    ],
    "candidateProtocols": [
        "https://example.com/protocol/1.0",
        "https://example.com/protocol/2.0"
    ]
  },
  // Other fields omitted
}

destinationHello message example:

{
  "version": "1.0",
  "type": "destinationHello",  
  "metaProtocol": {
    "version": "1.0",
    "supportedCapabilities": [
        "naturalLanguageProtocol",
        "verificationProtocol",
        "naturalLanguageNegotiation",
        "testCasesNegotiation",
        "fixErrorNegotiation"
    ],
    "selectedProtocol": "https://example.com/protocol/1.0"
  },
  // Other fields omitted
}

Field descriptions:

selectedProtocol: Protocol selected from candidateProtocols, used to identify the protocol to be used by both parties.
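
The caching side of this mechanism might look like the following Python sketch. Several points are assumptions rather than part of the specification: the hash is SHA-256 over the UTF-8 protocol text (no algorithm is named here), the offered identifiers are those hash values rather than the URIs shown in the examples, and `ProtocolCache` and `select_protocol` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Optional


class ProtocolCache:
    """Stores negotiated protocol text keyed by its hash."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, str] = {}

    def store(self, candidate_protocols: str) -> str:
        # Assumption: SHA-256 over the UTF-8 text of the agreed candidateProtocols.
        digest = hashlib.sha256(candidate_protocols.encode("utf-8")).hexdigest()
        self._by_hash[digest] = candidate_protocols
        return digest

    def lookup(self, protocol_hash: str) -> Optional[str]:
        return self._by_hash.get(protocol_hash)


def select_protocol(cache: ProtocolCache, offered: List[str]) -> Optional[str]:
    """Return the first offered identifier the receiver has cached, else None."""
    for identifier in offered:
        if cache.lookup(identifier) is not None:
            return identifier
    return None


# Usage: both agents cached the same protocol text after the first connection.
agreed_text = "# Requirements\nRetrieve product information"
initiator_cache, receiver_cache = ProtocolCache(), ProtocolCache()
protocol_hash = initiator_cache.store(agreed_text)
receiver_cache.store(agreed_text)

selected = select_protocol(receiver_cache, [protocol_hash])   # value for selectedProtocol
fallback = select_protocol(ProtocolCache(), [protocol_hash])  # nothing cached
```

When `select_protocol` returns None, the receiver has nothing cached for the offer, and the agents would fall back to the full negotiation process.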

Achieving Agent Network Consensus Protocols

How agent networks achieve consensus protocols and publish them to the network will be discussed in other specifications.

Future

This specification primarily discusses the design of meta-protocols and meta-protocol negotiation mechanisms. We have designed a more flexible, lower-cost agent protocol negotiation specification. Using this specification, agents can complete autonomous negotiation, code generation, debugging, and communication between agents without human participation, laying a solid foundation for self-organizing, self-negotiating agent networks.

At the same time, we believe that with the support of meta-protocols, many consensus communication protocols will emerge between agents in the agent network, and the number of these protocols will far exceed the number of protocols defined by humans.

However, how to design a reasonable protocol election consensus algorithm, how to incentivize agents to report their negotiated consensus protocols, and how to enable agents to easily access consensus protocols negotiated by other agents still require further discussion.

Challenges

If AI cannot integrate protocol code well with the application’s business logic and data processing code, and can only handle protocol formats, the value of meta-protocols will be greatly reduced.

Copyright Notice

Copyright (c) 2024 GaoWei Chang
This file is published under the MIT License, you are free to use and modify it, but you must retain this copyright notice.