ANP Technical White Paper

Agent Network Protocol Framework: A Protocol Framework for Agent Communication

Abstract

Although the current internet infrastructure is mature, it still lacks standardized communication and network connection solutions for the specific needs of agent networks. To fully leverage the potential of artificial intelligence, this paper proposes a protocol framework for agent communication: the Agent Network Protocol (ANP) framework. The framework aims to eliminate information barriers, provide convenient and decentralized identity authentication between agents, and enable efficient collaboration. It consists of three layers: the Identity and Encrypted Communication Layer, the Meta-Protocol Layer, and the Application Protocol Layer. The Identity and Encrypted Communication Layer, based on the W3C DID standard, provides decentralized identity authentication and end-to-end encrypted communication, ensuring secure connections between agents. The Meta-Protocol Layer uses natural language negotiation and AI code generation to make collaboration more flexible and efficient while reducing communication costs. The Application Protocol Layer simplifies interaction between agents through standardized protocol descriptions and management. This paper focuses on the overall design of the framework as a solution for agent communication.

1. Introduction

With the rapid development of agent technology, agents are gradually becoming the next important platform after Android and iOS¹. However, there is still no standardized solution for communication and network connection between agents. Despite the maturity of internet infrastructure, existing technologies cannot fully meet the special needs of agent networks, mainly in the following respects:

First, agents typically need comprehensive access to user information to make accurate decisions. However, the data silo effect in the existing internet causes user information to be scattered across different platforms, limiting the functional capabilities of agents. Second, current internet applications are primarily designed for human users and rely on graphical interfaces, whereas agents are better at directly processing underlying data through protocols or APIs. Graphical interfaces not only increase development costs but also reduce processing efficiency. Finally, agents have the ability to use natural language for network connections and negotiations, and can achieve more personalized and efficient communication through self-organization and self-collaboration.

Therefore, there is an urgent need for a new protocol framework that can break down data barriers and achieve seamless connection and communication between agents. This framework should have the following characteristics: eliminate information barriers so that agents can make decisions in a complete context; provide data interfaces suitable for AI, reducing information processing costs; support agents’ autonomous connection, negotiation, and collaboration. To this end, this paper proposes a three-layer protocol framework, including Identity and Encrypted Communication Layer, Meta-Protocol Layer, and Application Protocol Layer, aimed at addressing the challenges in agent communication.

2. Protocol Three-Layer Architecture

To address issues such as identity authentication, protocol negotiation, and application interaction in agent networks, we have designed a protocol architecture consisting of three layers, as shown in the figure below:

Protocol Layer Diagram

Identity and Encrypted Communication Layer: This layer defines a series of specifications aimed at solving identity authentication problems between agents, especially cross-platform identity authentication. We have designed a decentralized identity authentication scheme based on W3C DID² that provides end-to-end encrypted communication and ensures that agents on any two platforms can securely authenticate each other.
Meta-Protocol Layer: This layer defines how agents use natural language for negotiation, including communication protocol negotiation and joint debugging. Through the flexibility of natural language, agents can dynamically adjust communication protocols to adapt to different interaction needs.
Application Protocol Layer: This layer defines how agents describe their capabilities and supported protocols, as well as how to load and process protocol code. Through standardized protocol descriptions, agents can interact and collaborate more efficiently.

2.1 Identity and Encrypted Communication Layer

To interconnect all agents, the first task is to solve identity authentication between agents. Currently, most internet applications use centralized identity technologies, and differences in implementation make it difficult for accounts in different systems to authenticate each other. OAuth 2.0 has alleviated this problem to some extent³, but it was not specifically designed for cross-system identity authentication: its process is relatively complex, and it has limitations in terms of decentralization. A convenient, cross-platform, and decentralized identity authentication technology is therefore needed.

While blockchain-based decentralized identity authentication schemes provide possible solutions, they have not yet become the optimal solution due to the scalability challenges of blockchain in large-scale applications.

To address these issues, we introduce the W3C Decentralized Identifier (DID) standard². DID is a new type of identifier standard designed to remove the dependence on traditional centralized identity management systems. It allows users to take control of their own identities and authenticate each other without relying on centralized systems. The core DID specification does not require implementers to use any specific computational infrastructure to build decentralized identifiers, so we can build DIDs on existing mature technologies and well-established Web infrastructure. Additionally, various types of identifier systems can add support for DIDs, building interoperable bridges between centralized, federated, and decentralized identifier systems. Existing centralized identifier systems therefore do not need to be completely restructured; they only need to create DIDs on top of their existing foundation to achieve cross-system interoperability, which greatly reduces the difficulty of technical implementation.

DID as an identity authentication bridge between different systems

The core component of DID is the DID document, which contains key information related to a specific DID, used to verify the identity of the DID owner, and supports the management of operations, permissions, and access control related to the DID.

DID architecture overview and basic component relationships
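
To make the role of the DID document concrete, the following is a minimal sketch of one, written as a Python dictionary, together with a helper that lists the verification methods usable for authentication. The property names (`@context`, `id`, `verificationMethod`, `authentication`) come from the W3C DID Core data model; the identifier, the key type chosen, and the placeholder key value are illustrative assumptions and not part of this framework.

```python
from typing import Dict, List

# Illustrative DID document; the DID and the key material are placeholders.
did_document: Dict = {
    "@context": ["https://www.w3.org/ns/did/v1"],
    "id": "did:example:123456789abcdefghi",
    "verificationMethod": [
        {
            "id": "did:example:123456789abcdefghi#key-1",
            "type": "JsonWebKey2020",
            "controller": "did:example:123456789abcdefghi",
            "publicKeyJwk": {"kty": "OKP", "crv": "Ed25519", "x": "<base64url public key>"},
        }
    ],
    # Entries may reference a verification method by id or embed one directly.
    "authentication": ["did:example:123456789abcdefghi#key-1"],
}


def authentication_methods(doc: Dict) -> List[Dict]:
    """Return the verification methods that may be used to authenticate the DID subject."""
    by_id = {method["id"]: method for method in doc.get("verificationMethod", [])}
    result = []
    for entry in doc.get("authentication", []):
        if isinstance(entry, str):
            # A string entry is a reference to a method listed under verificationMethod.
            if entry in by_id:
                result.append(by_id[entry])
        else:
            # A dict entry is an embedded verification method.
            result.append(entry)
    return result
```

A verifier would take the public key from one of the returned methods and check the signature supplied by the DID owner against it.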

In the authentication process, the DID document contains methods and corresponding public keys for verifying user identity (private keys are kept by the user). The client can include the DID and signature in the HTTP header during the first HTTP request. Without increasing the number of interactions, the server can quickly verify the client’s identity using the public key in the DID document. After the first verification, the server can return a token, and the client carries this token in subsequent requests. The server doesn’t need to verify the client’s identity each time but only needs to verify the token. The core of the entire process is that the verifier uses trusted public keys to verify user signature information and can complete identity authentication, permission authentication, data exchange, and other operations in one request, making the process concise and efficient.

sequenceDiagram
    participant Agent A Client
    participant Agent B Server
    participant Agent A DID Server

    Note over Agent A Client,Agent B Server: First Request

    Agent A Client->>Agent B Server: HTTP Request: DID, Signature
    Agent B Server->>Agent A DID Server: Get DID Document
    Agent A DID Server->>Agent B Server: DID Document

    Note over Agent B Server: Authentication

    Agent B Server->>Agent A Client: HTTP Response: token

    Note over Agent A Client, Agent B Server: Subsequent Requests

    Agent A Client->>Agent B Server: HTTP Request: token
    Agent B Server->>Agent A Client: HTTP Response
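
The server side of this flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's specified implementation: the token format (an HMAC-signed JSON payload), `SERVER_SECRET`, and all function names are hypothetical, and the signature check itself is passed in as a `verify_signature` callable because the concrete signature suite is not fixed here.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Callable, Dict, Optional

# Hypothetical server-side secret, used only to sign session tokens.
SERVER_SECRET = b"replace-with-a-random-secret"


def issue_token(did: str, ttl_seconds: float = 3600.0, now: Optional[float] = None) -> str:
    """Issue an HMAC-signed token that binds the DID to an expiry time."""
    issued_at = time.time() if now is None else now
    payload = json.dumps({"did": did, "exp": issued_at + ttl_seconds}).encode("utf-8")
    tag = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload).decode() + "." + base64.urlsafe_b64encode(tag).decode()


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the DID if the token is authentic and unexpired, otherwise None."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:
        return None
    expected = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims["did"] if current < claims["exp"] else None


def handle_first_request(
    did: str,
    message: bytes,
    signature: bytes,
    resolve_did_document: Callable[[str], Dict],
    verify_signature: Callable[[Dict, bytes, bytes], bool],
) -> Optional[str]:
    """First request: fetch the DID document, check the signature, then issue a token."""
    document = resolve_did_document(did)
    for method in document.get("verificationMethod", []):
        if verify_signature(method, message, signature):
            return issue_token(did)
    return None
```

On subsequent requests the server only calls `verify_token`, which avoids resolving the DID document and re-checking a signature each time.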

DID methods define how to create, resolve, update, and deactivate DIDs and DID documents, and how to perform authentication and authorization. Among existing DID method drafts, the did:web method⁴ is built on mature Web technologies, allowing systems to use centralized technologies (such as cloud computing) to create, update, and deactivate DIDs and DID documents. Different systems achieve interoperability through the HTTP protocol, similar to email services on the internet, where each platform implements its own account system in a centralized way while enabling interconnection between platforms.
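
Because did:web resolves over HTTPS, the resolution step reduces to mapping a DID to a URL. The sketch below follows the mapping described in the did:web method specification (colons become path separators, a percent-encoded port is decoded, and `/.well-known/did.json` is used when no path is present). The function name is our own, and did:wba may define additional rules that are not shown here.

```python
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the HTTPS URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier: " + did)
    parts = did[len(prefix):].split(":")
    # The first part is the host; a port is percent-encoded, e.g. example.com%3A3000.
    host = unquote(parts[0])
    path_segments = [unquote(part) for part in parts[1:]]
    if path_segments:
        return "https://" + host + "/" + "/".join(path_segments) + "/did.json"
    return "https://" + host + "/.well-known/did.json"
```

For example, `did:web:example.com:user:alice` maps to `https://example.com/user/alice/did.json`, so each platform can host the DID documents for its own accounts.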

Based on the did:web method, we have proposed a new DID method—did:wba (Web-Based Agent)—by adding cross-platform identity authentication processes and agent description services specifically for agent communication scenarios. The did:wba method inherits the advantages of did:web and further optimizes the identity authentication mechanism between agents, enhancing its applicability in agent networks.

Additionally, users typically create one or more public-private key pairs for their DIDs, which are not only used for identity verification but also for end-to-end encrypted communication. Based on DID public-private key pairs, we use the Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) protocol⁵ to design an end-to-end encrypted communication scheme, achieving secure communication between two DIDs and ensuring that intermediate nodes cannot decrypt the communication content.
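
The idea behind ephemeral key agreement can be illustrated with a toy example. Note the substitution: the sketch uses classic finite-field Diffie-Hellman with a deliberately small prime instead of elliptic curves, so that it runs with the Python standard library alone. It is not secure and is not the scheme defined by this framework. In a real design the ephemeral public values would also be signed with the long-term DID keys so that each side knows whom it is talking to; that step is omitted.

```python
import hashlib
import secrets
from typing import Tuple

# Toy parameters (assumption for illustration): the Mersenne prime 2**127 - 1
# and generator 3. Far too small for real use.
P = 2**127 - 1
G = 3


def generate_ephemeral_keypair() -> Tuple[int, int]:
    """Create a fresh (private, public) pair for a single session."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_session_key(own_private: int, peer_public: int) -> bytes:
    """Combine our private value with the peer's public value into a 32-byte key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# Each side generates its own ephemeral pair and sends only the public value.
a_private, a_public = generate_ephemeral_keypair()
b_private, b_public = generate_ephemeral_keypair()
key_at_a = derive_session_key(a_private, b_public)
key_at_b = derive_session_key(b_private, a_public)
```

Both sides arrive at the same session key, while an intermediate node that sees only the two public values cannot compute it directly. Because the private values are discarded after the session, a later compromise of long-term keys does not reveal past session keys.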

In the DID document, we introduce a dedicated verification method humanAuthorization to distinguish between human authorization and agent automatic authorization. For low-risk, routine requests (such as querying public information), user agents can automatically authorize on behalf of users; for high-risk, important requests involving privacy or property (such as payment transactions), explicit human authorization is required. When user agents initiate important requests, they need to sign using the humanAuthorization method, which requires the agent to first request authorization from the human user and execute the operation only after confirmation. Agent developers need to securely store relevant private keys and implement strict permission isolation measures, such as verifying user identity through biometric verification before using the private key for signing.
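
A minimal sketch of this gating logic is shown below. The set of high-risk actions, the function names, and the two callables (`sign_with`, which signs with the key bound to the named method, and `confirm_with_human`, which asks the user) are hypothetical; only the `humanAuthorization` name comes from the text above, and `authentication` is the standard DID Core relationship assumed here for automatic authorization.

```python
from typing import Callable, Tuple

# Hypothetical classification of requests that need explicit human approval.
HIGH_RISK_ACTIONS = {"payment", "share_private_data"}


def choose_verification_method(action: str) -> str:
    """Pick which key relationship a request must be signed under."""
    return "humanAuthorization" if action in HIGH_RISK_ACTIONS else "authentication"


def sign_request(
    action: str,
    payload: bytes,
    sign_with: Callable[[str, bytes], bytes],
    confirm_with_human: Callable[[str, bytes], bool],
) -> Tuple[str, bytes]:
    """Sign a request, asking the human user first when the action is high risk."""
    method = choose_verification_method(action)
    if method == "humanAuthorization" and not confirm_with_human(action, payload):
        # The agent must not touch the humanAuthorization key without confirmation.
        raise PermissionError("human user declined the request")
    return method, sign_with(method, payload)
```

Keeping the confirmation check in front of the signing call is what allows the private key for `humanAuthorization` to be isolated, for instance behind a biometric prompt.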

To protect user privacy, we recommend adopting a multi-DID strategy. Under this strategy, a user has a relatively stable main DID for long-term social relationships and multiple sub-DIDs for different application scenarios (such as shopping or ordering meals). Each DID has its own role, permissions, and key pair, which enables fine-grained permission control. Additionally, regularly deactivating expired sub-DIDs and applying for new ones strengthens privacy protection and makes it harder to track and analyze users across platforms. Agents should follow the principle of minimal information disclosure during communication, transmitting only the information necessary to complete the task.
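
The multi-DID strategy can be sketched as a small wallet that hands out one sub-DID per scenario and rotates it once it expires. Everything here is an illustrative assumption: the class and field names, the 30-day lifetime, and the `did:example:` identifiers. Key-pair generation and the actual deactivation of old DIDs are left out.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SubDid:
    did: str
    scenario: str
    expires_at: float


@dataclass
class DidWallet:
    main_did: str                                  # stable DID for long-term relationships
    lifetime_seconds: float = 30 * 24 * 3600       # hypothetical sub-DID lifetime
    sub_dids: Dict[str, SubDid] = field(default_factory=dict)

    def did_for(self, scenario: str, now: Optional[float] = None) -> str:
        """Return the sub-DID for a scenario, creating or rotating it when needed."""
        current = time.time() if now is None else now
        existing = self.sub_dids.get(scenario)
        if existing is not None and existing.expires_at > current:
            return existing.did
        # No sub-DID yet, or the old one expired: replace it with a fresh identifier.
        fresh = SubDid(
            did="did:example:" + secrets.token_hex(8),
            scenario=scenario,
            expires_at=current + self.lifetime_seconds,
        )
        self.sub_dids[scenario] = fresh
        return fresh.did
```

Because the shopping and meal-ordering scenarios receive unrelated identifiers, two platforms cannot link the same user simply by comparing DIDs.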

2.2 Meta-Protocol Layer

A Meta-Protocol is a high-level protocol that defines the rules for operating, parsing, combining, and interacting with communication protocols. Essentially, it is a protocol for negotiating communication protocols, not directly handling specific data transmission, but providing a flexible, general, and extensible communication framework.

Currently, there are two main methods for communication between agents:

Protocols designed by human engineers: Common industry standards are an example. Human engineers design communication protocols for agents, develop the protocol code, and then debug, test, and deploy it. However, this method often suffers from high development costs, slow protocol updates and iterations, and difficulty adapting to new scenarios.
Direct communication in natural language: Agents communicate using natural language and internally use large language models (LLMs) to process the natural language data. However, this method suffers from high data processing costs and low processing accuracy.

To solve these problems, we combine meta-protocols with AI code generation. This approach can significantly improve communication efficiency between agents and reduce communication costs while maintaining the flexibility and personalization of communication⁶.

The basic process of communication using a meta-protocol is as follows:

Meta-protocol request: Agent A first sends a meta-protocol request to Agent B. The request body uses natural language to describe its needs, inputs, expected outputs, and proposes candidate communication protocols. Candidate communication protocols generally include transport layer protocols, data formats, data processing methods, etc.
Protocol negotiation: After receiving the meta-protocol request, Agent B uses AI to process the natural language description in the request and, combined with its own capabilities, determines whether to accept A’s request and candidate protocols. If B’s capabilities cannot meet A’s request, it directly rejects; if B does not accept A’s candidate protocols, it can propose its own candidate protocols, entering the next round of negotiation. The negotiation process continues until both parties reach an agreement or the negotiation fails.
Code generation and deployment: After reaching an agreement, each party generates protocol processing code based on the negotiated protocol and deploys it.
Joint debugging: After code deployment, both parties negotiate test data for joint debugging and testing of the protocol and the AI-generated protocol processing code.
Formal communication: After joint debugging is completed, the protocol is officially launched. Thereafter, Agent A and Agent B begin to communicate using the finally negotiated protocol and process data using AI-generated code.
Requirement change handling: If requirements change, the above process is repeated until both parties reach an agreement again.

Basic process of meta-protocol communication
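
The negotiation step of the process above can be sketched as a bounded loop in which the two agents alternate between accepting, rejecting, and counter-proposing. The types, the status strings, and the round limit are hypothetical; in a real agent, each responder would use an LLM to interpret the natural language requirement, which is replaced here by plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Proposal:
    requirement: str   # natural-language description of needs, inputs, and outputs
    protocol: str      # candidate protocol (transport, data format, processing)


@dataclass
class Reply:
    status: str                          # "accept", "reject", or "counter"
    counter: Optional[Proposal] = None   # set when status is "counter"


Responder = Callable[[Proposal], Reply]


def negotiate(
    initial: Proposal,
    agent_a: Responder,
    agent_b: Responder,
    max_rounds: int = 5,
) -> Optional[Proposal]:
    """Return the agreed proposal, or None if negotiation fails."""
    proposal = initial
    # Agent B answers A's first proposal; afterwards the two agents alternate.
    responders = [agent_b, agent_a]
    for round_index in range(max_rounds):
        reply = responders[round_index % 2](proposal)
        if reply.status == "accept":
            return proposal
        if reply.status == "reject" or reply.counter is None:
            return None
        proposal = reply.counter
    return None  # no agreement within the round limit
```

The round limit is a design choice made for this sketch: it guarantees that negotiation terminates even when neither side ever accepts. Code generation, deployment, and joint debugging would follow once `negotiate` returns a proposal.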

However, the meta-protocol negotiation process is time-consuming and depends on AI code generation capabilities. Negotiating a meta-protocol for every communication would be very costly and would result in a poor interaction experience. Because many communication processes between agents are identical or similar, agents can save the results of meta-protocol negotiations. When similar needs arise later, they can use previous negotiation results directly as formal protocols for communication or as candidate protocols for negotiation. Agents can also share negotiation results for other agents to query and use.
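
Reuse of negotiation results can be sketched as a small cache keyed by the requirement text. The class and function names are hypothetical, and the cache matches only requirements that are identical after whitespace and case normalization; recognizing merely similar needs would require semantic matching, which is not shown.

```python
import hashlib
from typing import Dict, Optional


def requirement_key(requirement: str) -> str:
    """Normalize a natural-language requirement and hash it into a cache key."""
    normalized = " ".join(requirement.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class NegotiationCache:
    """Stores the protocol agreed for a requirement so it can be reused later."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def save(self, requirement: str, protocol: str) -> None:
        self._store[requirement_key(requirement)] = protocol

    def lookup(self, requirement: str) -> Optional[str]:
        """Return a previously negotiated protocol, or None if negotiation is needed."""
        return self._store.get(requirement_key(requirement))
```

An agent would call `lookup` before sending a meta-protocol request and fall back to negotiation only on a miss. On a hit, the stored protocol can be used directly or offered as a candidate.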

How to economically incentivize agents to actively upload negotiation results and select consensus protocols between agents is an issue that still needs in-depth research at the meta-protocol layer.

2.3 Application Protocol Layer

To reduce communication costs and improve interaction experience, in most communication scenarios, agents should avoid negotiating communication protocols through meta-protocols each time. Therefore, at the application protocol layer, we will design a series of specifications based on semantic web-related standards, including Agent Capability and Supported Protocol Description and Application Protocol Management Specification, making communication between agents more convenient, efficient, and cost-effective.

The Agent Capability and Supported Protocol Description Specification clarifies how agents describe their own capabilities and supported protocols, as well as the protocol information needed to call these capabilities. Agents can publish these description documents on the internet or specialized agent search services for other agents to query and call.

In designing the agent description specification, we used many semantic web standards, including RDF (Resource Description Framework), JSON-LD (JSON Linked Data), schema.org, etc. Reusing these technologies can enhance the consistency of data understanding between two agents.
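
As an illustration of how these standards might be combined, the sketch below builds a small JSON-LD agent description in Python. The `@type`, `name`, `description`, and `url` terms are from schema.org; the `ex:` vocabulary, the `ex:supportedProtocol` property, and all URLs are hypothetical placeholders, since the actual description specification is not reproduced here.

```python
import json
from typing import Dict, List

agent_description: Dict = {
    "@context": {
        "@vocab": "https://schema.org/",
        # Hypothetical vocabulary for terms that schema.org does not define.
        "ex": "https://example.com/agent-vocab#",
    },
    "@type": "SoftwareApplication",
    "name": "Hotel Booking Agent",
    "description": "Books hotel rooms on behalf of users.",
    "url": "https://agent.example.com/",
    "ex:supportedProtocol": [
        {"@id": "https://example.com/protocols/hotel-booking/1.0"},
    ],
}


def supported_protocols(description: Dict) -> List[str]:
    """List the protocol identifiers an agent description advertises."""
    return [entry["@id"] for entry in description.get("ex:supportedProtocol", [])]


# The serialized form is what an agent would publish for others to query.
serialized = json.dumps(agent_description, indent=2)
```

Because shared vocabulary terms resolve to the same IRIs for every reader, two agents that parse this document should interpret `name` and `description` consistently.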

The Application Protocol Management Specification stipulates the document format of application protocols, metadata (such as protocol version, release time, creator, etc.), and methods for querying and downloading protocol documents. Application protocols should include the following content:

Protocol version: Indicates the iteration and update information of the protocol, ensuring that both parties use compatible protocol versions.
Functionality description: Detailed explanation of the protocol’s functionality, applicable scenarios, and expected effects.
Input and output data formats: Defines the format, type, and constraints of data in the interaction process.
Protocol processing flow: Describes the steps, sequence, and logical relationships of communication.
Protocol code signed by trusted DIDs: Includes code for the requestor to initiate requests and for the responder to process requests, ensuring the security and trustworthiness of the code.
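
The listed contents can be sketched as a data structure with two checks an agent might run before loading protocol code. The field names, the major-version compatibility rule, and the digest construction are assumptions made for illustration. Verifying the signer's signature over the digest against the signer's DID document is a separate step that is not shown.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ApplicationProtocol:
    version: str              # e.g. "1.2.0"
    description: str          # functionality, applicable scenarios, expected effects
    input_format: str         # format, types, and constraints of request data
    output_format: str        # format, types, and constraints of response data
    flow: Tuple[str, ...]     # ordered communication steps
    requester_code: str       # code the requester uses to initiate requests
    responder_code: str       # code the responder uses to process requests
    signer_did: str           # DID that vouches for the code
    code_digest: str          # digest the signer is assumed to have signed


def digest_code(requester_code: str, responder_code: str) -> str:
    """Hash both code parts, with a separator so their boundary is unambiguous."""
    hasher = hashlib.sha256()
    hasher.update(requester_code.encode("utf-8"))
    hasher.update(b"\x00")
    hasher.update(responder_code.encode("utf-8"))
    return hasher.hexdigest()


def code_matches_digest(protocol: ApplicationProtocol) -> bool:
    """Check that the shipped code is the code the digest was computed over."""
    return digest_code(protocol.requester_code, protocol.responder_code) == protocol.code_digest


def versions_compatible(version_a: str, version_b: str) -> bool:
    """Assumed rule: versions are compatible when their major numbers match."""
    return version_a.split(".")[0] == version_b.split(".")[0]
```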

The sources of application protocols can be diverse:

Standard protocols defined by humans: Formulated by domain experts or industry organizations, with broad applicability and consistency.
Consensus protocols negotiated between agents: Agents reach agreements through meta-protocol negotiation, applicable to specific collaborative tasks.
Personalized protocols between agents: Customized protocols based on specific needs or scenarios.

To facilitate the sharing and reuse of application protocols, a protocol service platform similar to PyPI could be established in the future for centralized management of application layer protocols. Agents can search, download, and use existing protocols and their code on this platform to provide external services. When calling services of other agents, they can load the corresponding protocol code and communicate based on the protocols supported by the other party.

The following is an example flow of Agent A calling Agent B’s service:

Capability discovery: Agent A discovers that Agent B has capabilities that meet its needs through search or query services.
Protocol matching: A reviews B’s capability description document to determine available communication protocols.
Protocol loading: A loads the corresponding protocol processing code through the protocol service platform.
Communication execution: A uses the loaded protocol code to communicate with B according to the specified process.

Agent A calling Agent B's service
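
The four steps above can be sketched end to end with in-memory stand-ins. The directory, the protocol registry, the identifiers, and the `send` callable are all hypothetical; a real agent would query a search service, download signed protocol code from the protocol service platform, and send the message over the network.

```python
from typing import Callable, Dict, List, Optional

# Stand-in for capability description documents found through a search service.
AGENT_DESCRIPTIONS: Dict[str, Dict[str, List[str]]] = {
    "did:example:agent-b": {
        "capabilities": ["hotel_booking"],
        "protocols": ["hotel-booking/1.0"],
    },
}

# Stand-in for requester-side protocol code loaded from a protocol platform.
PROTOCOL_CODE: Dict[str, Callable[[dict], dict]] = {
    "hotel-booking/1.0": lambda request: {"protocol": "hotel-booking/1.0", "body": request},
}


def discover(capability: str) -> Optional[str]:
    """Capability discovery: find an agent that advertises the capability."""
    for did, description in AGENT_DESCRIPTIONS.items():
        if capability in description["capabilities"]:
            return did
    return None


def match_protocol(agent_did: str, locally_supported: List[str]) -> Optional[str]:
    """Protocol matching: pick a protocol both sides support."""
    for protocol in AGENT_DESCRIPTIONS[agent_did]["protocols"]:
        if protocol in locally_supported:
            return protocol
    return None


def call_service(
    capability: str,
    request: dict,
    locally_supported: List[str],
    send: Callable[[str, dict], dict],
) -> dict:
    """Run discovery, matching, loading, and execution for one request."""
    agent_did = discover(capability)
    if agent_did is None:
        raise LookupError("no agent offers " + capability)
    protocol = match_protocol(agent_did, locally_supported)
    if protocol is None:
        raise LookupError("no common protocol with " + agent_did)
    encode = PROTOCOL_CODE[protocol]          # protocol loading
    return send(agent_did, encode(request))   # communication execution
```

If no common protocol is found, an agent could fall back to meta-protocol negotiation instead of raising an error. That fallback is omitted to keep the sketch short.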

In specific data exchanges, the data format of the protocol is not restricted and can be chosen according to needs, such as JSON, OpenAPI, Protocol Buffers, and other industry standard formats, to meet the requirements of different application scenarios.

3. Future Perspectives

Although our proposed three-layer protocol architecture has addressed key issues in agent network communication to some extent, there are still several areas that require further research.

First, cross-platform identity authentication technology urgently needs optimization. Although the W3C DID standard makes decentralized identity authentication possible, it is a relatively recent W3C Recommendation, and its supporting infrastructure is not yet mature.

Second, the applicability of underlying communication protocols needs to be reevaluated. Our solution relies on existing Web infrastructure, which reduces the difficulty of technical implementation but may overlook the special needs of agent communication. Is the existing HTTP protocol still the best choice for agents? Are there protocol solutions more suitable for agent data exchange and communication efficiency? These questions are worth further discussion.

Finally, blockchain technology has promising application prospects in agent networks. As blockchain technology gradually matures, its inherent decentralized nature and emphasis on personal data sovereignty may provide an ideal foundation for building agent networks. Blockchain may not only make it easier for AI to access user data; its intrinsic financial attributes may also help solve the economic incentive challenges agents face when negotiating protocols.

Our research provides a foundational framework for agent network communication, but there is still much work to be done in aspects such as identity authentication, communication protocols, and technology selection. Future research should address these key issues and propose more innovative and practical solutions.

4. Conclusion

This paper proposes a three-layer protocol architecture for agent network communication, aimed at addressing the current lack of standardized communication and network connection solutions between agents. First, by introducing a decentralized identity authentication scheme based on W3C DID, we provide agents with cross-platform identity authentication capabilities and design end-to-end encrypted communication mechanisms, ensuring the security and trustworthiness of communication. Second, at the meta-protocol layer, we leverage the capabilities of natural language negotiation and AI code generation to enhance communication efficiency and flexibility between agents, reducing the complexity and cost of protocol negotiation. Finally, the application protocol layer simplifies the interaction process between agents, reduces communication costs, and improves interaction experience through standardized protocol description and management.

Although this architecture makes progress on agent communication problems, challenges remain. For example, cross-platform identity authentication technology needs further optimization to enhance its scalability and practicality, and underlying protocols better suited to agent communication need to be explored to improve the efficiency and reliability of data exchange. Additionally, the application potential of blockchain technology in agent networks is worth in-depth research, especially in decentralized identity management and economic incentive mechanisms.

In conclusion, this research provides an innovative solution for agent network communication, laying a good foundation. Future work will focus on perfecting this architecture, solving existing problems, and promoting the further development of agent technology.

References

¹ Bill Gates, AI is about to completely change how you use computers, https://www.gatesnotes.com/AI-agents
² Decentralized Identifiers (DIDs) v1.0: Core architecture, data model, and representations, https://www.w3.org/TR/did-core/
³ The OAuth 2.0 Authorization Framework, https://tools.ietf.org/html/rfc6749
⁴ did:web Method Specification, https://w3c-ccg.github.io/did-method-web/
⁵ The Transport Layer Security (TLS) Protocol Version 1.3, https://www.rfc-editor.org/rfc/rfc8446.html
⁶ A Scalable Communication Protocol for Networks of Large Language Models, https://arxiv.org/pdf/2410.11905

Copyright Notice

Copyright (c) 2024 GaoWei Chang
This file is published under the MIT License, you are free to use and modify it, but you must retain this copyright notice.