August 17, 2023

Autonomous AI agents could change the world, but what do they actually do well?

Editor's note: 

Autonomous agents built using LLMs are the new hot entrant to the 2023 AI innovation scene. But beyond their novelty value, what exactly can autonomous agents do well and how can the industry leverage them?

These are the questions we dug into with Stanford PhD researcher Joon Sung Park during Unusual AI Studio, our new accelerator program for Bay Area AI builders. Park’s team at Stanford got us excited in April with their paper "Generative Agents: Interactive Simulacra of Human Behavior," stirring up questions about what’s next at the intersection of human-computer interaction, NLP, and autonomous AI. Park and his team of Google and Stanford researchers created a game-changing pixel art universe with 25 NPCs (non-player characters) whose actions were guided by ChatGPT and an agent architecture that stored, synthesized, and applied relevant memories to generate believable behavior. 

What happens when you put 25 AI agents together in an RPG town? This figure from Park's paper illustrates the ChatGPT-driven agents' conversations. The experiment showed how generative agents can communicate with one another, form opinions, pursue goals, and even build romantic relationships.

Not surprisingly, Park's experiment ignited simultaneous excitement and concern about what's possible as the technology evolves. Some argued that ChatGPT agents are better at simulated role-play than humans. And while generative agents can handle loosely defined, ambiguous tasks, they aren't adept at well-defined tasks where there's a clear state machine you want to control.

Needless to say, autonomous agents are just getting warmed up. With a keen interest in improving the accuracy of simulated behavior and scaling up multi-agent systems, Park's team continues to push boundaries in AI research, working toward integrating the technology with future iterations of advanced models like GPT-4.

As the field of AI continues to advance, the development of generative behavioral agents holds immense potential for various industries, particularly gaming and customer service. The ability to simulate human behavior opens doors to training AI systems, understanding social phenomena, and fostering interactive experiences. However, as we explore this transformative technology, it’s crucial to establish ethical norms, evaluate the impact on human roles, and ensure that AI systems complement and augment human capabilities rather than replacing them. Once this technology becomes available to the public, some believe the line between reality and simulation may become increasingly blurred.

How do generative agents work?


Generative agents are computational software agents that simulate believable human behavior. Their architecture consists of three main components: observation, planning, and reflection. As Park and his team wrote of the behavior observed in their experiment, “Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day.”

To enable generative agents, Park’s team described an architecture that extends a large language model to store a complete record of the agent’s experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. 
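To make that memory architecture concrete, here is a minimal Python sketch of the retrieval step: stored memories are scored by a blend of recency, importance, and relevance, and the top-scoring ones are surfaced when the agent plans its next action. The class names, weights, and word-overlap relevance measure are illustrative placeholders rather than the paper's implementation (which scores importance with the language model and measures relevance with embedding similarity).

```python
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    importance: float                    # e.g., 1-10, assigned when the memory is stored
    created_at: float = field(default_factory=time.time)

class MemoryStream:
    """Toy memory stream: store observations, then retrieve the ones most worth
    recalling using a weighted mix of recency, importance, and relevance."""

    def __init__(self, decay: float = 0.995):
        self.memories: list[Memory] = []
        self.decay = decay               # exponential recency decay per hour

    def add(self, text: str, importance: float) -> None:
        self.memories.append(Memory(text, importance))

    def retrieve(self, query: str, k: int = 5) -> list[Memory]:
        def score(m: Memory) -> float:
            hours_old = (time.time() - m.created_at) / 3600
            recency = self.decay ** hours_old
            # Crude stand-in for relevance: word overlap with the query.
            overlap = len(set(query.lower().split()) & set(m.text.lower().split()))
            relevance = overlap / max(len(query.split()), 1)
            return recency + m.importance / 10 + relevance
        return sorted(self.memories, key=score, reverse=True)[:k]
```

The retrieved memories would then be folded into the prompt for the agent's next planning or reflection step.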

What does it mean for an AI agent to be believable and human-like?


Being “human-like” is complex to define and difficult to evaluate. Broadly, it means exhibiting behaviors that resemble a human's, such as embellishing, exaggerating, and developing unique habits.

AI can simulate human behavior because language models trained on large datasets learn to mimic human responses and actions. Large language models (LLMs), such as OpenAI's GPT series, have changed how NPCs are developed. Previously, researchers relied on cognitive architectures that tried to model human cognition, but that approach lost traction because every behavior had to be manually authored. LLMs like GPT-3 provide a breakthrough by generating reasonable responses from a given prompt and context. They enable social interactions and complex behaviors between NPCs, providing more believability and adaptability, and game developers have started recognizing their potential for the NPC problem, since they can produce a reasonable response to almost any prompt.
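As a rough illustration of how an LLM can drive an NPC, the sketch below assembles a persona, a few retrieved memories, and the player's line into a single prompt. The `complete` argument stands in for whatever prompt-in, text-out model call a game would use; the function and prompt wording are hypothetical, not taken from Park's paper or any specific engine.

```python
from typing import Callable

def npc_reply(persona: str, memories: list[str], player_line: str,
              complete: Callable[[str], str]) -> str:
    """Compose an in-character reply for an LLM-driven NPC.
    `complete` is any callable that maps a prompt to generated text."""
    context = "\n".join(f"- {m}" for m in memories)
    prompt = (
        f"You are {persona}.\n"
        f"Relevant memories:\n{context}\n\n"
        f'The player says: "{player_line}"\n'
        "Reply in character, in one or two sentences."
    )
    return complete(prompt)
```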

What are the biggest challenges in scaling the use of AI in multi-agent environments?

One major challenge is the inference time of the models. While models like ChatGPT have gotten much faster, they still take time to process information, and with a large number of agents that adds up to significant latency in real-time interactions. In Park's case, with 25 agents, inference isn't yet fast or cheap enough to run the simulation in real time. Optimizations and fine-tuning will be necessary to make the models faster and more efficient.
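One common mitigation, sketched below, is to issue each agent's model call concurrently rather than sequentially, so a simulation tick costs roughly the latency of the slowest single call instead of the sum of all of them. The async `call_llm` wrapper is an assumption, not a real API; batching, caching, and smaller fine-tuned models are complementary options.

```python
import asyncio

async def step_agent(agent_id: int, observation: str, call_llm) -> str:
    """One simulation tick for one agent. `call_llm` is a placeholder for an
    async wrapper around whatever model endpoint is in use."""
    return await call_llm(f"Agent {agent_id} observes: {observation}. What does it do next?")

async def step_all(observations: dict[int, str], call_llm) -> dict[int, str]:
    """Advance every agent one tick, issuing the model calls concurrently so
    wall-clock latency is bounded by the slowest call, not the sum of all calls."""
    tasks = {aid: asyncio.create_task(step_agent(aid, obs, call_llm))
             for aid, obs in observations.items()}
    return {aid: await task for aid, task in tasks.items()}
```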

Balancing control and autonomy: Will autonomous agents ever be reliable? 


The biggest challenge preventing widespread adoption of agents is their reliability. Because the underlying technology is non-deterministic, agents don't reliably deliver on hard-edged, well-defined objective functions. This is in direct contrast to an earlier generation of AI. For instance, when DeepMind's AlphaGo beat a human professional at Go in 2015, it was trained to maximize a single goal: playing Go. There is a clear tradeoff between these ML approaches: LLM-based agents can perform a broad array of tasks but might not be reliably better than human beings at any of them.

This is less of a challenge for gaming applications. For example, in video games, non-player characters (NPCs) could provide enriched experiences for players while still broadly following the game narrative. However, in personal assistant applications, agents would need to be able to make reliable and somewhat deterministic decisions on behalf of users — which is not something we can do yet. 

 

Finding the right balance between autonomy and control depends on the specific application. Park notes that one of the main challenges with generative agents is that controlling them requires a very clear objective function; reinforcement learning agents, like those used in chess, operate in exactly that kind of controlled space with a clear objective. There is a spectrum of control for AI agents, ranging from complete control (a tool) to no control (an independent entity), and setting clear goals and objectives for agents is one way to help keep their behavior in check.

Tasks with soft edges, where there isn't a single well-defined optimal solution, are more suitable for AI agents. In games, for example, there may be multiple acceptable ways to behave, and users are forgiving of variation. Tasks with hard edges, where specific actions or results are expected, are more challenging. Ordering pizza, for instance, can be tricky if the agent doesn't have access to the right API or makes mistakes that leave customers unhappy. It may take time before agents handle hard-edged tasks successfully.
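One pragmatic way to put a floor under hard-edged tasks is to require the agent to emit a structured result and validate it before anything touches a real API, handing failures back for a retry instead of acting on them. The sketch below is a hypothetical guardrail for the pizza example; the field names and rules are invented for illustration.

```python
import json

ALLOWED_SIZES = {"small", "medium", "large"}   # illustrative schema, not a real API's

def validate_order(raw: str) -> dict:
    """Guardrail for a hard-edged task: refuse to act unless the agent's output
    parses into a well-formed order. Raises ValueError with a reason otherwise."""
    try:
        order = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if order.get("size") not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    if not isinstance(order.get("quantity"), int) or order["quantity"] <= 0:
        raise ValueError("quantity must be a positive integer")
    if not str(order.get("address", "")).strip():
        raise ValueError("address must be a non-empty string")
    return order  # safe to hand to the ordering API, or back to the agent on failure
```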

While there is an opportunity to improve AI agents' capabilities on hard-edged tasks, it's unlikely to happen overnight. It may take several years of development and refinement before agents can reliably handle tasks with clear expectations and defined results. Starting with softer-edged tasks and gradually expanding into harder-edged ones seems like the more practical approach in the near term.

What is the impact of fine-tuning on agent behavior?

Fine-tuning models can indeed help in constraining agent behavior. OpenAI has used fine-tuning to align their models, such as ChatGPT, with human preferences to ensure safe and useful interactions. However, fine-tuning can also limit the ability to produce certain human-like behaviors. Models like ChatGPT, which are heavily fine-tuned based on human preferences, may lose the ability to exhibit certain human behaviors, such as generating conflict or being less formal. This trade-off between believability and safety is a challenge that the AI community must navigate.
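To see how the mechanism steers that trade-off, consider two invented supervised fine-tuning records that pair the same user turn with different target replies: one pushes the model toward a polished, "safe" register, the other toward a more casually human one. The chat-style JSONL layout below is a common convention; the file name and examples are made up for illustration.

```python
import json

# Hypothetical fine-tuning records: same prompt, two different target behaviors.
examples = [
    {"messages": [
        {"role": "user", "content": "Hey, did you finish the report?"},
        {"role": "assistant", "content": "Yes, the report is complete. Please find it attached."}]},
    {"messages": [
        {"role": "user", "content": "Hey, did you finish the report?"},
        {"role": "assistant", "content": "Almost! Got sidetracked, give me another hour."}]},
]

with open("tuning_examples.jsonl", "w") as f:  # file name is illustrative
    for example in examples:
        f.write(json.dumps(example) + "\n")
```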

There is a potential future for open source models that prioritize believability. Models that are less fine-tuned and more open to community input could play a role in achieving believable human behavior. However, defining community standards and managing the associated risks will be crucial in pushing for believability in AI agents.
