The Random Walk Blog

2025-04-28

I Built an AI Agent From Scratch—Here’s What I Learned

I’ve worked with LangChain. I’ve played with LlamaIndex. They’re great—until they aren’t.

At some point, I realized I was using tools that made things feel more magical than understandable. So I asked myself: what if I built an AI agent from scratch? No framework glue. No black-box abstractions. Just code, logic, and raw capability.

Here’s what happened—and what I learned along the way.

Meet the Use Case: A Restaurant Recommendation Agent

I wanted a real-world use case. So I picked something familiar yet non-trivial: a restaurant recommendation + reservation agent.

It should:

  • Understand what users want ("table for 4 tomorrow at 7 PM")

  • Ask clarifying questions

  • Recommend places

  • Book reservations

That meant building an actual AI agent—not just a chatbot.

Getting Into Agent Mode

Agents aren't just text predictors—they act, plan, and adapt. Mine needed:

  • A brain: I used LLaMA 3.1 locally.

  • A memory: Something to track past turns.

  • Tools: To perform tasks like suggesting restaurants or making bookings.

  • A controller: The logic to decide what to do next.

  • And a UI: Even a simple terminal would do.

I wasn’t just writing code—I was designing cognition.

Building Our Restaurant Agent

Let's examine how each component is implemented in our example restaurant agent:

1. Setting Up the Model

# Using Llama 3.1-8B locally
# In main.py, we initialize the model

When building from scratch, you have the flexibility to choose any model and integration method:

  • Local models (Llama, Mistral, etc.)

  • API-based models (OpenAI, Anthropic, etc.)

  • Custom fine-tuned models

The key is setting up a consistent interface for sending prompts and receiving responses.
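For example, assuming the model is served locally through Ollama (an assumption on my part; the post only says Llama 3.1 runs locally), a thin wrapper around its HTTP API could provide that consistent interface. This is a minimal sketch, not the post's actual initialization code:

# Minimal sketch: a callable wrapper around a locally served model.
# Assumes Ollama is running at its default address with a llama3.1 model pulled.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def llm(prompt: str, model: str = "llama3.1") -> str:
    """Send a prompt to the local model and return the generated text."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]

Everything downstream (tools, controller, UI) only needs this one callable, which is what makes swapping models painless later.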

2. Implementing Context Management

Maintaining conversation history is crucial for coherent interactions:

# Simple context implementation
class ConversationContext:
    def __init__(self, max_turns=10):
        self.history = []
        self.max_turns = max_turns
    
    def add_exchange(self, user_input, agent_response):
        self.history.append({"user": user_input, "agent": agent_response})
        # Trim if exceeding max context
        if len(self.history) > self.max_turns:
            self.history = self.history[1:]
    
    def get_formatted_context(self):
        formatted = ""
        for exchange in self.history:
            formatted += f"User: {exchange['user']}\nAgent: {exchange['agent']}\n\n"
        return formatted

This simple context manager tracks conversation turns and formats them for inclusion in prompts.
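A quick usage example (expected output shown as comments):

context = ConversationContext(max_turns=10)
context.add_exchange("Any good Italian places nearby?",
                     "Sure! Do you have a price range in mind?")
print(context.get_formatted_context())
# User: Any good Italian places nearby?
# Agent: Sure! Do you have a price range in mind?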

3. Defining Tools

Tools are functions that extend the agent's capabilities beyond just conversation:

def find_intent(user_input, context):
    # Prompt the LLM to identify user intent
    prompt = f"""
    Based on the following user input and conversation context, 
    identify the primary intent (RECOMMEND_RESTAURANT, MAKE_RESERVATION, or GENERAL_INQUIRY):
    
    Context: {context}
    User input: {user_input}
    
    Intent:
    """
    return llm(prompt).strip()

def suggest_restaurant(preferences, context):
    # Tool to recommend restaurants based on user preferences
    prompt = f"""
    Based on these user preferences: {preferences}
    Suggest 3 restaurants with details about cuisine, price range, and seating options.
    """
    return llm(prompt).strip()

def make_reservation(user_input, context):
    # Multi-turn reservation tool
    # Extract information or ask for missing details
    required_info = ["name", "date", "time", "party_size"]
    # Logic to extract or request missing information
    # ...
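
The reservation tool above is left as a skeleton. One way to fill in the missing-details step (a sketch of my own, not necessarily how the original code handles it) is to have the LLM extract the required fields as JSON and ask the user for whatever is still missing:

import json

def make_reservation(user_input, context):
    # Sketch: extract reservation details with the LLM, then ask for gaps.
    required_info = ["name", "date", "time", "party_size"]
    prompt = f"""
    From the conversation context and the latest user input, extract reservation details.
    Return ONLY a JSON object with the keys: {required_info}.
    Use null for anything not mentioned.

    Context: {context}
    User input: {user_input}
    """
    try:
        details = json.loads(llm(prompt).strip())
    except json.JSONDecodeError:
        details = {}

    missing = [field for field in required_info if not details.get(field)]
    if missing:
        return f"To book your table, could you tell me your {', '.join(missing)}?"
    return (f"Booked a table for {details['party_size']} under {details['name']} "
            f"on {details['date']} at {details['time']}.")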
    

Unlike framework-based implementations, our custom tools:

  • Can be synchronous or asynchronous

  • Are tailored exactly to our application's needs

  • Can include complex logic like multi-turn interactions

4. Building the Controller

The controller is the orchestration layer that determines which tools to use based on user input:

class RestaurantAgent:
    def __init__(self, model):
        self.model = model
        self.context = ConversationContext()
    
    def process_input(self, user_input):
        # Get current context
        context = self.context.get_formatted_context()
        
        # Determine intent
        intent = find_intent(user_input, context)
        
        # Select appropriate tool based on intent
        if intent == "RECOMMEND_RESTAURANT":
            response = suggest_restaurant(user_input, context)
        elif intent == "MAKE_RESERVATION":
            response = make_reservation(user_input, context)
        else:
            # General conversation
            response = self.handle_general_inquiry(user_input, context)
        
        # Update context
        self.context.add_exchange(user_input, response)
        return response
    
    def handle_general_inquiry(self, user_input, context):
        prompt = f"""
        You are a helpful restaurant assistant.
        
        Conversation history:
        {context}
        
        User: {user_input}
        Assistant:
        """
        return self.model(prompt).strip()

The controller follows a simple pattern:

  • Analyze the user input to determine intent

  • Select the appropriate tool based on intent

  • Execute the tool and generate a response

  • Update conversation context

5. Creating a Simple UI

A simple terminal interface is enough to start interacting with the agent:

# Simple terminal-based UI example
def main():
    agent = RestaurantAgent(model)
    print("Restaurant Assistant: Hello! How can I help you find the perfect dining experience today?")
    
    while True:
        user_input = input("You: ")
        if user_input.lower() in ["exit", "quit", "bye"]:
            break
            
        response = agent.process_input(user_input)
        print(f"Restaurant Assistant: {response}")

if __name__ == "__main__":
    main()

For web-based applications, you might implement a simple Flask or FastAPI backend with a JavaScript frontend.
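
As an illustration, a minimal FastAPI wrapper (a sketch; the endpoint name and request shape are my own choices, not from the original project) could expose the same controller over HTTP:

# Hypothetical FastAPI wrapper around the agent (run with: uvicorn app:app)
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
agent = RestaurantAgent(model)  # assumes the model wrapper initialized earlier

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(request: ChatRequest):
    # Delegate to the same controller used by the terminal UI
    return {"response": agent.process_input(request.message)}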

Advantages of Building Agents from Scratch

The Hugging Face Agents Course highlights several benefits of building from scratch rather than relying on frameworks:

1. Complete Understanding: You grasp every component's role and interaction

2. Maximum Flexibility: Customize each component to your exact requirements

3. Lightweight Implementation: Avoid bloated dependencies and overhead

4. Easier Debugging: Trace issues through your own clean, minimalist code

5. Specific Optimizations: Tune performance for your particular use case

6. Custom Integrations: Seamlessly connect with your existing tech stack

7. Learning Opportunity: Gain deeper insights into agent architecture fundamentals

Challenges and Considerations

While building from scratch offers advantages, it comes with challenges:

1. Prompt Engineering: Crafting effective prompts requires experimentation

2. Error Handling: You need robust error handling for all components

3. Scalability: Consider how your agent will scale with increased usage

4. Testing: Build comprehensive tests for each component

Extending Your Agent

Once you have the basic architecture, you can enhance your agent with:

1. Structured Output Parsing: Add JSON or structured text parsing (see the sketch after this list)

2. Tool Chaining: Allow tools to call other tools in sequence

3. Long-Term Memory: Implement database storage for persistent memory

4. User Authentication: Add user-specific contexts and preferences

5. Multi-Modal Capabilities: Extend to handle images or other input types
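
For structured output parsing (item 1), a small helper that asks the model for JSON and fails gracefully is often enough. Here is a sketch that assumes the llm wrapper from earlier:

import json

def parse_structured(prompt, schema_hint):
    # Ask the model for JSON in a described shape; return None if parsing fails.
    raw = llm(f"{prompt}\n\nRespond ONLY with JSON shaped like: {schema_hint}").strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None  # caller can retry or fall back to plain-text handling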

Advanced Agent Patterns

As you evolve your agent architecture, consider these advanced patterns:

ReAct: Reasoning and Acting

The ReAct pattern combines reasoning and acting in an iterative loop:

1. Observe (receive input or tool results)

2. Think (reason about the current state)

3. Act (use a tool or generate a response)

4. Repeat until the task is complete

This pattern helps agents tackle complex problems by breaking them down into manageable steps and incorporating feedback at each stage.
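
A compressed version of that loop might look like the following sketch (tool selection and stopping criteria are deliberately simplified, and the prompt format is my own):

def react_loop(user_input, tools, max_steps=5):
    # Minimal ReAct-style loop: observe, think, act, repeat.
    observation = user_input
    for _ in range(max_steps):
        thought = llm(
            f"Observation: {observation}\n"
            "Think step by step, then reply with either\n"
            "'FINAL: <answer>' or 'ACT: <tool_name> | <tool_input>'."
        ).strip()
        if thought.startswith("FINAL:"):
            return thought[len("FINAL:"):].strip()
        if thought.startswith("ACT:"):
            tool_name, _, tool_input = thought[len("ACT:"):].partition("|")
            tool = tools.get(tool_name.strip())
            observation = tool(tool_input.strip()) if tool else "Unknown tool."
    return "I couldn't complete the task within the step limit."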

Self-Reflection and Refinement

Advanced agents can benefit from self-reflection capabilities:

def self_reflect(agent_response, user_query, context):
    prompt = f"""
    Review your proposed response:
    {agent_response}
    
    Based on the user query:
    {user_query}
    
    And conversation context:
    {context}
    
    Is this response complete, accurate, and helpful? 
    If not, provide an improved response:
    """
    return llm(prompt)

This allows agents to catch their own mistakes and improve responses before sending them to users.

Planning with Tool Composition

Complex tasks often require multiple tools used in sequence. A planning component can help:

import json

def create_plan(user_input, available_tools, context):
    prompt = f"""
    Based on the user request: {user_input}
    
    Create a step-by-step plan using these available tools:
    {available_tools}
    
    Return a JSON array of steps, each with:
    - tool_name: The tool to use
    - tool_input: The parameters to pass
    - description: Why this step is needed
    """
    plan = llm(prompt)
    return json.loads(plan)

This allows agents to decompose complex tasks into manageable sequences of tool calls.
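
To act on such a plan, a small executor can walk the steps in order. This sketch assumes each tool takes the same (input, context) signature used earlier:

def execute_plan(plan, tools, context):
    # Run each planned step in sequence, collecting the tool results.
    results = []
    for step in plan:
        tool = tools.get(step["tool_name"])
        if tool is None:
            results.append(f"Skipped unknown tool: {step['tool_name']}")
            continue
        results.append(tool(step["tool_input"], context))
    return results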

Component by Component

The Model

I wrapped the LLaMA model with a basic prompt-response interface. Nothing fancy. But I controlled everything—which model, how much context, how responses were generated.

The Context Manager

I built a class to handle turn-based memory. Ten exchanges max. Formatted nicely for the model. It felt like building a short-term memory module.

Tools

This was fun. I hand-coded:

  • A restaurant recommender

  • A reservation tool with multi-turn logic

  • An intent detector

Each tool took structured input and returned structured output. No YAML, no config files—just Python functions. And I could debug every line.

The Controller

This was the brainstem. It routed inputs to tools based on detected intent. If something failed? I caught the exception. If the user changed their mind? I had control over how that flowed. No middleware magic—just logic.

Testing It Live

I fired up a terminal and ran:

python main.py

And I had a conversation like this:

[Screenshot: a sample conversation with the restaurant agent]

It remembered context. It asked clarifying questions. It made decisions.

And all of it—every step—was mine.

What I Learned

Building an agent from scratch isn’t just doable—it’s enlightening. I understood:

  • How LLMs reason

  • Why memory matters

  • Where planning breaks down

  • How tool orchestration changes everything

It gave me full flexibility to swap models, fine-tune prompts, and debug behavior. And now when I use frameworks, I understand what's happening underneath.

What’s Next?

I plan to add:

  • Long-term memory with a vector DB

  • Multi-modal input (menus, maps)

  • More tools: payment, loyalty, dietary filters

  • A web UI with streaming responses

And eventually… maybe even a multi-agent setup with specialized sub-agents. One for reservations. One for menu questions. One for loyalty programs. Why not?

Final Thoughts

Frameworks are great when you want speed. But if you want mastery? Build from scratch—at least once.

You’ll learn more in a week of raw code than a month of plug-and-play.

If you’re curious, start with something small. A travel planner. A study buddy. Or a restaurant agent, like I did.

The AI future is agentic—and now, I know how to build it from the ground up.

