
June 22, 2025
Building AI Workflows with LangGraph and Flask
The AI world is moving fast, but building real-world apps still takes more than just calling an LLM. You need structure, memory, and control. That’s where LangGraph comes in. It lets you design intelligent workflows using nodes and edges, giving you more power than simple prompt chaining or sequential scripts.
We’ll show you how to build a Flask API powered by LangGraph and OpenAI’s GPT-4o. You’ll walk away with a working /chat endpoint that processes user messages, stores conversation history in memory, and returns smart responses, all while keeping the codebase modular and production-friendly.
Here’s what we’ll build:
- A two-node LangGraph workflow defined in langgraph_workflow.py
- An in-memory store (memory_store.py) keyed by user_id to track conversation history
- A Flask API (app.py) that exposes a /chat POST endpoint with JSON input/output
- Full error handling for malformed input and OpenAI API issues
- Clean project setup with requirements.txt and a documented README.md
Let’s dive in and start connecting your LLM logic to the web, the right way.
2. Why LangGraph + Flask?
This tutorial brings together two powerful tools: LangGraph for orchestrating complex LLM logic and Flask for serving it as a web API. Here’s why this combination makes perfect sense:
Quick Intro to LangGraph’s Node-Based Design
LangGraph extends the LangChain ecosystem, bringing statefulness and cycle handling to your LLM applications. Unlike simple sequential chains or prompt templates, LangGraph allows you to define workflows as directed graphs, where:
- Nodes are individual steps or components (e.g., calling an LLM, performing a database lookup, using a tool, processing user input).
- Edges define the flow between these nodes, dictating the order of execution.
- State Management is central. LangGraph manages a persistent state object that is passed between nodes, allowing information to be accumulated and modified throughout the workflow. This is crucial for multi-turn conversations, agentic behavior, and tool use.
- Conditional Edges & Cycles enable sophisticated logic, such as routing to different tools based on LLM output, or re-prompting the user for clarification—something difficult to achieve with linear chains.
For our simple chat application, LangGraph provides the foundation for adding memory and easily extending the flow later, perhaps by adding tools or more complex decision-making nodes.
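To make the idea of conditional edges concrete, here is a minimal sketch of a router that sends the flow either to a tool node or straight to the end, depending on what the LLM produced. The node names, state keys, and routing rule here are hypothetical and are not part of the chat app we build below:
# Hypothetical sketch of conditional routing with LangGraph's StateGraph
from typing import TypedDict
from langgraph.graph import StateGraph, END

class RouterState(TypedDict, total=False):
    llm_output: str
    tool_result: str

def call_llm(state: RouterState) -> dict:
    # Placeholder: imagine this calls the model and stores its raw output
    return {"llm_output": state.get("llm_output", "")}

def run_tool(state: RouterState) -> dict:
    # Placeholder: imagine this executes whatever tool the LLM asked for
    return {"tool_result": "tool output"}

def route(state: RouterState) -> str:
    # Decide where to go next based on the LLM output
    return "use_tool" if "TOOL:" in state["llm_output"] else "finish"

builder = StateGraph(RouterState)
builder.add_node("call_llm", call_llm)
builder.add_node("run_tool", run_tool)
builder.set_entry_point("call_llm")
builder.add_conditional_edges("call_llm", route, {"use_tool": "run_tool", "finish": END})
builder.add_edge("run_tool", END)
router_graph = builder.compile()
The chat workflow in this tutorial only needs a straight line between two nodes, but the same builder calls extend naturally to branching flows like this one.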
Why Flask Makes Sense for Exposing LangGraph Apps
Flask is a micro-framework for Python, known for its simplicity and flexibility. It’s an excellent choice for exposing your LangGraph workflows as a web API for several reasons:
- Lightweight and Minimalist: Flask doesn’t impose heavy conventions, letting you design your API exactly as needed. For a single-purpose endpoint like /chat, it’s incredibly efficient.
- Easy to Get Started: With just a few lines of code, you can define routes, handle requests, and return JSON responses. This makes rapid prototyping and development a breeze.
- Pythonic: Being a Python framework, Flask integrates seamlessly with other Python libraries, including LangGraph and OpenAI’s SDK, without any impedance mismatch.
- Scalability for Microservices: While simple, Flask applications can be scaled effectively when deployed behind a WSGI server (like Gunicorn) and a reverse proxy (like Nginx), making them suitable for microservice architectures where specific functionalities are exposed via dedicated APIs.
By combining LangGraph’s intelligent orchestration with Flask’s simple API exposure, we get a powerful, modular, and easy-to-understand foundation for our AI application.
3. Project Setup
Let’s set up our project structure. Create a new directory, e.g., langgraph_flask_chat, and place the following files inside.
requirements.txt
This file lists all the Python libraries our project depends on.
# requirements.txt
flask>=3.0.0,<4
openai>=1.6.0
langgraph>=0.0.10
python-dotenv>=1.0.0 # optional – for local .env loading
memory_store.py
This module provides a very basic, in-memory store for our conversation history. In a real-world application, you would replace this with a persistent database like Redis, PostgreSQL, or a NoSQL database. For this tutorial, it perfectly illustrates the concept of maintaining per-user state.
# memory_store.py
"""
Simple in-memory conversation history.
Keeps a list of messages (dicts with role & content) per user_id.
"""

# Global store: { user_id: [ { "role": "user"/"assistant", "content": str }, ... ] }
_history_store = {}


def get_memory(user_id: str):
    """
    Retrieve the conversation history for a user.
    Returns a list of message dicts; an empty list if none.
    """
    return _history_store.get(user_id, []).copy()


def update_memory(user_id: str, role: str, content: str):
    """
    Append a message to the user's history.
    role: "user" or "assistant"
    content: the message text
    """
    if user_id not in _history_store:
        _history_store[user_id] = []
    _history_store[user_id].append({"role": role, "content": content})
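As a pointer toward the "real-world" replacement mentioned above, here is a rough sketch of the same two functions backed by Redis. It is an untested illustration, not part of this tutorial's code, and it assumes the redis package plus a Redis server on localhost:
# redis_memory_store.py – hypothetical Redis-backed variant (not used in this tutorial)
import json
import redis

_redis = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_memory(user_id: str):
    """Return the stored history for user_id as a list of role/content dicts."""
    return [json.loads(item) for item in _redis.lrange(f"chat:{user_id}", 0, -1)]

def update_memory(user_id: str, role: str, content: str):
    """Append one message to the user's history list in Redis."""
    _redis.rpush(f"chat:{user_id}", json.dumps({"role": role, "content": content}))
Because the rest of the app only imports get_memory and update_memory, swapping the backing store later does not require touching the workflow or the Flask code.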
langgraph_workflow.py
This is where the core logic of our LangGraph workflow resides. We define two nodes: one for processing the prompt and calling the LLM, and another for updating the conversation memory.
# langgraph_workflow.py
"""
Defines a simple LangGraph workflow with two nodes:
1. prompt_processor: constructs the prompt and calls GPT-4o via the OpenAI SDK.
2. memory_updater: merges the new messages into the workflow state.
"""
from typing import TypedDict

import openai
from langgraph.graph import StateGraph, END


class ChatState(TypedDict, total=False):
    """State passed between nodes: history, the new message, and the reply."""
    memory: list
    message: str
    assistant_reply: str


def prompt_processor(state: ChatState) -> dict:
    """
    Node #1: Builds the chat prompt using past memory and the new user message.
    Calls OpenAI GPT-4o and returns the assistant reply.
    Inputs:
      - state["memory"]: list of previous messages (role/content dicts)
      - state["message"]: current user message (string)
    Outputs:
      - "assistant_reply": the generated text from GPT-4o
    """
    # Merge memory + new user message into a single list of messages
    chat_messages = state["memory"] + [
        {"role": "user", "content": state["message"]}
    ]
    # Call the OpenAI chat completions API for GPT-4o
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=chat_messages,
    )
    assistant_reply = response.choices[0].message.content
    # LangGraph merges this dict into the state, adding "assistant_reply"
    return {"assistant_reply": assistant_reply}


def memory_updater(state: ChatState) -> dict:
    """
    Node #2: Updates the in-graph history with the user message
    and the assistant's reply.
    Inputs:
      - state["memory"]: current message history
      - state["message"]: the user message
      - state["assistant_reply"]: the reply from GPT-4o
    Outputs:
      - "memory": updated history list (this overwrites the 'memory' key in the state)
    """
    updated = state["memory"].copy()
    updated.append({"role": "user", "content": state["message"]})
    updated.append({"role": "assistant", "content": state["assistant_reply"]})
    return {"memory": updated}


# Register the nodes in the graph
builder = StateGraph(ChatState)
builder.add_node("prompt_processor", prompt_processor)
builder.add_node("memory_updater", memory_updater)

# Define execution order: prompt_processor → memory_updater → END
builder.add_edge("prompt_processor", "memory_updater")
builder.add_edge("memory_updater", END)

# Set the entry point of the graph (where execution starts)
builder.set_entry_point("prompt_processor")

# Compile the graph into a runnable object
graph = builder.compile()
Note: set_entry_point() and compile() are crucial here. set_entry_point() tells LangGraph which node execution starts from, and compile() turns the graph definition into a runnable object that exposes invoke().
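Before wiring the graph into Flask, you can sanity-check it from a Python shell or a small scratch script like the one below (it assumes OPENAI_API_KEY is set in your environment):
# quick_check.py – run from the project directory to exercise the compiled graph
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

from langgraph_workflow import graph

final_state = graph.invoke({"memory": [], "message": "Hello!"})
print(final_state["assistant_reply"])  # the model's reply
print(final_state["memory"])           # history appended by memory_updater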
app.py
This is our Flask application, which serves as the entry point for API requests. It orchestrates the retrieval of history, execution of the LangGraph workflow, and persistence of the updated history.
# app.py
"""
Flask application exposing a /chat endpoint.
Integrates the LangGraph workflow and the in-memory history.
"""
import os

import openai
from dotenv import load_dotenv
from flask import Flask, request, jsonify

from memory_store import get_memory, update_memory
from langgraph_workflow import graph

# Load environment variables from a local .env file
load_dotenv()

# Initialize the OpenAI API key from the environment
openai.api_key = os.getenv("OPENAI_API_KEY", "")

app = Flask(__name__)


@app.route("/chat", methods=["POST"])
def chat():
    """
    POST /chat
    Expects JSON: { "user_id": str, "message": str }
    Returns JSON: { "reply": str } or { "error": str }
    """
    if not openai.api_key:
        return jsonify({"error": "OpenAI API key not set."}), 500

    try:
        # silent=True returns None for malformed/missing JSON instead of raising
        data = request.get_json(silent=True)
        if not data or "user_id" not in data or "message" not in data:
            return jsonify({
                "error": "Invalid payload: must include 'user_id' and 'message'."
            }), 400

        user_id = data["user_id"]
        user_message = data["message"]

        # Retrieve past conversation history
        memory = get_memory(user_id)

        # Run the LangGraph workflow: invoke() takes an initial state
        # and returns the final state after all nodes have run.
        result = graph.invoke({
            "memory": memory,
            "message": user_message
        })

        # Extract the assistant reply from the final state
        assistant_reply = result.get("assistant_reply")
        # result.get("memory") also holds the history updated inside the graph,
        # but we keep the external store in sync with explicit calls below.
        if not assistant_reply:
            return jsonify({"error": "LangGraph workflow did not return an assistant reply."}), 500

        # Persist the updated history in our in-memory store. The graph's 'memory'
        # key only lives inside the workflow state, so the external _history_store
        # needs to be explicitly updated.
        update_memory(user_id, "user", user_message)
        update_memory(user_id, "assistant", assistant_reply)

        return jsonify({"reply": assistant_reply}), 200

    except openai.APIError as oe:
        # Catch OpenAI-specific errors (e.g., rate limits, invalid keys)
        return jsonify({"error": f"OpenAI API error: {oe}"}), 502
    except Exception as e:
        # Generic server error
        return jsonify({"error": f"Server error: {e}"}), 500


if __name__ == "__main__":
    # Run Flask in debug mode on port 5000
    app.run(host="0.0.0.0", port=5000, debug=True)
Note: graph.invoke() runs the compiled graph once with the initial state and returns the final state. That final state also contains the updated "memory" key, but memory_store.py relies on explicit update_memory() calls, so we persist the two new messages manually to keep the external store in sync. load_dotenv() lets the API key live in a local .env file, and the openai.api_key check fails fast with a clear error if the key is missing.
4. Getting Started
Follow these steps to get your Flask API running:
- Copy the code: Create a directory (e.g., langgraph_flask_chat) and save app.py, memory_store.py, langgraph_workflow.py, and requirements.txt inside it.
- Set up your Python environment: It’s recommended to use a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set your OpenAI API key: Get your API key from the OpenAI Platform. Create a file named .env in the same directory as app.py and add your key:
OPENAI_API_KEY="your_openai_api_key_here"
Remember to replace "your_openai_api_key_here" with your actual key.
- Run the Flask application:
python app.py
You should see output indicating that Flask is running, typically on http://127.0.0.1:5000.
5. Testing the API
Now that your Flask server is running, let’s test the /chat endpoint using curl or a tool like Postman/Insomnia.
The endpoint expects a POST request with a JSON body containing user_id and message.
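If you prefer Python to curl, the same request can be sent with the requests library. This is a small sketch; requests is not in requirements.txt, so install it separately if you want to use it:
# chat_client.py – minimal helper for exercising the /chat endpoint
import requests

def send_chat(user_id: str, message: str) -> str:
    """POST a message to the local /chat endpoint and return the reply text."""
    resp = requests.post(
        "http://127.0.0.1:5000/chat",
        json={"user_id": user_id, "message": message},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["reply"]

if __name__ == "__main__":
    print(send_chat("test_user_001", "Hi, what is your name?"))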
First Interaction (New User):
Let’s start a conversation for a new user_id, say test_user_001.
curl -X POST \
-H "Content-Type: application/json" \
-d '{ "user_id": "test_user_001", "message": "Hi, what's your name?" }' \
http://127.0.0.1:5000/chat
You should receive a JSON response like:
{"reply": "I am an AI assistant, I don't have a name."}
Or a similar greeting from GPT-4o.
Second Interaction (Continuing Conversation):
Now, let’s ask a follow-up question. The user_id is key here: it tells our memory_store to retrieve the previous conversation history.
curl -X POST \
-H "Content-Type: application/json" \
-d '{ "user_id": "test_user_001", "message": "Can you tell me a joke?" }' \
http://127.0.0.1:5000/chat
The response might be:
{"reply": "Why don't scientists trust atoms?\nBecause they make up everything!"}
Notice how the LLM responded within the context of an ongoing conversation, even though the joke request was the only instruction in the second prompt: the earlier exchange was retrieved from the memory store and passed through the LangGraph workflow along with the new message.
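At this point the store holds both turns for test_user_001, so the second call to GPT-4o received roughly the following history ahead of the new message (the exact assistant wording will vary from run to run):
# What get_memory("test_user_001") returns before the second request
[
    {"role": "user", "content": "Hi, what is your name?"},
    {"role": "assistant", "content": "I am an AI assistant, I don't have a name."},
]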
Third Interaction (New User, Separate Conversation):
Let’s simulate a different user to confirm conversations are isolated:
curl -X POST \
-H "Content-Type: application/json" \
-d '{ "user_id": "another_user_002", "message": "What is the capital of France?" }' \
http://127.0.0.1:5000/chat
Response:
{"reply": "The capital of France is Paris."}
This conversation for another_user_002 is completely separate from test_user_001’s conversation, thanks to our memory_store and user_id keying.
6. Next Steps & Improvements
This tutorial provides a solid foundation, but real-world applications often require more. Here are some ideas for extending this project:
- Persistent Memory: Replace the simple _history_store in memory_store.py with a robust, persistent database solution. Options include:
- Redis: Excellent for caching and session management.
- PostgreSQL/MySQL: Relational databases for structured history.
- MongoDB/CosmosDB: NoSQL databases, flexible for chat history.
- LangChain offers ChatMessageHistory and various integrations for this.
- More Complex LangGraph Workflows:
- Tool Usage: Add nodes that call external APIs (e.g., weather API, calculator, search engine) based on LLM’s decision.
- Conditional Routing: Implement branching logic in LangGraph to direct the conversation flow based on user intent or LLM output.
- Human-in-the-Loop: Introduce nodes where a human agent can intervene if the AI needs help.
- Asynchronous Processing: For high-concurrency applications, consider making your Flask app asynchronous using Flask-Async or switching to an async framework like FastAPI. LangGraph itself can run asynchronously using graph.ainvoke().
- Authentication & Authorization: Protect your /chat endpoint with API keys, OAuth, or JWTs.
- Logging & Monitoring: Implement proper logging for requests, responses, and errors. Add metrics for performance monitoring.
- Deployment: Containerize your application using Docker and deploy it to cloud platforms like Heroku, AWS Elastic Beanstalk, Google Cloud Run, or Azure App Service.
- Input/Output Validation (Pydantic): Use libraries like Pydantic for more robust request payload validation and response serialization (a brief sketch follows after this list).
- Streaming Responses: Instead of waiting for the full response, stream tokens back to the client as they are generated by the LLM for a better user experience.
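For example, the manual key checks in chat() could be replaced by a small Pydantic model. This is a sketch only, assuming Pydantic v2 and a hypothetical ChatRequest name:
# Hypothetical Pydantic v2 model for validating the /chat payload
from pydantic import BaseModel, ValidationError

class ChatRequest(BaseModel):
    user_id: str
    message: str

# Inside the chat() view, the manual key checks could become:
#     try:
#         payload = ChatRequest.model_validate(request.get_json(silent=True) or {})
#     except ValidationError as exc:
#         return jsonify({"error": exc.errors()}), 400
#     user_id, user_message = payload.user_id, payload.message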
7. Conclusion
You’ve successfully built a Flask API that leverages LangGraph to create an intelligent, stateful chatbot endpoint. We’ve seen how LangGraph’s node based design provides a flexible structure for LLM workflows, while Flask offers a straightforward way to expose these workflows to the web.
This modular approach ensures that your LLM logic is decoupled from your API serving layer, making your application easier to develop, test, and scale. The principles learned here are directly applicable to building more sophisticated AI agents and interactive applications.
Happy coding, and go build something amazing!