
State graphs and transitions in Agentic AI - Model Pipeline Trace

Model Pipeline - State graphs and transitions

This pipeline shows how an agent uses a state graph to decide its actions: the agent moves between states along defined transitions, learning which paths lead to success.

Data Flow - 5 Stages
Stage 1: Initial States
  Input: 1 graph with 5 states
  Process: Define states and possible transitions
  Output: Graph with 5 nodes and edges
  States: S0, S1, S2, S3, S4; Transitions: S0->S1, S1->S2, S2->S3, S3->S4, S4->S0
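The graph above can be written down directly as an adjacency map. A minimal sketch (state names and transitions come from the trace; the dict representation is an assumption):

```python
# Cyclic 5-state graph from the trace, as a plain adjacency dict.
TRANSITIONS = {
    "S0": ["S1"],
    "S1": ["S2"],
    "S2": ["S3"],
    "S3": ["S4"],
    "S4": ["S0"],
}

def successors(state):
    """Return the states reachable from `state` in one transition."""
    return TRANSITIONS[state]
```

A richer graph would simply list more than one successor per state; the rest of the pipeline is unchanged.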
Stage 2: Agent Observes Current State
  Input: Current state S0
  Process: Agent reads current state from graph
  Output: State vector representing S0
  Agent sees it is in state S0
Stage 3: Transition Decision
  Input: State vector S0
  Process: Agent chooses next state based on learned policy
  Output: Next state S1
  Agent decides to move from S0 to S1
Stage 4: State Update
  Input: Current state S0, next state S1
  Process: Agent updates its state to S1
  Output: Agent now in state S1
  Agent moves to state S1
Stage 5: Reward Feedback
  Input: Transition S0->S1
  Process: Agent receives reward signal
  Output: Reward value (scalar)
  Agent gets reward 0.5 for moving to S1
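Stages 2 through 5 together form one agent step: observe, decide, transition, receive reward. A minimal sketch, assuming a uniform placeholder policy (the graph and the 0.5 reward for S0->S1 come from the trace; other rewards default to 0.0):

```python
import random

# One agent step over the cyclic 5-state graph from the trace.
TRANSITIONS = {"S0": ["S1"], "S1": ["S2"], "S2": ["S3"],
               "S3": ["S4"], "S4": ["S0"]}
REWARDS = {("S0", "S1"): 0.5}  # unlisted transitions pay 0.0 (assumed)

def step(state, rng=random):
    nxt = rng.choice(TRANSITIONS[state])     # Stage 3: transition decision
    reward = REWARDS.get((state, nxt), 0.0)  # Stage 5: reward feedback
    return nxt, reward                       # Stage 4: caller updates state

state = "S0"                                 # Stage 2: observe current state
state, r = step(state)                       # agent moves S0 -> S1, reward 0.5
```

In a trained agent, `rng.choice` would be replaced by sampling from the policy network's transition probabilities.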
Training Trace - Epoch by Epoch

Loss
1.0 |**********
0.8 |********
0.6 |******
0.4 |****
0.2 |**
0.0 |*
     1 2 3 4 5 Epochs
Epoch | Loss ↓ | Accuracy ↑ | Observation
------+--------+------------+----------------------------------------------------
  1   |  0.9   |   0.2      | Agent starts with random transitions, low accuracy
  2   |  0.7   |   0.4      | Agent learns better transitions, accuracy improves
  3   |  0.5   |   0.6      | Agent refines policy, loss decreases steadily
  4   |  0.3   |   0.8      | Agent approaches optimal transitions
  5   |  0.15  |   0.95     | Agent achieves high accuracy, low loss
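One common way to realize "learning which transitions lead to success" is tabular Q-learning over the same graph. A hedged sketch (the graph and the 0.5 reward come from the trace; the learning rate, discount factor, and step count are assumptions, and this is not necessarily the update rule the original pipeline uses):

```python
import random

# Tabular Q-learning over the cyclic 5-state graph from the trace.
TRANSITIONS = {"S0": ["S1"], "S1": ["S2"], "S2": ["S3"],
               "S3": ["S4"], "S4": ["S0"]}
REWARDS = {("S0", "S1"): 0.5}   # only transition with nonzero reward (assumed)
ALPHA, GAMMA = 0.1, 0.9         # learning rate and discount (assumed)

# One Q-value per (state, next_state) transition, initialized to zero.
Q = {(s, n): 0.0 for s, nxts in TRANSITIONS.items() for n in nxts}

state = "S0"
for _ in range(500):  # many small steps, roughly a few "epochs"
    nxt = random.choice(TRANSITIONS[state])
    r = REWARDS.get((state, nxt), 0.0)
    best_next = max(Q[(nxt, n)] for n in TRANSITIONS[nxt])
    # Standard temporal-difference update toward r + gamma * best_next.
    Q[(state, nxt)] += ALPHA * (r + GAMMA * best_next - Q[(state, nxt)])
    state = nxt
```

After training, the rewarded transition S0->S1 carries the highest value, which is exactly the "learning better transitions" behavior the epoch table describes.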
Prediction Trace - 5 Layers
Layer 1: Input State Encoding
Layer 2: Policy Network
Layer 3: Action Selection
Layer 4: State Transition
Layer 5: Reward Reception
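The five layers above can be sketched as a single forward pass. A minimal sketch in plain Python, where the "policy network" is a linear layer with random placeholder weights rather than trained ones (the layer names come from the trace; everything else is an assumption):

```python
import math
import random

STATES = ["S0", "S1", "S2", "S3", "S4"]

def one_hot(state):                # Layer 1: input state encoding
    return [1.0 if s == state else 0.0 for s in STATES]

random.seed(0)                     # placeholder weights, NOT a trained policy
W = [[random.uniform(-1, 1) for _ in STATES] for _ in STATES]

def policy(vec):                   # Layer 2: policy network (linear + softmax)
    logits = [sum(w * x for w, x in zip(row, vec)) for row in W]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select(probs):                 # Layer 3: action selection (greedy)
    return STATES[probs.index(max(probs))]

state = "S0"
probs = policy(one_hot(state))
nxt = select(probs)                # Layer 4: state transition
reward = 0.5 if (state, nxt) == ("S0", "S1") else 0.0  # Layer 5: reward reception
```

With trained weights, the softmax output concentrates probability on the transitions that historically earned reward.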
Model Quiz - 3 Questions
Test your understanding
Q1. What does the agent use to decide the next state?
  A. Random choice without any input
  B. Probabilities from the policy network
  C. Always the next state in order
  D. The state with the lowest reward
Key Insight
State graphs help agents learn which transitions lead to better outcomes. Encoding states as vectors and using a policy network allows the agent to predict and choose the best next state, improving over training by reducing loss and increasing accuracy.