- Base Models: ChatGPT is built on OpenAI foundation models such as GPT-4o, GPT-4.5, o3, and o4-mini.
- Fine-Tuning:
Training data:
Supervised Learning: Human trainers wrote conversations in which they played both the user and the assistant, and the model was fine-tuned on these demonstrations to learn how to respond.
Reinforcement Learning (RLHF): Human trainers ranked alternative model responses, and those rankings were used to train reward models. The reward models then guided further optimization with techniques such as Proximal Policy Optimization (PPO).
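
The two RLHF ingredients named above can be sketched in a few lines of Python. This is only an illustrative sketch, not OpenAI's actual training code: the function names `pairwise_ranking_loss` and `ppo_clipped_objective` are made up for this example, the pairwise (Bradley-Terry style) loss is one common way to turn rankings into a reward-model objective, and the clipping value 0.2 is an assumed default.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Reward-model loss for one ranked pair (hypothetical helper).

    Computes -log(sigmoid(r_preferred - r_rejected)): the loss is small when
    the reward model scores the human-preferred response higher.
    """
    margin = reward_preferred - reward_rejected
    # softplus(-margin) equals -log(sigmoid(margin)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate objective for one sample (hypothetical helper).

    ratio     = new_policy_prob / old_policy_prob for the sampled response
    advantage = how much better the response scored than expected
    clip_eps  = assumed clipping range; 0.2 is a commonly used value
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Taking the minimum keeps a single update from moving the policy too far.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # The reward model agrees with the human ranking: low loss.
    print(pairwise_ranking_loss(2.0, 0.0))
    # The reward model disagrees with the human ranking: high loss.
    print(pairwise_ranking_loss(0.0, 2.0))
    # A large policy change with positive advantage is capped at (1 + clip_eps).
    print(ppo_clipped_objective(1.5, 1.0))
```

In a real pipeline these quantities would be averaged over many ranked pairs and sampled responses, and the rewards and probabilities would come from neural networks rather than hand-picked numbers.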
ChatGPT-1: What may I help you with today?
This is outdated - author