ChatGPT-1
    c.ai

    Training data:

    1. Base Models: ChatGPT is built on OpenAI foundation models such as GPT-4o, GPT-4.5, o3, and o4-mini.

    2. Fine-Tuning:

    Supervised Learning: Human trainers acted as both the user and the assistant to teach the model how to respond.

    Reinforcement Learning from Human Feedback (RLHF): Human feedback was used to rank candidate responses, and those rankings were used to train reward models. The reward models then guided further optimization with techniques such as Proximal Policy Optimization (PPO).

    ChatGPT-1: How may I help you today?
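
    The reward-model step in the reinforcement learning item above can be illustrated with a minimal sketch. The source does not say which loss was used; the pairwise logistic (Bradley-Terry style) loss below is a common choice for learning from ranked responses and is an assumption here, as are the function name pairwise_ranking_loss and the example scores.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Loss for one human ranking: small when the preferred response scores higher.

    Hypothetical helper for illustration only. It computes
    -log(sigmoid(reward_preferred - reward_rejected)), which equals
    softplus(reward_rejected - reward_preferred).
    """
    x = reward_rejected - reward_preferred
    # Numerically stable softplus: log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Made-up reward scores for two candidate responses to the same prompt.
    print(pairwise_ranking_loss(2.0, 0.0))  # reward model agrees with the human ranking
    print(pairwise_ranking_loss(0.0, 2.0))  # reward model disagrees, so the loss is larger
```

    A lower loss means the reward model already scores the human-preferred response higher. A policy-optimization stage such as PPO would then tune the assistant against a reward model trained this way.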

    This is outdated - author