Building Synthetic Datasets with Reasoning Traces Using Gretel Navigator

Incorporating reasoning traces into synthetic datasets enhances AI transparency and trustworthiness.
Gretel Copyright 2025

Large Language Models (LLMs) often function as opaque “black boxes,” making it challenging to understand the rationale behind their responses or decisions. Recent advancements in chain-of-thought reasoning and reinforcement learning aim to address this opacity. Models like DeepSeek-R1 employ techniques such as Group Relative Policy Optimization (GRPO) to enhance both accuracy and interpretability. By rewarding adherence to format and accuracy, these techniques encourage the model to develop emergent human-like reasoning traces.  

This approach enables models to “think aloud,” detailing each step in their decision-making process alongside their final outputs. These verbalized reasoning steps can provide useful insights and starting points for evaluating, understanding, and refining AI outputs. Synthetic datasets with embedded reasoning traces play a crucial role in this development, especially when combined with advanced reasoning models and fine-tuning approaches like those used in DeepSeek-R1.

Why Creating Synthetic Datasets with Embedded Reasoning Matters

Synthetic data with embedded reasoning traces offers a powerful way to train and improve AI. Here’s how this innovative approach makes AI training more effective and accessible:

  • Enhanced Transparency and Trust: Each synthetic example includes a comprehensive, step-by-step explanation of how the answer was generated, making the process easier to understand. This clarity allows for quick detection and correction of errors, leading to more reliable AI systems.
  • Systematic Generalization with Human-Like Logic: Models can learn the reasoning behind each question-answer pair, enabling them to handle new challenges by understanding the “why” behind each response. One approach is “spoonfeeding” the model with synthetic reasoning traces, as demonstrated in the Phi-4 work: the traces provide step-by-step guidance that builds robust reasoning skills by letting the model work toward the final answer one step at a time.
  • Alignment with Advanced Reinforcement Learning Techniques: DeepSeek-R1’s GRPO rewards any rollout that produces the correct final answer, without directly optimizing for the quality of the chain-of-thought. In practice, however, the released model benefits mainly from a supervised fine-tuning warm start on clear, high-quality synthetic reasoning traces. That said, the potential exists to incorporate synthetic reasoning traces into the reward process—using them to signal whether the model has correctly picked up on key user cues—which could further steer models toward more interpretable decision-making. 
  • Cold-Start Training for Reinforcement Learning: Even a modest collection of high-quality problem-solution pairs can effectively fill the “empty blackboard” that models face at the start of training. Although the volume may be small compared to typical pre-training datasets, focusing on the quality of these examples is sufficient to jump-start learning without relying on costly human-provided examples.

Example Use Case: Incorporating Empathetic Reasoning

AI isn’t just about solving math problems or processing data. It increasingly interacts with people on personal or emotional topics, such as customer service, mental health support, or everyday social interactions, where responses need to be both accurate and sensitive. Synthetic datasets enriched with empathetic reasoning provide exactly that: scenarios that prompt AI to respond with both accuracy and sensitivity.

Let’s take the following example:

User Statement: “I just lost my credit card and I feel completely helpless. I don’t know what to do next.”

Without Empathic Reasoning:

  • AI Answer: “Your card is lost. I will block it and arrange for a new one.”

With Empathic Reasoning:

  • Reasoning Trace: “The user has expressed a strong emotional response by stating they feel completely helpless after losing their card. The use of ‘helpless’ indicates significant distress and vulnerability. While the technical solution is simple—blocking the card and issuing a new one—the emotional language suggests that reassurance is equally critical. An empathetic response addresses both the technical need and the user’s emotional state, building trust and ensuring a more positive experience.”
  • AI Answer: “I understand this situation is very distressing. I’m here to help. Let’s block your card immediately and get a replacement issued right away.”

This detailed reasoning trace not only reveals how the AI reaches its decisions but also ensures that its responses are both logically robust and emotionally sensitive.
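
For concreteness, here is a hypothetical sketch of how a single record like this might look once generated. The field names echo the ReasoningTrace model defined later in the walkthrough, but the exact column layout of your dataset will depend on the columns you configure, so treat this purely as an illustration.

# Hypothetical example record (illustration only; actual schema depends on your
# Data Designer column configuration).
example_record = {
    "scenario": (
        "I just lost my credit card and I feel completely helpless. "
        "I don't know what to do next."
    ),
    "reasoning_trace": {
        "reasoning": [
            {"step_number": 1, "content": "The word 'helpless' signals emotional distress as well as a practical problem."},
            {"step_number": 2, "content": "Acknowledge the feeling first, then lay out the concrete next steps."},
        ],
        "answer": (
            "I understand this is very distressing. Let's block your card "
            "immediately and get a replacement issued right away."
        ),
    },
}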

How Gretel Navigator Can Help

Tools like Gretel Navigator make it easier than ever to generate high-quality synthetic datasets with embedded reasoning traces. By leveraging Gretel Navigator, users can:

  • Create large, diverse datasets that include both problem statements and detailed reasoning.
  • Generate data at scale, enabling rapid prototyping and testing.
  • Contribute to the development of more transparent AI reasoning models.

Below is a step-by-step walkthrough for generating a synthetic dataset featuring everyday scenarios enriched with detailed reasoning traces. You can also follow along in this notebook.

1. Installation and Setup

First, install the Gretel Client SDK.

%pip install -Uqq gretel_client
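
With the client installed, import the pieces used throughout this walkthrough. The Pydantic and typing imports are standard; the DataDesigner import path shown here is an assumption based on the Gretel Client SDK’s Navigator module, so check the Gretel docs if your installed version exposes it differently.

from typing import List

from pydantic import BaseModel, Field

# NOTE: assumed import path for the Data Designer interface; confirm the exact
# module for your installed gretel_client version in the Gretel docs.
from gretel_client.navigator import DataDesigner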

2. Configuring Special System Instructions

Define system-wide instructions to guide Gretel Navigator’s models in producing reasoning traces with an empathetic tone:

special_system_instructions = """
You are an empathic reasoning agent. Your task is to generate realistic and compassionate reasoning traces for common day-to-day situations. Adopt a caring and supportive tone as you provide detailed insights into human experiences and emotions.
- Focus on everyday scenarios where empathy, understanding, and emotional intelligence are key.
- Consider various perspectives, emphasizing the emotional impact of actions and decisions.
- Ensure your reasoning process is clear, structured, and heartfelt, reflecting deep care for the individuals involved.
- Enclose your thoughtful reasoning process within <think>...</think> tags before providing the final JSON output.
"""

3. Initializing the Data Designer

Create an instance of the Data Designer using your Gretel API key, chosen model suite, and the special system instructions defined above. To learn more about model suites, check out our docs.

data_designer = DataDesigner(
    api_key="prompt",
    model_suite="llama-3.x",
    special_system_instructions=special_system_instructions
)

4. Establishing Context and Diversity with Categorical Seed Columns

Define categorical seed columns to set both the context and diversity for your synthetic reasoning traces. For example, create a “domain” column with various subcategories to specify the type of scenario, and a “complexity” column to indicate the level of reasoning required. Together, these seeds ensure that the generated data fits specific contexts while covering a wide range of scenarios and difficulty levels.

data_designer.add_categorical_seed_column(
    name="domain",
    description="The domain of the empathic reasoning scenario, reflecting everyday human experiences and emotional challenges.",
    values=[
        "Family Dynamics",
        "Workplace Challenges",
        "Friendship Moments",
        "Community Interactions",
        "Personal Well-being",
        "Unexpected Encounters"
    ],
    subcategories=[
        {
            "name": "theme",
            "values": {
                "Family Dynamics": [
                    "Parenting Dilemmas",
                    "Sibling Rivalries"
                ],
                "Workplace Challenges": [
                    "Communication Breakdowns",
                    "Leadership Dilemmas"
                ],
                "Friendship Moments": [
                    "Support & Understanding",
                    "Misunderstandings & Reconciliations"
                ],
                # ... (remaining domain themes omitted for brevity)
            },
            "num_new_values_to_generate": 2
        }
    ],
    num_new_values_to_generate=5
)

data_designer.add_categorical_seed_column(
    name="complexity",
    description="The complexity level of the empathic reasoning scenario",
    values=["Basic", "Intermediate", "Advanced"],
)

5. Defining Generated Data Columns

First, we generate scenario statements that include a clear problem or challenge along with a follow-up question. Then, we define prompts for generating the reasoning trace and evaluating it.

Scenario Generation

We begin by adding a generated data column named “scenario.” This column will contain everyday situations where empathy and understanding are crucial. The generation prompt guides the model to focus on scenarios that highlight emotional challenges or opportunities for compassionate support.

data_designer.add_generated_data_column(
    name="scenario",
    generation_prompt=(
        "Generate a clear and concise everyday scenario for the {domain} domain, theme {theme}, and complexity {complexity}, "
        "where empathy and understanding play a crucial role. Focus on a situation that highlights emotional challenges or opportunities for compassionate support, and include a specific question or request for help that clearly outlines a problem or challenge needing resolution.\n\n"
        "..."
    ),
    columns_to_list_in_prompt="all_categorical_seed_columns",
    data_config={"type": "text"}
)

Empathic Reasoning Trace Generation

Next, we define classes to structure the output of our reasoning trace. These classes will help in organizing the reasoning process into clear, understandable steps.

class Thought(BaseModel):
    """A single step in the structured empathic reasoning process."""
    step_number: int = Field(..., ge=1, description="The order of the reasoning step, starting from 1.")
    content: str = Field(..., min_length=5, description="A detailed explanation of this reasoning step, incorporating both logical analysis and emotional insight.")

class ReasoningTrace(BaseModel):
    """A structured empathic reasoning trace for addressing a scenario."""
    reasoning: List[Thought] = Field(..., description="Step-by-step reasoning leading to the final answer, enriched with empathetic observations and practical insights.")
    answer: str = Field(..., description="The final answer derived from the empathic reasoning process, offering compassionate guidance or resolution.")

class Evaluation(BaseModel):
    """Output format for evaluating an empathic reasoning answer.
    The evaluation assesses the response based on correctness, clarity, and completeness,
    with feedback that emphasizes compassionate insight, clarity, and a holistic understanding of the scenario.
    """
    correctness: float = Field(..., description="Overall correctness rating of the answer (0 to 1).")
    clarity: float = Field(..., description="Clarity rating of the reasoning, including the integration of empathic explanations (0 to 1).")
    completeness: float = Field(..., description="Completeness rating of the reasoning, assessing whether all practical and emotional aspects were considered (0 to 1).")
    feedback: str = Field(..., description="Detailed feedback on the reasoning trace and answer, with suggestions for enhancing empathetic and real-world applicability.")

class FinalEvaluation(Evaluation):
    """Extended evaluation model for final empathic reasoning traces.
    This model adds criteria to assess visual structure and conciseness,
    ensuring the final output is both clear and visually appealing.
    """
    structure: float = Field(...,  description="Rating of the visual structure and formatting (0 to 1), assessing if reasoning steps and final answer are clearly delineated.")
    conciseness: float = Field(..., description="Rating of the conciseness of the reasoning trace (0 to 1), ensuring that extraneous verbosity is minimized.")

We then set up the generated columns for the initial and final reasoning traces.

# Initial Reasoning Trace Generation
data_designer.add_generated_data_column(
    name="initial_trace",
    generation_prompt=(
        "You are an empathic reasoning agent. Provide a detailed, step-by-step reasoning process that thoughtfully addresses the following scenario. "
        "Begin by outlining your internal thought process, focusing on both logical considerations and emotional insights, enclosed within <think>...</think> tags. "
        "Then, provide your final compassionate answer.\n\n"
        "Scenario: {scenario}\n\n"
        "Ensure that your response is structured and reflective of a supportive, empathetic approach."
    ),
    data_config={"type": "structured", "params": {"model": ReasoningTrace}}
)

# Initial Trace Evaluation
data_designer.add_generated_data_column(
    name="initial_trace_evaluation",
    generation_prompt=(
        "<initial_trace>{initial_trace}</initial_trace>\n\n"
        "Now, analyze the provided empathic reasoning trace and final answer as if you were an insightful observer assessing both logical and compassionate approaches. "
        "Evaluate the response with a focus on emotional insight, clarity, and holistic consideration.\n\n"
        "Include your internal thought process within <think>...</think> tags before providing the JSON."
    ),
    data_config={"type": "structured", "params": {"model": Evaluation}}
)

Final Reasoning Trace Generation and Evaluation

After generating an initial reasoning trace, the model’s output is evaluated, and then the reasoning trace is refined in an iterative process. With each iteration, the model inspects its reasoning trace—integrating feedback and correcting any inefficiencies—so that the final trace becomes more efficient, concise, and aligned with the desired reasoning quality. This iterative refinement not only boosts the overall logical coherence of the trace but also makes it easier to interpret and use as a basis for further model improvements.

# Final Reasoning Trace Generation
data_designer.add_generated_data_column(
    name="final_trace",
    generation_prompt=(
        "Review the scenario, your initial empathic reasoning trace, and its evaluation:\n\n"
        "Scenario: {scenario}\n\n"
        "Initial Empathic Reasoning Trace:\n{initial_trace}\n\n"
        "Initial Trace Evaluation:\n{initial_trace_evaluation}\n\n"
        "From the perspective of an empathic reasoning agent, provide a detailed final reasoning trace that addresses both the emotional and logical dimensions of the scenario. "
        "Include your internal thought process wrapped within <think>...</think> tags. "
        "Ensure that your refined trace incorporates improvements suggested by the evaluation and clearly explains how you arrived at the final compassionate answer.\n\n"
        "Return only the final reasoning trace."
    ),
    data_config={"type": "structured", "params": {"model": ReasoningTrace}}
)

data_designer.add_generated_data_column(
    name="final_trace_evaluation",
    generation_prompt=(
        "<final_trace>{final_trace}</final_trace>\n\n"
        "Analyze the provided empathic reasoning trace and final answer from the viewpoint of an insightful observer. "
        "Evaluate the response focusing on correctness, clarity, and completeness, as well as its visual structure and conciseness. "
        "Assess whether the reasoning steps are clearly separated (e.g., numbered or bullet-pointed) and if the final answer is distinct and succinct.\n\n"
        "Include your internal thought process within <think>...</think> tags before providing the JSON."
    ),
    llm_type="judge",
    data_config={"type": "structured", "params": {"model": FinalEvaluation}}
)


6. Generating a Dataset Preview

Finally, run a preview generation to create a small sample of records. This allows you to verify the configuration and see sample outputs before scaling up.

# Generate a preview with 10 records for inspection
preview = data_designer.generate_dataset_preview(num_records=10)
print(preview.dataset)
preview.display_sample_record()


Figure 1. Example output record.
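
Once the preview looks as expected, you can scale up to a full dataset as a batch job. The calls below are a sketch of that step; the exact method names and arguments may differ across gretel_client versions, so treat them as placeholders and consult the Gretel docs for the batch workflow API.

# Sketch only: method names are assumptions and may differ in your SDK version.
batch_job = data_designer.submit_batch_workflow(num_records=500)

# Wait for the workflow to complete, then fetch the generated records as a DataFrame.
df = batch_job.fetch_dataset(wait_for_completion=True)
print(df.head())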

By following these steps, you can utilize Gretel Navigator to build a synthetic dataset enriched with detailed, empathetic reasoning traces.

Next Steps: Optimizing AI with Synthetic Reasoning

High-quality synthetic reasoning traces from Gretel Navigator open up new opportunities for enhancing AI models, including: 

  • Building evaluation benchmarks for a model's reasoning capability, ensuring it not only produces correct answers but also follows a clear, transparent reasoning process.
  • Customizing model improvement through training strategies such as supervised fine-tuning, direct preference optimization, and reinforcement learning.
  • Applying these advances in real-world scenarios, from customer support to mental health services, while addressing ethical considerations.
  • Fostering collaboration among practitioners and researchers to further refine AI reasoning techniques.

Conclusion

As AI integrates further into everyday life, understanding why it reaches specific conclusions is vital for trust, reliability, and meaningful human-AI interactions. Incorporating reasoning traces into synthetic datasets enhances AI transparency and trustworthiness. Utilizing tools like Gretel Navigator facilitates the development of AI systems that are both precise and empathetic.

Join our community of synthetic data builders! Share your experiences, ask questions, or get help with your specific use case on the Synthetic Data Discord Community or reach out directly at sales@gretel.ai.