Profile serialization crashes when LLM returns structured bio/persona fields #154

@ygh1254

Summary

When profile generation returns structured JSON objects for fields like bio, persona, or country, MiroFish can fail during profile serialization before config generation starts.

Reproduction context

Observed on a live run with:

  • simulation_id: sim_e69a946b6158
  • graph_id: mirofish_a39b5f10127f4744
  • entities_count: 91
  • status in state file: failed
  • error in state file: slice(None, 150, None)

Traceback

File "backend/app/services/simulation_manager.py", line 361, in prepare_simulation
  generator.save_profiles(
File "backend/app/services/oasis_profile_generator.py", line 1189, in save_profiles
  self._save_reddit_json(profiles, file_path)
File "backend/app/services/oasis_profile_generator.py", line 1296, in _save_reddit_json
  "bio": profile.bio[:150] if profile.bio else f"{profile.name}",
KeyError: slice(None, 150, None)

Root cause

OasisAgentProfile declares bio / persona as strings, but the profile generation path does not enforce that contract.

Current flow:

  • LLM output is parsed with json.loads(...)
  • only presence/truthiness of result["bio"] and result["persona"] is checked
  • values are passed through directly into OasisAgentProfile
  • serializer later assumes they are strings ([:150], .replace(...), concatenation)
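The failure can be reproduced in isolation. The sketch below is an illustration only, not MiroFish code: `bio` is a made-up dict of the kind an LLM might return, and `truncate_bio` is a hypothetical helper that mirrors the `[:150]` expression from the traceback. Subscripting a dict with `[:150]` looks up a `slice` object as a key. On Python 3.12 and later, where slices are hashable, that raises the `KeyError: slice(None, 150, None)` recorded in the state file; older interpreters raise `TypeError: unhashable type: 'slice'` instead.

```python
# Hypothetical LLM output: "bio" came back as an object instead of a string.
bio = {"summary": "Retired teacher", "interests": ["chess", "gardening"]}


def truncate_bio(value):
    """Mimic the serializer's assumption that the value is a string."""
    return value[:150] if value else "fallback"


try:
    truncate_bio(bio)
    error = None
except (KeyError, TypeError) as exc:
    # KeyError on Python 3.12+ (slices are hashable), TypeError before that.
    error = exc

print(type(error).__name__, error)
```

The truthiness check passes for any non-empty dict, which is why the bad value reaches the slice at all.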

In the failed run, partially written reddit_profiles.json already contained mixed types:

  • bio as dict in at least 2 profiles
  • persona as dict in at least 31 profiles
  • country as list in at least 12 profiles

So the crash is not strictly a prompt-length issue; it is a type-normalization bug between profile generation and serialization.
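Counts like the ones above can be gathered with a small audit script. This is a sketch under assumptions: it treats the profiles as a JSON array of objects (the actual layout of reddit_profiles.json is not shown in this issue, and a partially written file may need repair before it parses), and `count_non_string_fields` is a hypothetical helper name.

```python
import json
from collections import Counter

# Fields the serializer treats as plain strings (taken from the issue text).
STRING_FIELDS = ("bio", "persona", "country")


def count_non_string_fields(profiles):
    """Count, per (field, type name), values that are present but not str."""
    counts = Counter()
    for profile in profiles:
        for field in STRING_FIELDS:
            value = profile.get(field)
            if value is not None and not isinstance(value, str):
                counts[(field, type(value).__name__)] += 1
    return counts


# Made-up sample standing in for the parsed contents of the profiles file.
sample = json.loads("""[
  {"bio": "plain text", "persona": {"tone": "dry"}, "country": ["US", "CA"]},
  {"bio": {"summary": "x"}, "persona": "plain text", "country": "KR"}
]""")

for (field, type_name), n in sorted(count_non_string_fields(sample).items()):
    print(f"{field}: {n} value(s) of type {type_name}")
```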

Impacted code

  • backend/app/services/oasis_profile_generator.py
    • generate_profile_from_entity(...)
    • _generate_profile_with_llm(...)
    • _save_twitter_csv(...)
    • _save_reddit_json(...)

Suggested fix

Normalize mixed LLM outputs before building OasisAgentProfile, and add defensive coercion again before serialization.

Examples:

  • coerce dict/list bio/persona to strings
  • join list country values into a single string
  • flatten structured interested_topics
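The three coercions listed above could look roughly like this. It is a sketch, not the actual patch: the helper names `coerce_text`, `join_country`, and `flatten_topics` are hypothetical, and JSON-encoding a dict is only one possible policy (picking a specific key such as a summary would be another).

```python
import json


def coerce_text(value, default=""):
    """Return a string for str, dict, list, None, or scalar input."""
    if value is None:
        return default
    if isinstance(value, str):
        return value
    if isinstance(value, (dict, list)):
        # Keep structured content readable instead of relying on repr().
        return json.dumps(value, ensure_ascii=False)
    return str(value)


def join_country(value, sep=", "):
    """Join list values into one string; other inputs go through coerce_text."""
    if isinstance(value, (list, tuple)):
        return sep.join(coerce_text(item) for item in value if item)
    return coerce_text(value)


def flatten_topics(value):
    """Flatten nested lists/dicts of topics into a flat list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, (list, tuple)):
        flat = []
        for item in value:
            flat.extend(flatten_topics(item))
        return flat
    return [str(value)]
```

After coercion, expressions such as `coerce_text(bio)[:150]` are safe regardless of what the LLM returned.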

Why this matters

Even when the simulation graph and profile generation both complete, the run can still fail during save/serialization. That leaves the simulation in a failed state before config generation or the run itself starts.

Verified local fix

A minimal patch was verified locally. It:

  • normalizes bio/persona/country/profession/interested_topics on OasisAgentProfile construction
  • re-coerces values in _save_twitter_csv() and _save_reddit_json()

With the patch applied, a reproduction object with the following field types serialized without crashing:

  • bio as dict
  • persona as dict
  • country as list
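A regression check for this two-layer approach might look like the sketch below. Everything here is a stand-in: `ProfileStandIn` is a hypothetical reduced substitute for OasisAgentProfile, `_as_text` and `to_reddit_record` are illustrative names, and apart from "bio" the output keys are assumptions rather than the real reddit_profiles.json schema.

```python
import json
from dataclasses import dataclass


def _as_text(value):
    """Strings pass through, lists of strings are joined, the rest is JSON-encoded."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return ", ".join(value)
    return json.dumps(value, ensure_ascii=False)


@dataclass
class ProfileStandIn:
    """Hypothetical stand-in for OasisAgentProfile; the real class has more fields."""
    name: str
    bio: object = ""
    persona: object = ""
    country: object = ""

    def __post_init__(self):
        # Layer 1: enforce the declared string contract at construction time.
        self.bio = _as_text(self.bio)
        self.persona = _as_text(self.persona)
        self.country = _as_text(self.country)


def to_reddit_record(profile):
    """Layer 2: re-coerce at save time so later mutations cannot break slicing."""
    bio = _as_text(profile.bio)
    return {
        "name": profile.name,
        "bio": bio[:150] if bio else profile.name,
        "persona": _as_text(profile.persona),
        "country": _as_text(profile.country),
    }


profile = ProfileStandIn(
    name="agent_1",
    bio={"summary": "Retired teacher"},
    persona={"tone": "dry"},
    country=["US", "CA"],
)
record = to_reddit_record(profile)
print(json.dumps(record, ensure_ascii=False))
```

The second layer looks redundant, but it covers profiles whose attributes are reassigned after construction or that are built through a path that bypasses the constructor.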
