Performance Benchmarks

Real-world performance metrics based on the LOCOMO dataset

Configuration: LLM: gpt-4o, OceanBase: 4.3.5.4

Overall Mean Scores

LLM Score: 78.70

Mean Scores Per Category

Category | Description | LLM Score
1. Multi-Hop | Questions that require synthesizing information from multiple sessions. | 71.63
2. Temporal Reasoning | Questions that can be answered through temporal reasoning and by capturing time-related cues within the conversation. | 79.13
3. Open-Domain | Questions that can be answered by integrating a speaker's provided information with external knowledge, such as commonsense or world facts. | 55.21
4. Single-Hop | Questions asking for specific facts directly mentioned in a single-session conversation. | 83.59
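
The overall score of 78.70 is higher than the plain average of the four category scores, which suggests that it is averaged over individual questions rather than over categories. The sketch below illustrates the difference. The four scores are taken from the table above. The per-category question counts are an assumption: they are not stated on this page, and are the counts commonly cited for LOCOMO categories 1 to 4. The function names are hypothetical and are not part of any benchmark tooling.

```python
# Per-category LLM scores, copied from the table above.
scores = {
    "Multi-Hop": 71.63,
    "Temporal Reasoning": 79.13,
    "Open-Domain": 55.21,
    "Single-Hop": 83.59,
}

# ASSUMPTION: number of questions per category. These counts are NOT given
# on this page; replace them with the counts from your own evaluation run.
assumed_counts = {
    "Multi-Hop": 282,
    "Temporal Reasoning": 321,
    "Open-Domain": 96,
    "Single-Hop": 841,
}


def unweighted_mean(scores):
    """Average the category scores, treating every category equally."""
    return sum(scores.values()) / len(scores)


def weighted_mean(scores, counts):
    """Average the category scores, weighting each by its question count."""
    total_questions = sum(counts[name] for name in scores)
    weighted_sum = sum(scores[name] * counts[name] for name in scores)
    return weighted_sum / total_questions


print(f"Unweighted mean of categories: {unweighted_mean(scores):.2f}")
print(f"Question-weighted mean:        {weighted_mean(scores, assumed_counts):.2f}")
```

Under the assumed counts, the question-weighted mean lands close to the reported overall score, while the unweighted mean is noticeably lower because the small Open-Domain category pulls it down. If the counts in your run differ, the weighted result will differ as well.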