Real-world performance metrics on the LOCOMO dataset.
Configuration: LLM: gpt-4o, OceanBase: 4.3.5.4
| Category | Description | LLM Score |
|---|---|---|
| 1. Multi-Hop | Questions that require synthesizing information from multiple sessions. | 71.63 |
| 2. Temporal Reasoning | Questions that can be answered through temporal reasoning and by capturing time-related cues within the conversation. | 79.13 |
| 3. Open-Domain | Questions that can be answered by integrating information provided by a speaker with external knowledge, such as commonsense or world facts. | 55.21 |
| 4. Single-Hop | Questions asking for specific facts directly mentioned in a single-session conversation. | 83.59 |