ConvLab

ConvLab is an open-source, multi-domain, end-to-end dialog system platform that enables researchers to quickly set up experiments with reusable components and to compare a wide range of approaches, from conventional pipeline systems to end-to-end neural models, in common environments.

DSTC8 Track 1: End-to-End Multi-Domain Dialog Challenge

As part of DSTC8, Microsoft Research and Tsinghua University are hosting a track intended to foster progress in building complex bots that span multiple sub-domains to accomplish a complex user goal. To advance state-of-the-art technologies for handling complex dialogs, we offer a timely task focusing on multi-domain, end-to-end task-completion dialog in travel planning settings.

Evaluation
  • We report each team's best submission result based on success rate for both automatic evaluation and human evaluation.
  • Team Submission ID is the ID CodaLab provided when you made the submission.
  • The human evaluation leaderboard is considered the final ranking.

    Human Evaluation Leaderboard

    Rank  Team Submission ID  Spec #                   Success Rate  Language Understanding Score  Response Appropriateness Score  Turns
    1     504430              submission4              68.32%        4.149                         4.287                           19.507
    2     504429              submission1              65.81%        3.538                         3.632                           15.481
    3     504563              submission2              65.09%        3.538                         3.84                            13.884
    4     504651              submission1              64.10%        3.547                         3.829                           16.906
    5     504641              submission2              62.91%        3.742                         3.815                           14.968
    6     504569              submission4              54.90%        3.784                         3.824                           14.107
    7     504529              submission1              43.56%        3.554                         3.446                           21.818
    8     504582              submission2              36.45%        2.944                         3.103                           21.128
    9     504666              submission1              25.77%        2.072                         2.258                           16.8
    10    504502              submission2              23.30%        2.612                         2.65                            15.333
    11    504524              submission1              18.81%        1.99                          2.059                           16.105
    N/A   Baseline            milu_rule_rule_template  56.45%        3.097                         3.556                           17.543

    Automatic Evaluation Leaderboard

    Rank  Team Submission ID  Spec #                   Success Rate  Return  Turns  Precision  Recall  F1    Book Rate
    1     504429              submission1              88.80%        61.56   7      0.92       0.96    0.93  93.75%
    2     504563              submission4              88.60%        61.63   6.69   0.83       0.94    0.87  96.39%
    3     504651              submission1              82.20%        54.09   6.55   0.71       0.92    0.78  94.56%
    4     504641              submission4              80.60%        51.51   7.21   0.78       0.89    0.81  86.45%
    5     504430              submission1              79.40%        49.69   7.59   0.8        0.89    0.83  87.02%
    6     504529              submission1              58.00%        23.7    7.9    0.61       0.73    0.64  75.71%
    7     504666              submission1              56.60%        20.14   9.78   0.68       0.77    0.7   58.63%
    8     504502              submission1              55.20%        17.18   11.06  0.73       0.74    0.71  71.87%
    9     504524              submission1              54.00%        17.15   9.65   0.66       0.76    0.69  72.42%
    10    504569              submission4              52.20%        15.81   8.83   0.46       0.75    0.54  76.38%
    11    504582              submission2              34.80%        -6.39   10.15  0.65       0.75    0.68  N/A
    12    504632              submission1              0%            -58.88  20.88  0          0.01    0     N/A
    N/A   Baseline            milu_rule_rule_template  63.40%        30.41   7.67   0.72       0.83    0.75  86.37%
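
The Evaluation notes say that each team is represented by its best submission, chosen by success rate. A minimal sketch of that selection and the resulting ranking is shown below. It is only an illustration: the `Submission` class, the function names, the tie-breaking behavior (the first submission seen wins), and the demo values are all assumptions, not the organizers' actual scoring code or real challenge results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    team_id: str         # hypothetical team identifier
    spec: str            # submission label, e.g. "submission1"
    success_rate: float  # success rate as a percentage


def best_per_team(submissions):
    """Keep, for each team, the submission with the highest success rate."""
    best = {}
    for sub in submissions:
        current = best.get(sub.team_id)
        # Assumption: on a tie, the submission seen first is kept.
        if current is None or sub.success_rate > current.success_rate:
            best[sub.team_id] = sub
    return best


def leaderboard(submissions):
    """Rank the teams' best submissions by success rate, highest first."""
    return sorted(best_per_team(submissions).values(),
                  key=lambda s: s.success_rate, reverse=True)


# Made-up values for illustration only; these are not challenge results.
demo = [
    Submission("team_a", "submission1", 40.0),
    Submission("team_a", "submission2", 55.0),
    Submission("team_b", "submission1", 60.0),
    Submission("team_b", "submission3", 45.0),
]

for rank, sub in enumerate(leaderboard(demo), start=1):
    print(rank, sub.team_id, sub.spec, f"{sub.success_rate:.2f}%")
```

With the demo data, `team_b` is represented by `submission1` and `team_a` by `submission2`, and `team_b` ranks first. The same selection would be run separately for the automatic and the human evaluation, which could explain why a team's Spec # can differ between the two leaderboards.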