ConvLab is an open-source, multi-domain, end-to-end dialog system platform that enables researchers to quickly set up experiments with reusable components and compare a wide range of approaches, from conventional pipeline systems to end-to-end neural models, in common environments.
As part of DSTC8, Microsoft Research and Tsinghua University are hosting a track intended to foster progress in building complex bots that span multiple sub-domains to accomplish a complex user goal. To advance the state of the art in handling complex dialogs, we offer a task focusing on multi-domain, end-to-end task-completion dialog in a travel-planning setting.

Human evaluation results:

Rank | Team Submission ID | Spec # | Success Rate | Language Understanding Score | Response Appropriateness Score | Turns |
---|---|---|---|---|---|---|
1 | 504430 | submission4 | 68.32% | 4.149 | 4.287 | 19.507 |
2 | 504429 | submission1 | 65.81% | 3.538 | 3.632 | 15.481 |
3 | 504563 | submission2 | 65.09% | 3.538 | 3.84 | 13.884 |
4 | 504651 | submission1 | 64.10% | 3.547 | 3.829 | 16.906 |
5 | 504641 | submission2 | 62.91% | 3.742 | 3.815 | 14.968 |
6 | 504569 | submission4 | 54.90% | 3.784 | 3.824 | 14.107 |
7 | 504529 | submission1 | 43.56% | 3.554 | 3.446 | 21.818 |
8 | 504582 | submission2 | 36.45% | 2.944 | 3.103 | 21.128 |
9 | 504666 | submission1 | 25.77% | 2.072 | 2.258 | 16.8 |
10 | 504502 | submission2 | 23.30% | 2.612 | 2.65 | 15.333 |
11 | 504524 | submission1 | 18.81% | 1.99 | 2.059 | 16.105 |
N/A | Baseline | milu_rule_rule_template | 56.45% | 3.097 | 3.556 | 17.543 |

Automatic evaluation results:

Rank | Team Submission ID | Spec # | Success Rate | Return | Turns | Precision | Recall | F1 | Book Rate |
---|---|---|---|---|---|---|---|---|---|
1 | 504429 | submission1 | 88.80% | 61.56 | 7 | 0.92 | 0.96 | 0.93 | 93.75% |
2 | 504563 | submission4 | 88.60% | 61.63 | 6.69 | 0.83 | 0.94 | 0.87 | 96.39% |
3 | 504651 | submission1 | 82.20% | 54.09 | 6.55 | 0.71 | 0.92 | 0.78 | 94.56% |
4 | 504641 | submission4 | 80.60% | 51.51 | 7.21 | 0.78 | 0.89 | 0.81 | 86.45% |
5 | 504430 | submission1 | 79.40% | 49.69 | 7.59 | 0.8 | 0.89 | 0.83 | 87.02% |
6 | 504529 | submission1 | 58.00% | 23.7 | 7.9 | 0.61 | 0.73 | 0.64 | 75.71% |
7 | 504666 | submission1 | 56.60% | 20.14 | 9.78 | 0.68 | 0.77 | 0.7 | 58.63% |
8 | 504502 | submission1 | 55.20% | 17.18 | 11.06 | 0.73 | 0.74 | 0.71 | 71.87% |
9 | 504524 | submission1 | 54.00% | 17.15 | 9.65 | 0.66 | 0.76 | 0.69 | 72.42% |
10 | 504569 | submission4 | 52.20% | 15.81 | 8.83 | 0.46 | 0.75 | 0.54 | 76.38% |
11 | 504582 | submission2 | 34.80% | -6.39 | 10.15 | 0.65 | 0.75 | 0.68 | N/A |
12 | 504632 | submission1 | 0% | -58.88 | 20.88 | 0 | 0.01 | 0 | N/A |
N/A | Baseline | milu_rule_rule_template | 63.40% | 30.41 | 7.67 | 0.72 | 0.83 | 0.75 | 86.37% |
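
One way to read the two leaderboards together is to check which teams beat the `milu_rule_rule_template` baseline on Success Rate in both. The sketch below is illustrative only: the dictionaries are copied by hand from the Success Rate columns of the two tables above, the variable and function names are made up for this example, and the comparison is per team ID, ignoring that a team's ranked submission (Spec #) can differ between the two tables.

```python
# Success Rate (%) per team ID, copied from the first table
# (the one with the Language Understanding Score column).
table1_success = {
    "504430": 68.32, "504429": 65.81, "504563": 65.09, "504651": 64.10,
    "504641": 62.91, "504569": 54.90, "504529": 43.56, "504582": 36.45,
    "504666": 25.77, "504502": 23.30, "504524": 18.81,
}
# Success Rate (%) per team ID, copied from the second table
# (the one with the Return / Precision / Recall / F1 columns).
table2_success = {
    "504429": 88.80, "504563": 88.60, "504651": 82.20, "504641": 80.60,
    "504430": 79.40, "504529": 58.00, "504666": 56.60, "504502": 55.20,
    "504524": 54.00, "504569": 52.20, "504582": 34.80, "504632": 0.0,
}
# Baseline rows (milu_rule_rule_template) from each table.
TABLE1_BASELINE = 56.45
TABLE2_BASELINE = 63.40


def beats_baseline(scores, baseline):
    """Return the set of team IDs whose success rate is strictly above the baseline."""
    return {team for team, rate in scores.items() if rate > baseline}


# Teams above the baseline in both tables (hypothetical helper, not part of ConvLab).
above_in_both = sorted(
    beats_baseline(table1_success, TABLE1_BASELINE)
    & beats_baseline(table2_success, TABLE2_BASELINE)
)
print(above_in_both)
# → ['504429', '504430', '504563', '504641', '504651']
```

With these numbers, the same five teams are above the baseline in both tables, although their relative order differs between the two.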