These data are provided as an appendix to the paper:
Johanna Gerlach, Pierrette Bouillon, Jonathan Mutal and Hervé Spechbach. 2024. A Concept Based Approach
for Translation of Medical Dialogues into Pictographs. [to be published in proceedings of
LREC-COLING 2024].
The corpus, referred to as HUG2 in the paper, consists of 380 medical utterances collected with the BabelDr application at the Geneva University Hospitals (HUG) during diagnostic interviews in triage settings. The original speech data were transcribed manually, and each utterance was manually annotated with a reference gloss. Out-of-coverage items have no gloss and are marked as such. Pictograph sequences are provided where available; they are the pictographs used in the second human evaluation described in the paper.
n.b. These data reflect the state of resources at the time of the experiments described in the paper and are not final project products. Please consult the PictoDr application or the PROPICTO project site for updated resources and information.
For each utterance, we provide:
The pictographs used are either ARASAAC pictographs or custom pictographs created by the authors for the PROPICTO project. The ARASAAC pictographic symbols used are the property of the Government of Aragón and have been created by Sergio Palao for ARASAAC, which distributes them under Creative Commons License BY-NC-SA. The custom pictographic symbols (some of which are derived from ARASAAC) are the property of FTI/TIM and are also distributed under Creative Commons License BY-NC-SA.