


1.5 Sub-corpora

Based on the language annotation of each chat, the following sub-corpora were created:

  • WUS: All data, i.e. the whole corpus
  • WUS_DEU: All chats in which non-dialectal German accounts for the most messages
  • WUS_DEU_DEMOG: The subset of WUS_DEU for which all communication partners have given permission to use their texts
  • WUS_FRA: All chats in which French accounts for the most messages
  • WUS_FRA_DEMOG: The subset of WUS_FRA for which all communication partners have given permission to use their texts
  • WUS_GSW: All chats in which dialectal German accounts for the most messages
  • WUS_GSW_DEMOG: The subset of WUS_GSW for which all communication partners have given permission to use their texts
  • WUS_ITA: All chats in which Italian accounts for the most messages
  • WUS_ITA_DEMOG: The subset of WUS_ITA for which all communication partners have given permission to use their texts
  • WUS_ROH: All chats in which Romansh accounts for the most messages
  • WUS_ROH_DEMOG: The subset of WUS_ROH for which all communication partners have given permission to use their texts
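
The assignment rule described above can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the function name `assign_subcorpora`, the input format (one language code per message) and the `all_consented` flag are hypothetical, and leaving tied chats in WUS only is an assumption, since this page does not say how ties are handled.

```python
from collections import Counter

# Language codes taken from the sub-corpus names listed above.
LANGUAGES = ("DEU", "FRA", "GSW", "ITA", "ROH")


def assign_subcorpora(message_langs, all_consented):
    """Return the sub-corpus names for one chat.

    message_langs: one language code per message (hypothetical input format).
    all_consented: True if every communication partner gave permission.
    """
    names = ["WUS"]  # every chat belongs to the whole corpus

    # Count messages per language, ignoring codes outside the five listed.
    counts = Counter(lang for lang in message_langs if lang in LANGUAGES)
    if not counts:
        return names

    top = max(counts.values())
    leaders = [lang for lang in LANGUAGES if counts[lang] == top]
    if len(leaders) != 1:
        # Assumption: a tie means no language provides the most messages.
        return names

    names.append(f"WUS_{leaders[0]}")
    if all_consented:
        names.append(f"WUS_{leaders[0]}_DEMOG")
    return names
```

For example, a chat with two Swiss German messages and one Standard German message, where all partners consented, would be placed in WUS, WUS_GSW and WUS_GSW_DEMOG under these assumptions.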
01_corpus/01_subcorpora.1572970825.txt.gz · Last modified: 2022/06/27 09:21 (external edit)
