01_corpus:01_subcorpora

Differences

This shows you the differences between two versions of the page.

01_corpus:01_subcorpora [2020/04/16 11:34] simone
01_corpus:01_subcorpora [2022/06/27 09:21] (current) – external edit 127.0.0.1
Line 8: Line 8:
   * Each chat was to be assigned to only one language-sub-corpus.
   * Additionally, we differentiate between chats where we have demographic information for all participants and those where we do not. In the former case, the sub-corpus gets the extension _DEMOG.
-  * Where additional tasks were performed on individual chats (e.g. normalization or part-of-speech tagging) we created additional sub-corpora exist per language.
+  * Where additional tasks were performed on individual chats (e.g. normalization or part-of-speech tagging) we created additional sub-corpora per language.
  
  
Line 25: Line 25:
   * WUS_ROH_DEMOG: A subgroup thereof where we have demographic information from all communication partners.
  
 +Additionally to these corpora, you also see corpora with lowercase letters in the browser (e.g. deu-rftagged, ita-tagged, roh etc.). These corpora contain data from our [[https://wiki.linguistik.uzh.ch/sms4science|SMS project]].
 ===== Smaller corpora =====
  
Line 32: Line 32:
   * WUS_SMALL_DEMOG: A subgroup thereof where we have demographic information from all communication partners.
   * WUSdemographics: Only demographic data per person. This sub-corpus is much faster if you want to look up demographic data only.
-  * WUS_ARGDROP and WUS_ARGDROP_language: Sub-corporafor which argument drop has been manually annotated. For the architecture of the annotations and scientific considerations behind it see [[http://www.unige.ch/lettres/linge/syntaxe/journal/Volume11/11_Stuntebeck_2018.pdf|Stuntebeck, Franziska (2018): "Annotating Argument Drop in the Swiss WhatsApp Corpus". In: Generative Grammar in Geneva (GG@G) XI, 175-187.]]
+  * WUS_ARGDROP and WUS_ARGDROP_language: Sub-corpora for which argument drop has been manually annotated. For the architecture of the annotations and scientific considerations behind it see [[http://www.unige.ch/lettres/linge/syntaxe/journal/Volume11/11_Stuntebeck_2018.pdf|Stuntebeck, Franziska (2018): "Annotating Argument Drop in the Swiss WhatsApp Corpus". In: Generative Grammar in Geneva (GG@G) XI, 175-187.]]
  
 +===== Other corpora in the browsing tool =====
 +Additionally to these corpora, you also see corpora with lowercase letters in the browser (e.g. deu-rftagged, ita-tagged, roh etc.). These corpora contain data from our [[https://wiki.linguistik.uzh.ch/sms4science|SMS project]].
  
 ===== More information about the subcorpora =====
01_corpus/01_subcorpora.1587029674.txt.gz · Last modified: 2022/06/27 09:21 (external edit)
