01_corpus:02_preprocessing:07_normalization

Differences

This shows you the differences between two versions of the page.

01_corpus:04_annotations:03_normalization [2019/11/06 16:35] simone
01_corpus:02_preprocessing:07_normalization [2020/04/17 11:16] simone
Line 1: Line 1:
-====== Normalization ======
+====== 1.2.7 Normalization ======
-Normalization is the task of "translating" non-standard language into standard language. It can be performed manually or automatically with computational linguistic tools.
+Normalization is the task of "translating" non-standard language data into standard language. It can be performed manually or automatically with computational linguistics tools.
 
-In the case of our corpus, we have manually normalized some data in the Swiss German dialect, resulting in the corpus WUS_DIALOG_GSW.
+In the case of our corpus, we have manually normalized some data in the Swiss German dialect, resulting in the corpus WUS_DIALOG_GSW (5 chats, 34,683 tokens).
  
-Another set of data was processed automatically. You can read more about that project in: 
- 
-Ruzsics, Tatiana; Lusetti, Massimo; Göhring, Anne; Samardžić, Tanja; Stark, Elisabeth (2019): Neural Text Normalization with Adapted Decoding and PoS Features. [[https://www.cambridge.org/core/journals/natural-language-engineering/article/neural-text-normalization-with-adapted-decoding-and-pos-features/474B380A32EF96CCED1708229848F3FB|Natural Language Engineering]]. 
- 
-This data will be made available soon. 
01_corpus/02_preprocessing/07_normalization.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1
