Documentation

What's up, Switzerland?


01_corpus:02_preprocessing:07_normalization

Differences

This shows you the differences between two versions of the page.

01_corpus:02_preprocessing:07_normalization [2020/04/16 16:36]
simone ↷ Page moved and renamed from 01_corpus:04_annotations:03_normalization to 01_corpus:02_preprocessing:07_normalization
01_corpus:02_preprocessing:07_normalization [2020/05/11 08:56] (current)
Line 1: Line 1:
-====== Normalization ======
+====== 1.2.7 Normalization ======
-Normalization is the task of "translating" non-standard language into standard language. It can be performed manually or automatically with computational linguistic tools.
+Normalization is the task of "translating" non-standard language data into standard language. It can be performed manually or automatically with computational linguistics tools.
  
-In the case of our corpus, we have manually normalized some data in the Swiss German dialect, resulting in the corpus WUS_DIALOG_GSW.
+In the case of our corpus, we have manually normalized some data in the Swiss German dialect, resulting in the corpus WUS_DIALOG_GSW (5 chats, 34,683 tokens).
  
-Another set of data was process automatically. You can read more about that project in: 
- 
-Ruzsics, Tatiana; Lusetti, Massimo; Göhring, Anne; Samardžić, Tanja; Stark, Elisabeth (2019): Neural Text Normalization with Adapted Decoding and PoS Features. [[https://www.cambridge.org/core/journals/natural-language-engineering/article/neural-text-normalization-with-adapted-decoding-and-pos-features/474B380A32EF96CCED1708229848F3FB|Natural Language Engineering]].
- 
-This data will be made available soon. 
01_corpus/02_preprocessing/07_normalization.txt · Last modified: 2020/05/11 08:56 (external edit)