Pre-processing
After collecting the data, we had around 650 chats in different languages, but no record of which chat was in which language. Furthermore, we had promised to anonymize the data, and we had no tool for browsing the data in the format in which they were available. Thus, before our research could start and the data could be made available to the research team, we had to pre-process them.
01_corpus/02_preprocessing.1572435516.txt.gz · Last modified: 2022/06/27 09:21 (external edit)