02_browsing:04_queries:05_fields

Differences

This shows you the differences between two versions of the page.

02_browsing:04_queries:04_fields [2020/04/17 21:41] simone | 02_browsing:04_queries:05_fields [2022/06/27 09:21] (current) – external edit 127.0.0.1
Line 1: Line 1:
-====== Fields available ====== +====== 2.4.5 Fields available ====== 
-Below you find all field that can be queried in the corpus in four categories depending on whether they relate to the chat, to the message, to the token or to the demographic meta data.+Below you find all fields that can be queried in the corpus in four categories depending on whether they relate to the chat, to the message, to the token or to the demographic meta data.
  
 Please do not forget the [[01_corpus:02_preprocessing:06_pos|Part of Speech annotations per language]]. Please do not forget the [[01_corpus:02_preprocessing:06_pos|Part of Speech annotations per language]].
Line 22: Line 22:
 |lang_100_and_more |node & meta::lang_100_and_more="deu, gsw"|Find all messages in chats with 100 or more messages in non-dialectal or dialectal German. The same query can be applied for fewer or more languages by separating them with commas as shown in the example. Other languages are fra and ita for French and Italian respectively as well as roh for Romansh.| |lang_100_and_more |node & meta::lang_100_and_more="deu, gsw"|Find all messages in chats with 100 or more messages in non-dialectal or dialectal German. The same query can be applied for fewer or more languages by separating them with commas as shown in the example. Other languages are fra and ita for French and Italian respectively as well as roh for Romansh.|
 |lang_less_than_100 |node & meta::lang_less_than_100="roh"|Find all messages in chats with fewer than 100 messages in Romansh. The same query can be applied for fewer or more languages by separating them with commas as shown in the example. Other languages are fra and ita for French and Italian respectively, gsw for dialectal German and deu for non-dialectal German.| |lang_less_than_100 |node & meta::lang_less_than_100="roh"|Find all messages in chats with fewer than 100 messages in Romansh. The same query can be applied for fewer or more languages by separating them with commas as shown in the example. Other languages are fra and ita for French and Italian respectively, gsw for dialectal German and deu for non-dialectal German.|
-|no_consent_msg |node & meta::no_consent_msg="54"|Find all messages in chats with more exactly 54 messages without consent to be used|+|no_consent_msg |node & meta::no_consent_msg="54"|Find all messages in chats with exactly 54 messages without consent to be used|
 |speakers |node & meta::speakers="2"| Find all messages in chats with exactly two speakers regardless of whether we have their permission or not. This is an alphanumeric field, i.e. you cannot search for larger than or smaller than.| |speakers |node & meta::speakers="2"| Find all messages in chats with exactly two speakers regardless of whether we have their permission or not. This is an alphanumeric field, i.e. you cannot search for larger than or smaller than.|
 |total_msg |node & meta::total_msg="2443"|Find all messages in chats with exactly 2443 messages. This is an alphanumeric field, i.e. you cannot search for larger than or smaller than.| |total_msg |node & meta::total_msg="2443"|Find all messages in chats with exactly 2443 messages. This is an alphanumeric field, i.e. you cannot search for larger than or smaller than.|
Line 45: Line 45:
 |spk |spk="spk2963"|Find messages written by a specific informant. This is an alphanumeric field, i.e. you cannot search for larger than or smaller than.| |spk |spk="spk2963"|Find messages written by a specific informant. This is an alphanumeric field, i.e. you cannot search for larger than or smaller than.|
 |timestamp |timestamp="14 Jan à 13:52"|Find messages with a specific time stamp. Please keep in mind that the timestamp depends on the language used by the informant. This is an alphanumeric field, i.e. you cannot search for larger than or smaller than.| |timestamp |timestamp="14 Jan à 13:52"|Find messages with a specific time stamp. Please keep in mind that the timestamp depends on the language used by the informant. This is an alphanumeric field, i.e. you cannot search for larger than or smaller than.|
-|most_likely_lang |most_likely_lang="gsw"|Find messages which were annotated either by humans or by a computational linguistics tool as being most likely in a specific [[01_corpus:03_preprocessing:05_language_per_message|specific language]].|+|most_likely_lang |most_likely_lang="gsw"|Find messages which were annotated either by humans or by a computational linguistics tool as being most likely in a specific [[01_corpus:02_preprocessing:04_languages| language]].|
  
 ===== Token annotations ===== ===== Token annotations =====
Line 57: Line 57:
  
 ===== Demographic annotation ===== ===== Demographic annotation =====
-  * Demographic information is attached to messages. The information thus is redundant, because it is attached to each and every message written by a specific informant.  +  * Demographic information is attached to every message written by a specific informant.  
-  * Keep in mind that some answers available in the [[01_corpus:03_demographics|questionnaire]] might be missing here. We only list answers that were actually given. +  * Some postal codes, cantons and cities are marked with an asterisk in cases where we looked them up in lists. For example, if a communication partner left the fields for the city and the canton empty but gave his postal code as 4144, we added the city as //*Arlesheim// and the canton as //*BL//.
-  * You might find postal codes, cantons and cities with a preceding asterisk for values that were not actually given by the user but generated by us. For very precise evaluation, you have to be careful with these generated values, since a postal code is not always unique in Switzerland (e.g. 8127 belonging to a part of Küsnacht and a part of Maur) and neither are cities (Zurich having many postal codes). +  * Also keep in mind that a field can hold multiple answers, i.e. somebody can give "gsw, fra, ita" as their mother tongue if they are trilingual.
-  * Also keep in mind that answers can be combined, i.e. somebody can give "gsw, fra, ita" as their mother tongue if they are trilingual.+
   * To see the corresponding questions in all languages, please check the [[01_corpus:03_demographics|questionnaire]].   * To see the corresponding questions in all languages, please check the [[01_corpus:03_demographics|questionnaire]].
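
The two conventions described in the list above (a leading asterisk for looked-up values, and several answers combined in one field) can be handled after export with a few lines of Python. This is only a minimal sketch: the names ''DemographicValue'' and ''parse_answer'' are hypothetical and not part of the corpus tooling, and it assumes that multiple answers are separated by commas as in the "gsw, fra, ita" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemographicValue:
    """One answer from a demographic field (hypothetical helper type)."""
    value: str
    looked_up: bool  # True if the value carried a leading asterisk


def parse_answer(raw: str) -> list[DemographicValue]:
    """Split a possibly multi-valued answer and flag asterisk-marked entries.

    Assumption: answers are comma-separated, e.g. "gsw, fra, ita".
    """
    parsed = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty fragments such as a trailing comma
        looked_up = part.startswith("*")
        parsed.append(DemographicValue(part.lstrip("*").strip(), looked_up))
    return parsed


if __name__ == "__main__":
    print(parse_answer("gsw, fra, ita"))
    print(parse_answer("*Arlesheim"))
```

Keeping the ''looked_up'' flag separate from the value makes it easy to exclude looked-up postal codes, cities and cantons from an evaluation that should rely only on answers given by the informants themselves.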
  
Line 67: Line 66:
 |education |education="secondary school qualification"|university or polytechnic diploma, still in education, secondary school qualification, no indication, higher vocational education | |education |education="secondary school qualification"|university or polytechnic diploma, still in education, secondary school qualification, no indication, higher vocational education |
 |flatrate |flatrate="yes"|yes, no| |flatrate |flatrate="yes"|yes, no|
-|features |features="abbreviations,non standard,smileys,dialect,multiple languages"|non standard, multiple languages, smileys, dialect+|features |features="abbreviations,non standard,smileys,dialect,multiple languages"|non standard, multiple languages, smileys, dialect|
 |gender |gender="unknown"|m, f| |gender |gender="unknown"|m, f|
 |home_country |home_country="CH"|AT (Austria), CA (Canada), CH (Switzerland), CZ (Czech Republic), DE (Germany), FI (Finland), FR (France), HN (Honduras), IT (Italy), LU (Luxembourg), PL (Poland)| |home_country |home_country="CH"|AT (Austria), CA (Canada), CH (Switzerland), CZ (Czech Republic), DE (Germany), FI (Finland), FR (France), HN (Honduras), IT (Italy), LU (Luxembourg), PL (Poland)|
-|home_postcode |home_postcode="1004"||+|home_postcode |home_postcode="1004"| |
 |homelanguage |homelanguage="gsw"|deu (non-dialectal German), fra (French), eng (English), ita (Italian), roh (Romansh), und (undefined), frp (Francoprovençal), lmo (Lombard), gsw (dialectal German)| |homelanguage |homelanguage="gsw"|deu (non-dialectal German), fra (French), eng (English), ita (Italian), roh (Romansh), und (undefined), frp (Francoprovençal), lmo (Lombard), gsw (dialectal German)|
 |input_method |input_method="without correction"|with correction, with prediction, without correction, without prediction| |input_method |input_method="without correction"|with correction, with prediction, without correction, without prediction|
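
The example queries in the tables follow two patterns: chat-level fields are written as ''node & meta::field="value"'', while message-level and demographic fields are written as ''field="value"''. If you generate many such queries, a small helper can keep the quoting consistent. The following Python sketch only builds query strings in the form shown in the tables; the function names are hypothetical, and it does not send anything to the corpus interface.

```python
def _check(field: str, value: str) -> None:
    """Reject input that would break the quoted query form used in the tables."""
    if not field or '"' in field or '"' in value:
        raise ValueError("field must be non-empty and neither part may contain a double quote")


def field_query(field: str, value: str) -> str:
    """Message-level or demographic query, e.g. spk="spk2963"."""
    _check(field, value)
    return f'{field}="{value}"'


def meta_query(field: str, value: str) -> str:
    """Chat-level query, e.g. node & meta::speakers="2"."""
    _check(field, value)
    return f'node & meta::{field}="{value}"'


def meta_language_query(field: str, languages: list[str]) -> str:
    """Chat-level language query with several codes separated by commas."""
    return meta_query(field, ", ".join(languages))


if __name__ == "__main__":
    print(meta_query("total_msg", "2443"))
    print(meta_language_query("lang_100_and_more", ["deu", "gsw"]))
    print(field_query("most_likely_lang", "gsw"))
```

All values are passed as strings because fields such as ''speakers'' and ''total_msg'' are alphanumeric, so only exact matches are possible and no larger-than or smaller-than comparison is built.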
02_browsing/04_queries/05_fields.1587152476.txt.gz · Last modified: 2022/06/27 09:21 (external edit)
