


Export

After performing a query, you can click on "More" and then "Export" to export your results. As you can see from figure 1, you have different formats available.

Figure 1: Export options

Next to the type of export, you have the option "Left and right context", which is the same for all export formats. Here, you can define the number of entities to be exported to the left and right of your search query. The entity is in the same unit as your query: if you query for tokens, you select the number of tokens to be shown, and if you query for messages, you select the number of messages.
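
To make the effect of "Left and right context" concrete, here is a minimal Python sketch of a context window around one match. It only illustrates the idea and is not how the export itself is implemented; the function name context_window and the sample token list are invented for this example.

```python
def context_window(entities, match_index, left, right):
    """Return (left context, match, right context) for a single match.

    `entities` is a list in the unit of the query (tokens or messages).
    The window is cut off at the start and end of the list.
    """
    start = max(0, match_index - left)
    end = min(len(entities), match_index + 1 + right)
    return (
        entities[start:match_index],
        entities[match_index],
        entities[match_index + 1:end],
    )


# Invented sample tokens; "demain" plays the role of the queried token.
tokens = ["on", "se", "voit", "demain", "matin", "alors"]
left_ctx, match, right_ctx = context_window(tokens, tokens.index("demain"), left=1, right=1)
print(left_ctx, match, right_ctx)  # ['voit'] demain ['matin']
```

With a context of 1 on each side, one entity before and one after the match are kept, which corresponds to the setting used in the GridExporter example further down.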

The other options, "Annotation keys" and "Parameters", depend on the export format and are explained to the right when you select an export option.

Once you click "Perform Export", the system creates the export in memory. You can then click "download" to save it to your own computer.

Exports are very resource-intensive, so it might take a while to create an export, or the server might even hang. The simpler your query, the fewer problems you will have. Tip: instead of formulating a complex RegEx query, it might be more useful to run several simpler queries and then combine the resulting files.
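
If you follow the tip above and split one complex query into several simple ones, the downloaded files can be joined afterwards. The following Python sketch shows one possible way to do this. It assumes that the exports are plain UTF-8 text files; the function name merge_exports and the has_header switch are invented for this example, and whether a given export format starts with a header line is something you need to check in your own files.

```python
from pathlib import Path


def merge_exports(paths, output_path, has_header=False):
    """Write the contents of several export files into one file.

    With has_header=True, the first line of every file except the
    first one is dropped, so the header appears only once.
    """
    with open(output_path, "w", encoding="utf-8") as out:
        for index, path in enumerate(paths):
            lines = Path(path).read_text(encoding="utf-8").splitlines()
            if has_header and index > 0:
                lines = lines[1:]  # skip the repeated header line
            for line in lines:
                out.write(line + "\n")
```

A call could look like merge_exports(["query_a.txt", "query_b.txt"], "merged.txt"), where the file names are placeholders. Note that any result numbering inside the files is copied as it is, so numbers may repeat in the merged file.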

WekaExporter

This exporter is specific to the data mining application Weka. If you are familiar with Weka, this is a good option for you.

CSVExporter

This exporter creates one line per result. In this line, you see the text you queried for as well as all the annotations available on the token level. Depending on the sub-corpus, these are the token itself as well as PoS annotations.
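
Because the export contains one line per result, it can be processed further with a few lines of code. The Python sketch below reads such a file into a list of dictionaries. It rests on two assumptions that are not stated on this page and that you should verify against your own download: that the fields are separated by tabs and that the first line contains the column names. The function name read_export_rows is invented for this example.

```python
import csv


def read_export_rows(path, delimiter="\t"):
    """Read an export file into a list of dicts, one dict per result line.

    Assumes (not confirmed here) that the first line holds the column names.
    Quote characters are treated as ordinary text, since chat messages
    may contain them.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        return list(reader)
```

If your file uses a different separator, pass it as the delimiter argument, for example delimiter=",".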

The field "Annotation Keys" is not used in this export.

Under "Parameters", you can add annotations that pertain to the chat. More precisely, you can add all annotations that are listed under "Meta Annotations" in the information display per sub-corpus. To export such an annotation, use the form metakeys=, followed by the annotation name. For example, metakeys=doc exports the chat ID. Further annotation names can be added, separated by commas.
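
If you prepare the "Parameters" value for several exports, a small helper can keep the form consistent. The Python sketch below builds the metakeys string from a list of annotation names. The helper name metakeys_parameter is invented, and of the names used only doc comes from this page; "speaker" is a hypothetical placeholder, so replace it with a name that is actually listed under "Meta Annotations" for your sub-corpus.

```python
def metakeys_parameter(keys):
    """Build a 'metakeys=...' string from a list of meta annotation names."""
    cleaned = [key.strip() for key in keys if key.strip()]
    if not cleaned:
        raise ValueError("at least one meta annotation name is required")
    return "metakeys=" + ",".join(cleaned)


print(metakeys_parameter(["doc"]))             # metakeys=doc
print(metakeys_parameter(["doc", "speaker"]))  # "speaker" is a hypothetical name
```

The resulting string is what you would type into the "Parameters" field.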

TokenExporter

This exporter is intended for smaller corpora than ours. It usually hangs even on very small queries, so we recommend not using it.

GridExporter

This exporter is the most versatile one, since you can choose the annotations that you want to export. Figure 2 shows an example in which one token to the left and one to the right are exported, as well as the whole message, the message ID, the token queried for and the age_range (not visible). Additionally, the metakey for the chat ID is exported as explained above.

Figure 2: Example of a GridExport

The resulting output starts as follows:

Figure 3: Results of the export (extract)

As you can see in figure 3, each result is preceded by a number, starting with 0. You then see all the annotation keys selected in figure 2, in the selected order: whole message, message ID, token (your query is in the center, in this case "demain", together with the left and right tokens that you selected with the left and right context), age_range and then the chat ID selected with metakeys=doc.

02_browsing/05_additional/02_export.1575450976.txt.gz · Last modified: 2022/06/27 09:21 (external edit)
