Evaluation metrics.

For all subtasks, participating systems will be ranked by the macro-F1 score. For the multilabel task, micro-F1, example-based F1, and Hamming loss are reported as additional evaluation metrics. Macro-averaged metrics evaluate each label independently and average the scores across labels, while micro-averaged metrics aggregate the counts across all labels before computing the score. Hamming loss quantifies the proportion of incorrectly predicted labels.
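
As a rough illustration only, the Python sketch below computes these four metrics with scikit-learn on a toy multilabel example; the label matrices are invented for the demonstration, and the official scoring script may differ in details such as label ordering.

import numpy as np
from sklearn.metrics import f1_score, hamming_loss

# Toy ground-truth and predicted label matrices (rows = examples, columns = labels);
# the values are illustrative only.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 1]])

print("macro-F1:", f1_score(y_true, y_pred, average="macro"))            # per-label F1, averaged over labels
print("micro-F1:", f1_score(y_true, y_pred, average="micro"))            # counts pooled over all labels
print("example-based F1:", f1_score(y_true, y_pred, average="samples"))  # per-example F1, averaged over examples
print("Hamming loss:", hamming_loss(y_true, y_pred))                     # fraction of wrongly predicted labels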

Final submission format

All submissions must be packaged as a ZIP archive. Inside the ZIP, the predictions must be in CSV format, with the IDs in the same order as in the data you downloaded. For example:

The format is the same for both Task 1 and Task 2:

ID,label
0000,low
0001,physical

The CSV file for Task 1 must be named task1_predictions.csv.

The CSV file for Task 2 must be named task2_predictions.csv.
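
As a minimal sketch (not an official script), writing the Task 1 file in the required format could look like the following; the predictions list is a made-up placeholder, and your predictions are assumed to already be in the same order as the downloaded IDs.

import csv

# Hypothetical predictions, kept in the same order as the downloaded IDs.
predictions = [("0000", "low"), ("0001", "physical")]

with open("task1_predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "label"])  # header row required by the format above
    writer.writerows(predictions)     # one "id,label" row per example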

Submit a single ZIP file with a generic name, for example predictions.zip. If you participate in only one task, the ZIP file will contain a single CSV file; if you participate in both tasks, include both prediction CSV files in the same ZIP file.
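
A minimal sketch for packaging the submission, assuming the prediction CSV files named above have already been written to the working directory:

import os
import zipfile

# Bundle whichever prediction files exist into a single ZIP archive.
with zipfile.ZipFile("predictions.zip", "w") as zf:
    for name in ("task1_predictions.csv", "task2_predictions.csv"):
        if os.path.exists(name):
            zf.write(name)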