{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" }, "gpuClass": "standard", "accelerator": "GPU" }, "cells": [ { "cell_type": "markdown", "source": [ "## Prepare your environment\n", "\n", "As always, we highly recommend that you use Colab, or install all packages with a virtual environment manager like [venv](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) or [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html), to prevent version conflicts between packages. " ], "metadata": { "id": "CRPB-PDrhRAu" } }, { "cell_type": "markdown", "source": [ "### Install and load packages" ], "metadata": { "id": "hvbJMhCefPIz" } }, { "cell_type": "code", "source": [ "!pip install numpy scikit-learn datasets transformers torch sentencepiece tqdm jsonlines errant\n", "# ERRANT needs a spaCy English model; the old `en` shortcut no longer exists in spaCy v3\n", "!python -m spacy download en_core_web_sm" ], "metadata": { "id": "ng7XkVouhLew" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "from datasets import load_dataset\n", "import torch\n", "import jsonlines\n", "from tqdm import tqdm\n", "import os" ], "metadata": { "id": "OSdZgl5Vks3n" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## The dataset\n", "We are using the W&I+LOCNESS dataset to finetune T5(-small). 
\n", "\n", "The W&I+LOCNESS dataset is made up of entries from \n", "- Write & Improve (Yannakoudakis et al., 2018), an online platform that assists non-native English students with their writing, and \n", "- the LOCNESS corpus (Granger, 1998), which consists of essays written by native English students.\n", "\n", "([Details here](https://www.cl.cam.ac.uk/research/nl/bea2019st/) under the title \"Data\")" ], "metadata": { "id": "_rCm_nMBfg5z" } }, { "cell_type": "code", "source": [ "data_files = {'train': os.path.join('data', 'train.jsonl'),\n", "    'validation': os.path.join('data', 'dev.jsonl'),\n", "    'test': os.path.join('data', 'test.jsonl')}\n", "dataset = load_dataset('json', data_files=data_files)" ], "metadata": { "id": "oXc9YgL3lJpH" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# have a look!\n", "print(len(dataset['train']['text']))\n", "print(dataset['train'][0])" ], "metadata": { "id": "8AWSSF7x9k7Y" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Pre-processing" ], "metadata": { "id": "Ox-VNFKm9e9X" } }, { "cell_type": "markdown", "source": [ "### Tokenizer\n", "Like our previous assignment on sentence classification, we need a tokenizer. 
This time, we're using T5's tokenizer.\n", "\n", "- [Tokenizer base class documentation](https://huggingface.co/docs/transformers/v4.24.0/en/main_classes/tokenizer#transformers.PreTrainedTokenizer) (for reference)" ], "metadata": { "id": "K3vmY3GUfrsJ" } }, { "cell_type": "code", "source": [ "from transformers import AutoTokenizer" ], "metadata": { "id": "Zw1Mg3qb4iu_" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "MODEL_NAME = 't5-small'\n", "MODEL_MAX_LEN = 256" ], "metadata": { "id": "I0xXaLQh5EK7" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "tokenizer = AutoTokenizer.from_pretrained(\n", " MODEL_NAME,\n", " model_max_length=MODEL_MAX_LEN\n", " )" ], "metadata": { "id": "TEi0yOjo5gjQ" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "**[ TODO ]:** tokenize the input and output sequences. " ], "metadata": { "id": "jVnCzBb3Bo_1" } }, { "cell_type": "code", "source": [ "# batch-tokenize inputs \n", "def tokenize_batch(batch):\n", " \"\"\" Input: a batch of your dataset\n", " Example: { 'text': ['sentence1', 'sentence2', ...],\n", " 'corrected': ['correct_sentence1', 'correct_sentence2', ...] }\n", " \"\"\"\n", " \n", " # encode the source sentences, i.e. the grammatically incorrect sentences\n", " input_sequences = ...\n", " input_encoding = tokenizer(\n", " ...\n", " )\n", "\n", " input_ids, attention_mask = input_encoding.input_ids, \\\n", " input_encoding.attention_mask\n", "\n", " # encode the targets, i.e. 
the corrected sentences\n", " output_sequences = ...\n", " target_encoding = tokenizer(\n", " ...\n", " )\n", " labels = target_encoding.input_ids # we only need the token ids of the target sequences during training\n", "\n", " # replace the padding token ids in the labels with -100 so they are ignored by the loss\n", " # (this boolean-mask assignment assumes `labels` is a tensor, e.g. return_tensors=\"pt\")\n", " labels[labels == tokenizer.pad_token_id] = -100\n", " \n", " ################################################\n", "\n", " \"\"\" Output: a batch of processed dataset\n", " Example: { 'input_ids': ...,\n", " 'attention_mask': ...,\n", " 'label': ... }\n", " \"\"\"\n", " return {\"input_ids\": input_ids, \"attention_mask\": attention_mask, \"label\": labels}\n", " #loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).loss" ], "metadata": { "id": "xemZbP9ZzwDt" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "If you look up online tutorials on finetuning T5, you might see that a task prefix is added to each input sequence before it is encoded in the step above. This is done because T5 was originally trained as a multi-task model: the prefix tells the model which task it should perform on a given text during inference. \n", "\n", "We don't have to do that here\\*\\*, because we are only finetuning T5 on a single task, i.e. GEC.\n", "\n", "\n", "For more information, please refer to this [tutorial](https://huggingface.co/docs/transformers/model_doc/t5#training).\n", "\n", "\n", "\n", "\\*\\* *TA's note: In fact, adding a `gec:` prefix to the input sentences during training only worsened the model's performance. 
\n", "I learned this the hard way...* 💔 " ], "metadata": { "id": "IPpjuGZI9vvS" } }, { "cell_type": "markdown", "source": [ "### Batch-tokenization" ], "metadata": { "id": "vjxNMAn0gZuO" } }, { "cell_type": "code", "source": [ "# map the function to the whole dataset\n", "train_val_dataset = dataset.map(tokenize_batch, # your processing function\n", " batched = True # Process in batches so it can be faster\n", " )" ], "metadata": { "id": "sgXO66x_5oFJ" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Training" ], "metadata": { "id": "muWZ4iS89qLK" } }, { "cell_type": "markdown", "source": [ "### Set up training parameters\n", "\n", "As before, we use the Trainer API to do the training. You may use the default hyperparameters that the TA has set for you.\n", "\n", "Documentation:\n", "- [transformers.Seq2SeqTrainingArguments](https://huggingface.co/docs/transformers/master/en/main_classes/trainer#transformers.Seq2SeqTrainingArguments)\n", "- [transformers.Seq2SeqTrainer](https://huggingface.co/docs/transformers/master/en/main_classes/trainer#transformers.Seq2SeqTrainer)" ], "metadata": { "id": "T57JOJ3zIYs_" } }, { "cell_type": "code", "source": [ "from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer" ], "metadata": { "id": "eFiPJ5jwIXLo" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# load the pretrained model first: the trainer below needs it\n", "from transformers import T5ForConditionalGeneration\n", "model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)" ], "metadata": { "id": "O2YLnvHT5w9B" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "OUTPUT_DIR = './model'\n", "LEARNING_RATE = 2e-4\n", "BATCH_SIZE = 32\n", "EPOCH = 5\n", "training_args = Seq2SeqTrainingArguments(\n", " output_dir = OUTPUT_DIR,\n", " learning_rate = LEARNING_RATE,\n", " per_device_train_batch_size = BATCH_SIZE,\n", " per_device_eval_batch_size = BATCH_SIZE,\n", " num_train_epochs = EPOCH,\n", " # you can set more parameters here if you want\n", ")\n", "\n", "# now give all the information to a trainer\n", "trainer = Seq2SeqTrainer(\n", " # set your parameters here\n", " model = model,\n", " args = training_args,\n", " train_dataset = train_val_dataset[\"train\"],\n", " eval_dataset = train_val_dataset[\"validation\"],\n", " tokenizer = tokenizer,\n", ")" ], "metadata": { "id": "HwnlYdfWKY1V" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Train 🚀\n", "\n", "This is the easy part. Simply ask the trainer to train the model for you!" ], "metadata": { "id": "RhYRWQCMIdza" } }, { "cell_type": "code", "source": [ "trainer.train()" ], "metadata": { "id": "YKrOtQRFIc_O" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Save for loading during demo\n", "\n", "**[ TODO ]:** save your model for future use and load it during the demo\n", "\n", "We will ask you to perform inference with your model during the demo!" ], "metadata": { "id": "3k4dW8wDIiOJ" } }, { "cell_type": "code", "source": [ "# [ TODO ] save your model for future use\n", "model.save_pretrained(...)" ], "metadata": { "id": "r4CIYw7zIlgU" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Prediction\n", "Inference is a bit different for a seq2seq model compared to a classification model. \n", "\n", "The model first has to generate the output sequence from the given input sequence. How the sequence is generated differs from model to model. You may read more about it [here](https://huggingface.co/blog/encoder-decoder).\n", "\n", "Since `generate()` returns the output sequences as token IDs, we need to *decode* them back into text with the tokenizer."
], "metadata": { "id": "c9xbb4EbInrj" } }, { "cell_type": "code", "source": [ "# just to make sure you're using a GPU\n", "cur_device = torch.cuda.current_device()\n", "device = torch.device(cur_device)\n", "print(cur_device)" ], "metadata": { "id": "ciL3zrcHPLe9", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "6b318492-dda8-44a1-d95a-5500c98bbba8" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "0\n" ] } ] }, { "cell_type": "code", "source": [ "### Load finetuned model\n", "from transformers import ???\n", "\n", "model = ???.from_pretrained(...)\n", "model.to(device)" ], "metadata": { "id": "Rh-WAJuaIqKU" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Get the prediction\n", "\n", "Here are a few example sentences:" ], "metadata": { "id": "j3bh5rO6ItKO" } }, { "cell_type": "code", "source": [ "\n", "sentences = [\"The houses was wonderful.\", \"I like to working in NYC.\", \"She is involve in accident.\", ]\n", "\n", "inputs = tokenizer(sentences, return_tensors=\"pt\", padding=True)\n", "inputs.to(device)\n", "\n", "output_sequences = model.generate(\n", " input_ids=inputs[\"input_ids\"],\n", " attention_mask=inputs[\"attention_mask\"],\n", " do_sample=False,\n", ")\n", "\n", "print(tokenizer.batch_decode(output_sequences, skip_special_tokens=True))" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "yarsz1A8IvyA", "outputId": "87895009-3427-4ff3-c67b-a58619cfa0c8" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "['The houses were wanderful.', 'I like to work in NYC.', 'She is involved in an accident.']\n" ] } ] }, { "cell_type": "markdown", "source": [ "### Predict on the W&I+LOCNESS \"test\" set\n", "\n", "#### **A note on the \"test\" set**\n", "The \"test\" set here is not the official test set, since the latter is withheld by the dataset provider to ensure fairness among all 
competitors aiming for SOTA. \n", "Instead, the validation set you just used during training is split from the training set, and you'll be evaluating the model on the original validation set.\n", "\n" ], "metadata": { "id": "-RpYKFkqI7Us" } }, { "cell_type": "markdown", "source": [ "**[ TO DO ]:** use the model you just trained to turn the grammatically incorrect test-set sentences into corrected sentences. This involves **loading** the test set, using the model to **generate** the corrected sentences as token IDs (in tensor form), and **decoding** the generated tensors back into text. \n", "\n", "You may reference the [🤗 documentation](https://huggingface.co/docs/transformers/model_doc/t5#inference) for what to put in the methods.\n", "\n", "\n", "\n", "[Optional] (you may do this just for the science; no bonus points will be given for this.) \n", "\n", "Store the output in the input `dataset` data structure to make life easier if you choose to pass the output through the model multiple times (see next step for details). Otherwise, save it in any format you find convenient :)\n" ], "metadata": { "id": "R8rPHqO9AEAZ" } }, { "cell_type": "code", "source": [ "# load the test data\n", "test_data = load_dataset(\n", " ...\n", ")" ], "metadata": { "id": "0RqUm6WM4e6w" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Implement the function to generate and batch-decode the generated sequences\n", "# batch-tokenize and then decode inputs \n", "@torch.no_grad()\n", "def decode_batch(batch):\n", " \"\"\" Input: a batch of an **untokenized** dataset\n", " Example: { 'text': ['sentence1', 'sentence2', ...],\n", " 'corrected': ['correct_sentence1', 'correct_sentence2', ...] 
}\n", " \"\"\"\n", " sentences = tokenizer(batch[\"text\"], return_tensors=\"pt\", padding=True)\n", " sentences.to(device)\n", " \n", " output_sequences = model.???(\n", " ...\n", " )\n", "\n", " decoded = tokenizer.???(\n", " ...\n", " )\n", " \n", " # store `decoded` in `batch` before returning it; otherwise the predictions are discarded\n", " return batch" ], "metadata": { "id": "kGlUVA7IkhgG" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# adjust the arguments as you need\n", "decoded = test_data.map(\n", " decode_batch, # your processing function\n", " batched = True, # Process in batches so it can be faster\n", " batch_size=50\n", " )" ], "metadata": { "id": "2xO04LGCnjDH" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "assert len(decoded)==len(test_data)" ], "metadata": { "id": "Zu4Yey5TnBzh" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "[ Optional ] (no bonus points for this)\n", "\n", "Since some corrections in a sentence may depend on previous corrections, running a GEC model only once may not be enough to fully correct the sentence. Thus, many methods feed the output of a GEC model back into the system more than once. Try this on the test data!" ], "metadata": { "id": "8ge6su-KyzpZ" } }, { "cell_type": "code", "source": [ "iterations = 3\n", "pred_iter = [decoded] # you may want to save the output of each iteration\n", "\n", "for i in range(iterations):\n", " ..." ], "metadata": { "id": "GHOgk9Mn6fCd" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "[ Optional ] (no bonus points) Write a script to view the differences between consecutive iterations, and inspect the results to decide which iteration you want to evaluate on. 
" ], "metadata": { "id": "IDZ0lZ6bz7Q0" } }, { "cell_type": "code", "source": [ "diffs = []\n", "for idx, it in enumerate(pred_iter):\n", "    i = 0\n", "    print(f\"ITERATION {idx}\")\n", "    for txt_idx, text in enumerate(it['text']):\n", "        ...\n", "    print(f\"NUM CORRECTED SENTENCES: {i}\")" ], "metadata": { "id": "I8KWNiP6zAVD" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Evaluation\n", "#### ERRANT\n", "One evaluation metric for GEC is ERRANT (see details under the title [Evaluation](https://www.cl.cam.ac.uk/research/nl/bea2019st/)), an improved version of the [MaxMatch scorer](https://github.com/nusnlp/m2scorer), in which precision and recall are computed from matched edits at the span or token level.\n", "\n", "The metric ERRANT reports is [F0.5](https://en.wikipedia.org/wiki/F-score#F%CE%B2_score), which weights precision twice as much as recall.\n", "\n", "We use [the ERRANT toolkit](https://github.com/chrisjbryant/errant) to evaluate our output. \n", "\n", "== a couple of terms ==\n", "- Sources = the sentences to be corrected\n", "- Hypotheses = the sentences we predicted, hopefully grammatically correct\n", "- References = the gold-standard sentences (i.e. sentences grammatically corrected by human annotators; the \"answer\")\n", "\n", "\n", "`errant_parallel` converts the span-based difference between the **sources** and the **hypotheses** into a `.m2` file.\n", "\n", "```\n", "S It 's difficult answer at the question \" what are you going to do in the future ? \" if the only one who has to know it is in two minds .\n", "A 3 3|||M:VERB:FORM|||to|||REQUIRED|||-NONE-|||0\n", "A 4 5|||U:PREP||||||REQUIRED|||-NONE-|||0\n", "```\n", "\n", "Next, `errant_compare` compares the above `.m2` file against the reference `.m2` file (which holds the gold edits between the **sources** and the **references**), and calculates precision, recall, and F0.5. \n", "\n", "If you feel like you need to draw a graph to understand this more clearly, it's okay. The TA did too."
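, "\n", "\n", "For reference, here is the F0.5 computation written out. It assumes the standard $F_{\\beta}$ definition, with true positives ($TP$), false positives ($FP$) and false negatives ($FN$) counted over edits:\n", "\n", "$$P = \\frac{TP}{TP + FP}, \\qquad R = \\frac{TP}{TP + FN}, \\qquad F_{\\beta} = \\frac{(1 + \\beta^2)\\,P\\,R}{\\beta^2\\,P + R}.$$\n", "\n", "With $\\beta = 0.5$ this becomes $F_{0.5} = \\frac{1.25\\,P\\,R}{0.25\\,P + R}$, so precision counts more than recall. For example, the `errant_compare` output further below ($TP = 1116$, $FP = 10524$, $FN = 6345$) gives $P = 1116/11640 \\approx 0.0959$, $R = 1116/7461 \\approx 0.1496$ and $F_{0.5} \\approx 0.1033$. These values match the reported scores."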
], "metadata": { "id": "G_8v76PZ0LcO" } }, { "cell_type": "markdown", "source": [ "**[ TO DO ]** Use the ERRANT scorer to evaluate the test set and show the results\n", "...you don't really need to do anything here. Just run the code (perhaps change the output file name) and show the results during the demo." ], "metadata": { "id": "1255Pk7DN6He" } }, { "cell_type": "code", "source": [ "!mkdir -p OUTPATH # create the output directory; change OUTPATH and OUTNAME as you wish" ], "metadata": { "id": "qHkz27v6KwQg" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# pred_iter[0] is the single-pass output; higher indices exist only if you ran the optional iterations\n", "with open(\"OUTPATH/OUTNAME.out\", \"w\") as f:\n", "    for line in pred_iter[1]['text']:\n", "        f.write(line + \"\\n\")" ], "metadata": { "id": "eRHRwXAGKHSN" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# this will take about 1 minute\n", "!errant_parallel -ori data/source.txt -cor OUTPATH/OUTNAME.out -out OUTPATH/OUTNAME.m2" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KcUhFg_Q6zAy", "outputId": "2659dbd4-c9f6-4ecd-944e-10f8409d6068" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Loading resources...\n", "Processing parallel files...\n" ] } ] }, { "cell_type": "code", "source": [ "!errant_compare -ref data/bea-full-valid.m2 -hyp OUTPATH/OUTNAME.m2" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "WT7GOZbwK-Yx", "outputId": "ed250072-dc8c-4a0e-ac24-035c9c1f0db5" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "=========== Span-Based Correction ============\n", "TP\tFP\tFN\tPrec\tRec\tF0.5\n", "1116\t10524\t6345\t0.0959\t0.1496\t0.1033\n", "==============================================\n", "\n" ] } ] }, { "cell_type": "markdown", "source": [ "You may notice that the scores are quite low. This is fine. 
(The TA got only ~0.1 F0.5 😅)\n", "\n", "This approach (finetuning T5 on parallel text) is only one of the [many methods](https://nlpprogress.com/english/grammatical_error_correction.html) developed in an attempt to solve GEC. Other factors contributing to the low score include:\n", "- this T5 is tiny (60M parameters) compared to other T5s (e.g. T5-base: 220M, T5-large: 770M);\n", "- the training data is quite small. Studies usually combine it with other datasets, such as Lang-8 (947k sentences) and FCE (34k sentences), and perform data augmentation.\n", "\n", "**What other factors can you think of that contribute to the low performance?**\n", "\n", "**[ TO DO (Optional, with bonus points) ]:** Think of one factor that might affect the model performance, and \n", "\n", "(a) come up with 1 way to verify that via experimentation (you don't need to actually do the experiment), OR \n", "\n", "(b) find 1 paper that supports your hypothesis. We will ask you to show the paper and point out the part that supports your hypothesis." ], "metadata": { "id": "LIAODr-340zM" } }, { "cell_type": "markdown", "source": [ "----------\n", "**[ TO DO ]** Plot a histogram of the `.m2` file you generated.\n", "\n", "You now have the gold-standard `.m2` file and the `.m2` file from the corrections you generated with T5. \n", "\n", "In the `.m2` files, lines containing fields separated by three pipes (|||) are edit annotations: each one gives the span of the edited word(s) and its correction type, obtained by comparing each source sentence with its corrected version (reference or hypothesis). \n", "\n", "For example, for `A 3 3|||M:VERB:FORM|||to|||REQUIRED|||-NONE-|||0`, `M:VERB:FORM` is the correction type.\n", "\n", "Below is a bar plot (histogram) of the correction types from the gold-standard `.m2` file (i.e. `bea-full-valid.m2`). Plot **a histogram/bar plot for the top-10 correction types** of the `.m2` file generated from **the source and the hypothesis**. 
How are the top-10 corrections different from the gold-standard ones?\n", "![fqLtmi0.png]()" ], "metadata": { "id": "zr1xZuNgK7ky" } }, { "cell_type": "markdown", "source": [ "### Evaluation: Your turn\n", "You have two main tasks:\n", "1. **[TO DO]** Evaluate the model we just trained with ERRANT on another dataset: [JFLEG](https://github.com/keisks/jfleg). (Do you get a higher or lower score? Explain.)\n", "2. **[TO DO]** GLEU, a variant of the machine-translation metric BLEU, is another metric used to evaluate GEC. Use GLEU to evaluate your model on **both** W&I+LOCNESS and JFLEG. \n", "\n", "You may reference [this](https://www.nltk.org/api/nltk.translate.gleu_score.html) or [this](https://github.com/keisks/jfleg/blob/master/eval/gleu.py) to calculate the GLEU scores. Note that they compute different metrics: NLTK's GLEU is Google-BLEU from machine translation (Wu et al., 2016), while the JFLEG script implements the GEC-specific GLEU of Napoles et al. (2015), which also takes the source sentences into account. " ], "metadata": { "id": "0y_pRWKb2aUh" } }, { "cell_type": "markdown", "source": [ "### Evaluate on JFLEG with ERRANT\n", "Note: For JFLEG, you need to generate your own gold-standard `.m2` file. Use `errant_parallel` as you did to generate the source-hypothesis `.m2` file.\n", "\n", "JFLEG provides multiple reference files. For simplicity, just use `dev.ref0` as the reference file." ], "metadata": { "id": "sG0TROElO2bU" } }, { "cell_type": "code", "source": [ "jf_test_data = load_dataset(\n", " # load the jfleg dev set\n", " )" ], "metadata": { "id": "KIjGYjzPPEIz" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "jf_decoded = jf_test_data.map(\n", " # ...\n", " )" ], "metadata": { "id": "ZGh5tytOPSOL" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# write to file for evaluation\n", "with open(\"HYPOTHESIS.FILE\", \"w\") as f:\n", "    ...  # write one corrected sentence per line" 
], "metadata": { "id": "Tf8fZqwsQThG" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# ERRANT-evaluate JFLEG\n", "# (use a different OUTNAME than for W&I+LOCNESS so you don't overwrite that .m2 file)\n", "!mkdir -p outputs/jfleg # make sure the output directory exists\n", "!errant_parallel -ori data/jfleg/source.txt -cor HYPOTHESIS.FILE -out OUTPATH/OUTNAME.m2\n", "!errant_parallel -ori data/jfleg/source.txt -cor data/jfleg/dev.ref0 -out outputs/jfleg/ref0.m2\n", "!errant_compare -ref outputs/jfleg/ref0.m2 -hyp OUTPATH/OUTNAME.m2" ], "metadata": { "id": "4cdqAdoFRS6b" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### GLEU evaluation for W&I+LOCNESS and JFLEG\n", "Note: the GLEU calculator may offer sentence-level GLEU scores as well as the mean GLEU score. You only need to obtain the mean GLEU score for each dataset." ], "metadata": { "id": "wN_3I-WtR_mw" } }, { "cell_type": "code", "source": [], "metadata": { "id": "8VeKrF2ZBgxT" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "**[ TO DO (Optional, with bonus points) ]** Is GLEU or ERRANT higher? Which of these is a better measure of grammaticality? Why?" ], "metadata": { "id": "3n6T3jEjI0lE" } }, { "cell_type": "markdown", "source": [ "## TA's Note\n", "\n", "Phew, you made it to the end of the tutorial! Make sure you make an appointment to show your work and turn in your finished assignment before next week's lesson. Don't worry if you didn't pass the evaluation requirements; you'll still get partial points for trying. \n", "\n", "Grading:\n", "- Optional TODOs earn bonus points. \n", "- You will earn 90 points from completing the non-optional TODOs." ], "metadata": { "id": "cCduvhDfJF8u" } } ] }