BLEU Score

What is the BLEU Score?

The BLEU (Bilingual Evaluation Understudy) Score is an evaluation metric for machine translation that measures the quality of translated text by comparing it to human-translated reference texts. The BLEU Score is based on the modified n-gram precision, which counts the number of n-gram matches between the machine-translated output and the reference translations, taking into account both the closeness of the translation and the length of the translated text.

How is the BLEU Score calculated?

The BLEU Score is calculated using the following steps:

  1. Compute the modified n-gram precision for each n-gram order (usually up to 4-grams): count the n-grams in the machine-translated output that also appear in the reference translations, clip each count at the maximum number of times that n-gram occurs in any single reference, and divide by the total number of n-grams in the output.
  2. Calculate the geometric mean of the modified n-gram precisions.
  3. Apply a brevity penalty to penalize shorter translations: If the output translation is shorter than the reference translations, the BLEU Score is multiplied by a factor that decreases as the output length decreases.
  4. The final BLEU Score ranges from 0 to 1, with higher scores indicating better translations.
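The steps above can be sketched from scratch in plain Python. This is a minimal illustration of the standard formula, not NLTK's implementation; the helper names (`ngrams`, `modified_precision`, `bleu`) are chosen for this example:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    # Clip each candidate n-gram count at its maximum count in any reference
    cand_counts = Counter(ngrams(candidate, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

def bleu(candidate, references, max_n=4):
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # the geometric mean is zero if any precision is zero
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty against the reference whose length is closest to the candidate's
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * geo_mean

references = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(bleu("the cat is on the mat".split(), references))
```

An exact match with a reference yields a score of 1.0; a candidate whose higher-order n-grams never appear in any reference scores 0, which is why smoothing is often applied to sentence-level BLEU in practice.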

Example of calculating BLEU Score in Python using NLTK:

from nltk.translate.bleu_score import sentence_bleu

# Each reference must be tokenized into a list of words;
# sentence_bleu takes a list of such reference token lists.
reference_translations = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
machine_translation = "the cat is on the mat".split()

# Calculate the BLEU Score
bleu_score = sentence_bleu(reference_translations, machine_translation)
print("BLEU Score:", bleu_score)

Byte Pair Encoding (BPE)

What is Byte Pair Encoding?

Byte Pair Encoding (BPE) is a data compression algorithm that iteratively replaces the most frequent pair of consecutive bytes with a single, unused byte. In natural language processing (NLP), BPE has been adapted as a subword tokenization method that handles out-of-vocabulary words effectively by splitting them into smaller subword units. BPE-based tokenization is used in various NLP models, such as OpenAI’s GPT-2 (which applies a byte-level variant of BPE); Google’s BERT, by contrast, uses the closely related WordPiece algorithm.
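The original compression form of the algorithm can be illustrated in a few lines of Python. This is a toy sketch: the `bpe_compress` helper and its single-character placeholder symbols are inventions for this demo, not part of any library:

```python
def bpe_compress(data):
    """Repeatedly replace the most frequent pair of symbols with an unused one."""
    merges = {}
    next_symbol = ord('Z')  # placeholder replacement symbols for the demo
    while True:
        # Count adjacent pairs (overlapping pairs are counted, fine for a demo)
        pairs = {}
        for a, b in zip(data, data[1:]):
            pairs[a + b] = pairs.get(a + b, 0) + 1
        if not pairs:
            break
        best, count = max(pairs.items(), key=lambda kv: kv[1])
        if count < 2:
            break  # no pair repeats; nothing left to compress
        sym = chr(next_symbol)
        next_symbol -= 1
        merges[sym] = best          # remember the merge so it can be undone
        data = data.replace(best, sym)
    return data, merges

compressed, table = bpe_compress("aaabdaaabac")
print(compressed, table)
```

Decompression simply replays the merge table in reverse, expanding each placeholder back into its pair.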

How does Byte Pair Encoding work in NLP?

In NLP, BPE tokenization is performed as follows:

  1. Initialize the vocabulary with unique characters in the dataset.
  2. Count the frequency of all character pairs in the dataset.
  3. Merge the most frequent pair, creating a new token.
  4. Repeat steps 2 and 3 for a predefined number of iterations or until no more merges are possible.

Example of Byte Pair Encoding in Python:

Here’s a simple implementation of BPE tokenization using Python:

from collections import Counter

def get_stats(vocab):
    """Count the frequency of all adjacent symbol pairs in the vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_vocab(pair, vocab_in):
    """Merge the given symbol pair into a single token across the vocabulary."""
    vocab_out = {}
    bigram = ' '.join(pair)
    replacement = ''.join(pair)
    for word, freq in vocab_in.items():
        new_word = word.replace(bigram, replacement)
        vocab_out[new_word] = freq
    return vocab_out

def bpe_tokenization(vocab, num_merges):
    for i in range(num_merges):
        pairs = get_stats(vocab)
        if not pairs:
            break  # no pairs left to merge
        best_pair = pairs.most_common(1)[0][0]
        vocab = merge_vocab(best_pair, vocab)
    return vocab

# Example usage
raw_vocab = {'l o w </w>': 5, 'l o w e r </w>': 2, 'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
num_merges = 10
bpe_vocab = bpe_tokenization(raw_vocab, num_merges)

This example demonstrates a simple BPE tokenization process on a given vocabulary, performing up to 10 merge operations (stopping early if no pairs remain to merge).
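Once the merge operations are learned, they can be replayed in order to tokenize a word that was never seen during training. The sketch below is self-contained, so it re-learns the merges itself; `learn_merges` and `encode_word` are hypothetical helper names built on the same `get_stats`/`merge_vocab` logic shown above:

```python
from collections import Counter

def get_stats(vocab):
    # Count the frequency of all adjacent symbol pairs in the vocabulary
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_vocab(pair, vocab_in):
    # Merge the given symbol pair into a single token across the vocabulary
    bigram = ' '.join(pair)
    replacement = ''.join(pair)
    return {word.replace(bigram, replacement): freq for word, freq in vocab_in.items()}

def learn_merges(vocab, num_merges):
    # Record the merges in the order they were learned
    merges = []
    for _ in range(num_merges):
        pairs = get_stats(vocab)
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        vocab = merge_vocab(best, vocab)
    return merges

def encode_word(word, merges):
    # Start from characters plus the end-of-word marker, then replay merges in order
    symbols = list(word) + ['</w>']
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return symbols

raw_vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
             'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
merges = learn_merges(raw_vocab, 10)
print(encode_word('lowest', merges))
```

On this toy vocabulary, the unseen word "lowest" decomposes into subword units learned from "low" and "newest"/"widest", which is exactly how BPE sidesteps the out-of-vocabulary problem.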
