GloVe stands for Global Vectors for Word Representation. Its authors provide pre-trained word vector models learned on corpora such as Wikipedia + Gigaword, Common Crawl, and Twitter. In this article, I show how I convert GloVe models to the KeyedVectors format used in Gensim.

# Imports
from gensim.test.utils import get_tmpfile
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

# Path to a temporary file that will hold the converted vectors
tmp_file = get_tmpfile('temp_word2vec.txt')

# Convert the GloVe vectors to word2vec text format and write them to the temporary file
glove2word2vec('glove.6B.300d.txt', tmp_file)

# Load the converted file as KeyedVectors
model = KeyedVectors.load_word2vec_format(tmp_file)

# Optionally save the vectors in compressed binary word2vec format
model.save_word2vec_format('keyed-6B-300.bin.gz', binary=True)
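To make the conversion step less of a black box: the GloVe text format and the word2vec text format differ, as far as I can tell, only in that word2vec files start with a header line holding the number of vectors and their dimensionality. The sketch below illustrates that idea in plain Python. The function name add_word2vec_header and the toy file contents are my own, made up for illustration; this is not Gensim's actual implementation.

```python
import os
import tempfile


def add_word2vec_header(glove_path, out_path):
    """Copy a GloVe text file to out_path, prepending a '<count> <dims>' header.

    Hypothetical helper for illustration only; not part of Gensim.
    """
    # First pass: count the vectors and read the dimensionality from line one.
    num_vectors = 0
    dims = 0
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if num_vectors == 0:
                # Each line is "<token> <v1> ... <vN>", so N = fields - 1.
                dims = len(line.rstrip("\n").split(" ")) - 1
            num_vectors += 1

    # Second pass: write the header, then copy the vectors unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.write(f"{num_vectors} {dims}\n")
        for line in src:
            dst.write(line)
    return num_vectors, dims


# Tiny made-up file in GloVe layout (illustrative values, not real vectors).
workdir = tempfile.mkdtemp()
glove_path = os.path.join(workdir, "toy_glove.txt")
w2v_path = os.path.join(workdir, "toy_word2vec.txt")
with open(glove_path, "w", encoding="utf-8") as f:
    f.write("boat 0.1 0.2 0.3\nship 0.2 0.1 0.4\n")

shape = add_word2vec_header(glove_path, w2v_path)
with open(w2v_path, encoding="utf-8") as f:
    header = f.readline().strip()
print(shape, header)
```

If you are on Gensim 4.0 or later, it may be worth checking whether KeyedVectors.load_word2vec_format accepts no_header=True in your version; to my knowledge that option reads a GloVe text file directly and makes the temporary file unnecessary.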

The following table compares the on-disk sizes of the original GloVe file (.txt) and the saved KeyedVectors file (.bin.gz):

File                   Size
glove.6B.300d.txt      990 MB
keyed-6B-300.bin.gz    423 MB

For example, KeyedVectors can be used to find the words closest to a given word, ranked by cosine similarity. Calling model.most_similar('boat', topn=5) returns:

No.  Word          Similarity
0    'boats'       0.8481258153915405
1    'vessel'      0.7043675184249878
2    'ship'        0.6790910959243774
3    'yacht'       0.6422115564346313
4    'capsized'    0.632569432258606
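To show what these similarity scores mean, here is a minimal sketch of a cosine similarity ranking in plain Python. The toy_vectors dictionary holds made-up 3-dimensional values, not real GloVe vectors, and the most_similar function here is a simplified stand-in of my own rather than Gensim's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(vectors, word, topn=5):
    """Rank all other words by cosine similarity to `word` (simplified sketch)."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word  # skip the query word itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Made-up vectors for illustration only.
toy_vectors = {
    "boat": [1.0, 0.0, 0.0],
    "boats": [0.9, 0.1, 0.0],
    "ship": [0.5, 0.5, 0.0],
    "apple": [0.0, 0.0, 1.0],
}

ranking = most_similar(toy_vectors, "boat", topn=2)
for position, (word, score) in enumerate(ranking):
    print(position, word, score)
```

A score close to 1 means the two vectors point in nearly the same direction, while a score near 0 means they are unrelated, which is why 'boats' ranks above 'capsized' in the table above.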