I used the word2vec tool with a small Bahasa Malay text corpus (about 3MB) as input and produced word vectors as output. The tool first constructs a vocabulary from the training text and then learns vector representations of the words. The resulting word-vector file can be used as features in many natural language processing and machine learning applications.
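To make the pipeline concrete, here is a minimal, self-contained sketch of the two stages described above: building a vocabulary from the text, then deriving a vector per word from its contexts. This is an illustration only; it uses simple co-occurrence counts as a stand-in for the shallow neural network that word2vec actually trains, and the tiny Malay sentences are hypothetical examples, not lines from my corpus.

```python
from math import sqrt

# Hypothetical toy sentences standing in for the cleaned Malay corpus.
corpus = [
    "saya suka makan nasi goreng",
    "saya suka makan mee goreng",
    "dia pergi ke sekolah setiap hari",
    "dia pergi ke pejabat setiap hari",
]

# Stage 1: construct a vocabulary from the training text.
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({word for sent in tokens for word in sent})
index = {word: i for i, word in enumerate(vocab)}

# Stage 2: build a vector per word from its context window.
# (word2vec instead learns dense vectors with a neural network;
# raw co-occurrence counts are just the simplest illustration.)
window = 2
vectors = {word: [0.0] * len(vocab) for word in vocab}
for sent in tokens:
    for i, word in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                vectors[word][index[sent[j]]] += 1.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Words used in similar contexts end up with similar vectors:
# "nasi" and "mee" both precede "goreng", so they score high;
# "nasi" and "sekolah" never share a context, so they score zero.
sim_food = cosine(vectors["nasi"], vectors["mee"])
sim_unrelated = cosine(vectors["nasi"], vectors["sekolah"])
```

The same idea scales up in word2vec: instead of counting contexts directly, it trains against them, which is what lets the model capture similarity from raw text with no prior linguistic knowledge.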
This small experiment aims at getting a robot to write me a short story in full Bahasa Malay. I used a small text file of 3.1MB containing cleaned Malay text with complete sentences.
In this post I’m going to show how word2vec (a word-embedding model) captures and delivers contextual information without using any prior knowledge. I first encountered word2vec back in 2013 and wrote a white paper on finding word similarities using a different type of neural network.