Random Manhattan Integer Indexing: Incremental L1 Normed Vector Space Construction

Empirical Methods in Natural Language Processing
Conference Paper
Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in the distributional approaches to semantics. In VSMs, high-dimensional vectors represent linguistic entities. In an application, the similarity of vectors---and thus the entities that they represent---is computed by a distance formula. The high dimensionality of vectors, however, is a barrier to the performance of methods that employ VSMs. Consequently, a dimensionality reduction technique is employed to alleviate this problem.This paper introduces a novel technique called Random Manhattan Indexing (RMI) for the construction of L1 normed VSMs at reduced dimensionality. RMI combines the construction of a VSM and dimension reduction into an incremental and thus scalable two-step procedure. In order to attain its goal, RMI employs the sparse Cauchy random projections. We further introduce Random Manhattan Integer Indexing (RMII): a computationally enhanced version of RMI. As shown in the reported experiments, RMI and RMII can be used reliably to estimate the L1 distances between vectors in a vector space of low dimensionality.
Research Unit: 
Knowledge Discovery Unit
Líon 2