Non-negative Matrix Factorization (NMF)

Non-negative Matrix Factorization (NMF) is a dimensionality reduction and data analysis technique that approximates a non-negative matrix as the product of two smaller non-negative matrices, so that the original data is described by a small number of latent features. NMF is particularly useful in applications such as image processing, text mining, and recommendation systems, where the data is naturally represented by non-negative values (pixel intensities, word counts, ratings).
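
To make the decomposition concrete, here is a minimal sketch on a small random matrix. The toy matrix V, its 6 x 4 shape, and the choice of the "nndsvda" initializer are assumptions made for illustration only. The sketch checks the shapes of the two factors, confirms that both are non-negative, and measures how closely their product approximates V.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical toy data: 6 samples x 4 features, all entries non-negative
rng = np.random.default_rng(0)
V = rng.random((6, 4))

# Factor V into W (6 x 2) and H (2 x 4) using 2 latent features
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(V)
H = model.components_

# Relative Frobenius-norm error of the approximation V ~ W @ H
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

print("W shape:", W.shape)
print("H shape:", H.shape)
print("factors non-negative:", bool((W >= 0).all() and (H >= 0).all()))
print("relative reconstruction error:", rel_error)
```

The error is generally not zero: with only two latent features the product W @ H is a low-rank approximation of V, not an exact copy.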


In this example, we’ll demonstrate how to use NMF for topic extraction from a collection of documents using the scikit-learn library.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

documents = [
    "The quick brown fox jumps over the lazy dog.",
    "I enjoy reading about machine learning and natural language processing.",
    "The weather is sunny today, perfect for a walk in the park.",
    "Deep learning is a popular subfield of machine learning."
]

vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)

nmf = NMF(n_components=2, random_state=42)
W = nmf.fit_transform(X)
H = nmf.components_

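# Look up the vocabulary once; get_feature_names() was removed in scikit-learn 1.2
feature_names = vectorizer.get_feature_names_out()

# Print each topic's five highest-weighted words (listed in ascending order of weight)
for i, topic in enumerate(H):
    print(f"Topic {i + 1}:")
    print(" ".join(feature_names[index] for index in topic.argsort()[-5:]))

Output: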

Topic 1:
park walk weather sunny today
Topic 2:
language natural processing learning machine
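
The topic listing above uses only H. The matrix W returned by fit_transform holds one row per document with that document's weight on each topic, so it can be used to assign each document to its strongest topic. The following is a minimal sketch of that idea, reusing the same documents and settings. The variable name dominant_topic is introduced here for illustration, and the exact weights may vary between scikit-learn versions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

documents = [
    "The quick brown fox jumps over the lazy dog.",
    "I enjoy reading about machine learning and natural language processing.",
    "The weather is sunny today, perfect for a walk in the park.",
    "Deep learning is a popular subfield of machine learning.",
]

# Same pipeline as in the example: TF-IDF features, then a 2-topic NMF
X = TfidfVectorizer(stop_words="english").fit_transform(documents)
W = NMF(n_components=2, random_state=42).fit_transform(X)

# Each row of W holds one document's weight on each topic;
# argmax picks the index of the topic with the largest weight
dominant_topic = W.argmax(axis=1)

for doc, weights, topic in zip(documents, W, dominant_topic):
    print(f"Topic {topic + 1} (weights {weights.round(3)}): {doc}")
```

A document whose weights are all close to zero does not fit either topic well, so its assignment should be read with caution.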