What is the Chi-squared Test?
The Chi-squared test is a statistical hypothesis test used to determine whether there is a statistically significant association between two categorical variables. It compares the observed frequencies in a contingency table with the expected frequencies that would occur if the variables were independent (the null hypothesis). The larger the gap between observed and expected counts, the larger the test statistic and the stronger the evidence against independence. The Chi-squared test is also commonly used for feature selection in machine learning, where it helps identify the categorical features most strongly associated with the class label in a classification task.
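To make the comparison of observed and expected frequencies concrete, here is a minimal sketch that computes the statistic by hand with NumPy. The table values are made up for illustration, and the variable names are assumptions rather than part of any library API.

```python
import numpy as np

# Illustrative 2x3 contingency table (made-up counts):
# rows are groups, columns are categories.
observed = np.array([[10, 20, 30],
                     [20, 30, 20]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
grand_total = observed.sum()

# Expected count under independence:
# (row total * column total) / grand total
expected = row_totals * col_totals / grand_total

# Pearson's Chi-squared statistic:
# sum over all cells of (observed - expected)^2 / expected
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print("Expected frequencies:\n", expected)
print("Chi-squared statistic:", chi2_stat)
print("Degrees of freedom:", dof)
```

In practice a library routine does this for you; the hand computation is only meant to show where the expected frequencies and the statistic come from.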
Example of using the Chi-squared Test in Python
Here’s a simple example of performing a Chi-squared test using the scipy library in Python:
    import numpy as np
    from scipy.stats import chi2_contingency

    # Sample contingency table
    observed = np.array([[10, 20, 30], [20, 30, 20]])

    # Perform the Chi-squared test
    chi2, p_value, dof, expected = chi2_contingency(observed)

    print("Chi-squared statistic:", chi2)
    print("P-value:", p_value)
    print("Degrees of freedom:", dof)
    print("Expected frequencies:", expected)
This example demonstrates how to use the chi2_contingency function from the scipy.stats module to perform a Chi-squared test on a sample contingency table.
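Once the test has run, the usual next step is to compare the p-value against a significance level. The sketch below assumes a threshold of 0.05, which is a common convention rather than something the test itself prescribes; the names alpha and reject_independence are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same style of sample contingency table as above
observed = np.array([[10, 20, 30], [20, 30, 20]])

chi2, p_value, dof, expected = chi2_contingency(observed)

# Assumed significance level (conventional choice, adjust as needed)
alpha = 0.05

# A p-value below alpha is evidence against the null hypothesis
# that the two variables are independent.
reject_independence = p_value < alpha

if reject_independence:
    print(f"p = {p_value:.4f} < {alpha}: evidence of an association")
else:
    print(f"p = {p_value:.4f} >= {alpha}: no evidence against independence")
```

A small p-value only indicates that an association is unlikely to be due to chance; it says nothing about how strong that association is.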