ML Tips

# Pandas builds on NumPy arrays to provide rich data structures and data analysis tools.

# pandas.DataFrame is a class that provides labelled, two-dimensional arrays of data, similar to R’s “data.frame”.
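
# A minimal sketch of a labelled DataFrame built on top of a NumPy array. The values, row labels and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 array of numbers
values = np.array([[1.0, 2.0], [3.0, 4.0]])

# Wrap the array in a DataFrame with row and column labels
df = pd.DataFrame(values, index=["row_a", "row_b"], columns=["x", "y"])

# Cells can now be looked up by label instead of by position
cell = df.loc["row_b", "y"]
print(df)
print(cell)
```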

# The pandas.read_csv() function reads comma-separated values (for example, a CSV file) into a DataFrame object.
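
# A small sketch of read_csv(), assuming the CSV text is held in memory rather than in a file. The column names and values are invented for the example; in practice a file path would usually be passed instead of the StringIO buffer.

```python
import io

import pandas as pd

# Hypothetical CSV content with a header row
csv_text = "name,height_cm\nAda,170\nGrace,165\n"

# read_csv accepts a path or any file-like object
people = pd.read_csv(io.StringIO(csv_text))

print(people)
print(people["height_cm"].mean())
```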

# Patsy is a Python library for describing statistical models and building design matrices using R-like formulas.

# To fit most of the models covered by statsmodels, we need to create two design matrices. The first is a matrix of endogenous variables (the response, often written y). The second is a matrix of exogenous variables (the predictors, often written X).
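
# One way to build the two design matrices is with patsy's dmatrices() function. This is a sketch on a made-up data set; the column names y and x and the formula "y ~ x" are assumptions for illustration.

```python
import pandas as pd
from patsy import dmatrices

# Hypothetical data: one response column and one predictor column
data = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0],
    "x": [0.5, 1.5, 2.5, 3.5],
})

# The left side of the formula becomes the endogenous matrix,
# the right side becomes the exogenous matrix (patsy adds an intercept column)
endog, exog = dmatrices("y ~ x", data=data, return_type="dataframe")

print(endog.head())
print(exog.head())
```

# The resulting endog and exog objects can then be handed to a statsmodels model class such as OLS.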

# The term regression was coined by Francis Galton in his 1886 article Regression Towards Mediocrity in Hereditary Stature. Galton described the biological phenomenon that the variance of height in a population does not increase over time. He observed that extreme heights of parents are not fully passed on to their children; instead, the children’s heights regress towards the population mean.

The frequency distribution class in NLTK (nltk.FreqDist) records how often each word type occurs in a document.
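
A short sketch of counting word types with nltk.FreqDist. The sentence is made up, and it is split on whitespace with str.split() as a simplifying assumption so that no NLTK tokenizer data needs to be downloaded.

```python
from nltk import FreqDist

# Hypothetical document, tokenized naively on whitespace
tokens = "the cat sat on the mat the end".split()

# FreqDist maps each word type to the number of times it occurs
fdist = FreqDist(tokens)

print(fdist["the"])          # count for one word type
print(fdist.most_common(1))  # the most frequent word type and its count
print(fdist.N())             # total number of tokens counted
```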