
Python Implementation Of Logistic Regression As Regression (not Classification!)

I have a regression problem on which I want to use logistic regression - not logistic classification - because my target variables y are continuous quantities between 0 and 1. How can I do this in Python?

Solution 1:

In statsmodels, both GLM with family Binomial and the discrete model Logit allow a continuous target variable, as long as the values are restricted to the interval [0, 1].

Similarly, Poisson is very useful for modeling non-negative continuous data.

In these cases, the model is estimated by quasi maximum likelihood (QMLE), not by MLE, because the distributional assumptions are not correct. Nevertheless, we can correctly (consistently) estimate the mean function. Inference needs to be based on misspecification-robust standard errors, which are available as the fit option cov_type="HC0".

Here is a notebook with example https://www.statsmodels.org/dev/examples/notebooks/generated/quasibinomial.html

Some issues with background on QMLE and fractional Logit: https://www.github.com/statsmodels/statsmodels/issues/2040 and https://github.com/statsmodels/statsmodels/issues/2712

Reference

Papke, L.E. and Wooldridge, J.M. (1996), Econometric methods for fractional response variables with an application to 401(k) plan participation rates. J. Appl. Econ., 11: 619-632. https://doi.org/10.1002/(SICI)1099-1255(199611)11:6<619::AID-JAE418>3.0.CO;2-1

Update and Warning

As of statsmodels 0.12:

Investigating this some more, I found that discrete Probit does not support continuous interval data. It uses a computational shortcut that assumes that the values of the dependent variable are either 0 or 1. However, it does not raise an exception in this case. https://github.com/statsmodels/statsmodels/issues/7210

Discrete Logit works correctly for continuous data with optimization method "newton". The log-likelihood function itself uses a similar computational shortcut as Probit, but the derivatives and other parts of Logit do not.

GLM-Binomial is designed for interval data and has no problems with it. The only numerical precision problem is currently in the Hessian of the probit link, which uses numerical derivatives and is not very precise; the parameters are well estimated, but standard errors can have numerical noise in GLM-Probit.

Update: two changes in statsmodels 0.12.2. Probit now raises an exception if the response is not integer valued, and GLM-Binomial with probit link uses improved derivatives for the Hessian, with precision now similar to discrete Probit.

Solution 2:

Excuse me if I'm not following, but logistic classification is built on logistic regression (with the addition of some classification rule). I've used sklearn before as well as statsmodels. A simple tutorial is provided here if you need it. In short, you can use either:

import statsmodels.api as sm
mod = sm.Logit(y, x)
result = mod.fit()
result.summary()

from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(x, y)  # assuming x and y are columns from a pandas df
print(log_reg.coef_, log_reg.intercept_)
