
Is There A Way To Suitably Adjust This Sklearn Logistic Regression Function To Account For Multiple Independent Variables And Fixed Effects?

I would like to adapt the LogitRegression function included below to include additional independent variables and fixed effects. The code below has been adapted from the answer pro

Solution 1:

I solved this with a few small adjustments: the independent variables and the fixed-effect dummy columns are passed to the function together, as columns of a single dataframe. Writing out a simplified example of the problem helped me a lot in finding the answer:

from sklearn.linear_model import LinearRegression
from random import choices
import numpy as np
import pandas as pd

class LogitRegression(LinearRegression):

    def fit(self, x, p):
        p = np.asarray(p)
        y = np.log(p / (1 - p))
        return super().fit(x, y)

    def predict(self, x):
        y = super().predict(x)
        return 1 / (np.exp(-y) + 1)
    

if __name__ == '__main__':
    
    # generate example data
    np.random.seed(42)
    n = 100
    
    x = np.random.randn(n).reshape(-1,1)
    
    # defining the response (dependent) variable (a proportion strictly between 0 and 1)
    noise = 0.1 * np.random.randn(n).reshape(-1, 1)
    p = np.tanh(x + noise) / 2 + 0.5
    
    # creating 3 random independent variables
    x1 = np.random.randn(n)
    x2 = np.random.randn(n)
    x3 = np.random.randn(n)
    
    # a fixed effects variable: one randomly chosen country per observation
    cats = choices(["France","Norway","Ireland"], k=n)

    # combining these into a dataframe
    df = pd.DataFrame({"x1":x1,"x2":x2,"x3":x3,"countries":cats})

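    # adding the fixed effects country columns (one 0/1 dummy column per country);
    # note: together with the default intercept, keeping every dummy column makes
    # the design matrix collinear, so individual country coefficients are not unique
    df = pd.concat([df,pd.get_dummies(df.countries, dtype=float)],axis=1)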
                 
    print(df)

    # Design matrix: the independent variables x1, x2, x3 plus the country dummy
    # columns (the fixed effects) from df. The dependent variable p is a proportion.
    categories = df['countries'].unique()
    x = df.loc[:, ["x1", "x2", "x3", *categories]]
    
    model = LogitRegression()
    model.fit(x, p) 
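
To check that the fit does something sensible, it can help to look at the coefficients and the predicted proportions. The sketch below is a self-contained variation on the code above, not the original answer's method. It makes a few assumptions of its own: the helper `build_design_matrix` and the constant `EPS` are hypothetical names, the proportion is clipped away from 0 and 1 so the logit stays finite, one country dummy is dropped (`drop_first=True`) so the remaining country coefficients are measured against a baseline country, and the example data is generated so that the logit of `p` depends on `x1` with slope 0.8.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

EPS = 1e-6  # hypothetical safeguard: keeps the logit finite if p hits 0 or 1


class LogitRegression(LinearRegression):
    """Ordinary least squares on the logit of a proportion."""

    def fit(self, x, p):
        p = np.clip(np.asarray(p, dtype=float), EPS, 1 - EPS)
        return super().fit(x, np.log(p / (1 - p)))

    def predict(self, x):
        # map the linear prediction back to the (0, 1) scale
        return 1 / (1 + np.exp(-super().predict(x)))


def build_design_matrix(df, numeric_cols, fixed_effect_col):
    """Numeric columns plus 0/1 dummies for the fixed effect (first level dropped)."""
    dummies = pd.get_dummies(df[fixed_effect_col], drop_first=True, dtype=float)
    return pd.concat([df[numeric_cols], dummies], axis=1)


# synthetic example data (assumption: logit(p) = 0.8 * x1 + small noise)
rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
    "countries": rng.choice(["France", "Norway", "Ireland"], size=n),
})
p = 1 / (1 + np.exp(-(0.8 * df["x1"] + 0.1 * rng.normal(size=n))))

X = build_design_matrix(df, ["x1", "x2", "x3"], "countries")
model = LogitRegression().fit(X, p)

# coefficients are on the logit scale, labelled by column name
coefs = pd.Series(model.coef_, index=X.columns)
preds = model.predict(X)  # predicted proportions, strictly between 0 and 1
print(coefs)
print(preds[:5])
```

With the first country dropped, each remaining country coefficient reads as a shift in the logit relative to that baseline country. If all dummies are kept instead, as in the code above, the predictions are the same but the individual country coefficients are not uniquely determined.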

