
Creating Confusion Matrix From Multiple .csv Files

I have a lot of .csv files with the following format:

338,800
338,550
339,670
340,600
327,500
301,430
299,350
284,339
284,338
283,335
283,330
283,310
282,310
282,300
282,300
283

Solution 1:

Add a function get_predict(filename):

def get_predict(filename):
    if 'Alex' in filename:
        return 'Alexander'
    else:
        return filename[0]

Read n files and compute the confusion matrix with pandas crosstab:

import csv
import os
import pandas as pd

def get_category(filepath):
    def category(val):
        print('predict({}; abs({})'.format(val, abs(val)))
        if 0.8 < val <= 0.9:
            return "A"
        if abs(val - 0.7) < 1e-10:
            return "B"
        if 0.5 < val < 0.7:
            return "C"
        if abs(val - 0.5) < 1e-10:
            return "E"
        return "D"

    with open(filepath, "r") as csvfile:
        ff = csv.reader(csvfile)

        results = []
        previous_value = 0
        for col1, col2 in ff:
            value = int(col1)
            if value >= previous_value:
                previous_value = value
            else:
                results.append(value / previous_value)
                previous_value = value

    return category(sum(results) / len(results))

matrix = {'actual':[], 'predict':[]}
path = 'test/confusion'
for filename in os.listdir(path):
    # The first Char in filename is Predict Key
    matrix['predict'].append(filename[0])
    matrix['actual'].append(get_category(os.path.join(path, filename)))

df = pd.crosstab(pd.Series(matrix['actual'], name='Actual'),
                 pd.Series(matrix['predict'], name='Predicted')
                 )
print(df)

Output: (Reading "A.csv, B.csv, C.csv" with the given example Data three times)

Predicted  A  B  C
Actual            
A          3  0  0
B          0  3  0
C          0  0  3

Tested with Python:3.4.2 - pandas:0.19.2
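
The averaging step inside get_category can be hard to follow, so here is a minimal standalone sketch of it using only the standard library. The function name mean_drop_ratio and the sample readings are made up for illustration and are not taken from the question's files. The guard for a series that never falls is an addition; the code above would raise ZeroDivisionError in that case.

```python
def mean_drop_ratio(values):
    """Average value / previous_value over every step where the series falls."""
    ratios = []
    previous = 0
    for value in values:
        if value < previous:
            # Only falling steps contribute a ratio, as in get_category.
            ratios.append(value / previous)
        previous = value
    # Guard (an addition): nothing to average if the series never falls.
    if not ratios:
        return None
    return sum(ratios) / len(ratios)


# Hypothetical first-column readings: the falls are 100 -> 90 and 95 -> 76.
sample = [100, 90, 95, 76]
print(mean_drop_ratio(sample))
```

The two falls give ratios of 0.9 and 0.8, so the mean is about 0.85, which the category function above would map to "A".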

Solution 2:

Scikit-Learn is a good fit for your case because it provides a confusion_matrix function. Here is an approach you can easily extend.

from sklearn.metrics import confusion_matrix

# Read your csv files
with open('A1.csv', 'r') as readFile:
    true_values = [int(ff) for ff in readFile]
with open('B1.csv', 'r') as readFile:
    predictions = [int(ff) for ff in readFile]

# Produce the confusion matrix
confusionMatrix = confusion_matrix(true_values, predictions)

print(confusionMatrix)

This is the output you would expect.

[[0 2]
 [0 2]]
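
To make the layout of that matrix concrete, here is a rough standard-library sketch of the same tally: rows are the true labels, columns are the predicted labels, both in sorted order. The helper confusion_counts is hypothetical, and since the contents of A1.csv and B1.csv are not shown, the two lists below are assumed values chosen only because they produce the same matrix as above.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    labels = sorted(set(actual) | set(predicted))
    pairs = Counter(zip(actual, predicted))
    table = [[pairs[(a, p)] for p in labels] for a in labels]
    return labels, table


# Assumed inputs, not the real file contents.
true_values = [0, 0, 1, 1]
predictions = [1, 1, 1, 1]

labels, table = confusion_counts(true_values, predictions)
for row in table:
    print(row)
# [0, 2]
# [0, 2]
```

Reading the first row: no sample with true label 0 was predicted as 0, and two were predicted as 1.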

For more hints, check out the following link:

How to write a confusion matrix in Python?
