Saturday, September 2, 2017

Keras: using your own dataset (Boston housing prices)

First, create a "data" folder for the dataset.

Then create a "boston" folder inside it:


Download the dataset (a CSV file) from here: https://github.com/shunakanishi/keras_boston_dataset




Move "housing.csv" into the "boston" folder:
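
The folder steps above can also be scripted. The following is only a minimal sketch using the Python standard library; the helper name `place_dataset` is hypothetical, and it assumes "housing.csv" has already been downloaded to the current directory.

```python
import shutil
from pathlib import Path


def place_dataset(csv_path, base_dir="."):
    """Create data/boston under base_dir and move csv_path into it.

    Hypothetical helper, for illustration only.
    """
    target_dir = Path(base_dir) / "data" / "boston"
    # parents=True also creates "data"; exist_ok=True makes reruns harmless
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(csv_path).name
    shutil.move(str(csv_path), str(destination))
    return destination


if __name__ == "__main__":
    # Assumption: housing.csv sits in the current directory after the download
    if Path("housing.csv").exists():
        print(place_dataset("housing.csv"))
```

After this runs, the file should be at ./data/boston/housing.csv, which is the path the script below reads from.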


The data inside looks like this:
CRIM     ZN  INDUS  CHAS  NOX    RM     AGE   DIS     RAD  TAX  PTRATIO  B       LSTAT  MEDV
0.00632  18  2.31   0     0.538  6.575  65.2  4.0900  1    296  15.3     396.90  4.98   24.0
0.02731  0   7.07   0     0.469  6.421  78.9  4.9671  2    242  17.8     396.90  9.14   21.6
0.02729  0   7.07   0     0.469  7.185  61.1  4.9671  2    242  17.8     392.83  4.03   34.7
0.03237  0   2.18   0     0.458  6.998  45.8  6.0622  3    222  18.7     394.63  2.94   33.4
0.06905  0   2.18   0     0.458  7.147  54.2  6.0622  3    222  18.7     396.90  5.33   36.2
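
To make the layout concrete, here is a small sketch of how one line of the file maps onto these 14 columns: the first 13 values are the inputs and the last one, MEDV, is the value to predict. It is an illustration only. The helper `parse_row` is hypothetical, and the exact spacing and number formatting of `first_line` are assumptions; only the values come from the first row of the table.

```python
# Column names as listed in the table; MEDV is the regression target
COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS",
           "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]


def parse_row(line):
    """Split one whitespace-delimited line into (record, features, target)."""
    values = [float(field) for field in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(values)))
    record = dict(zip(COLUMNS, values))
    features = values[:13]  # columns 0..12, the 13 inputs
    target = values[13]     # column 13, MEDV
    return record, features, target


# Values of the first table row; the spacing here is an assumption
first_line = "0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0"
record, features, target = parse_row(first_line)
print(record["RM"], len(features), target)
```

The script in boston.py does the same split on the whole array at once with `dataset[:, 0:13]` and `dataset[:, 13]`.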

Now create "boston.py" in the shared folder and put the following in it:


import pandas as pd
import numpy as np

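# Read the dataset, then split it into features X and target Y
# (sep=r'\s+' splits on runs of whitespace, like the older delim_whitespace=True)
df = pd.read_csv('./data/boston/housing.csv', sep=r'\s+', header=None)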
dataset = df.values

X = dataset[:, 0:13]
Y = dataset[:, 13]

# Define the neural network
from keras.models import Sequential
from keras.layers import Dense

def build_nn():
    model = Sequential()
    model.add(Dense(20, input_dim=13, activation='relu', kernel_initializer="normal"))
    # No activation needed in output layer (because regression)
    model.add(Dense(1, kernel_initializer="normal"))

    # Compile Model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

# Evaluate the model with k-fold cross-validation
from keras.wrappers.scikit_learn import KerasRegressor

# sklearn imports (cross_val_score and KFold live in sklearn.model_selection;
# the old sklearn.cross_validation module was removed in scikit-learn 0.20):
from sklearn.model_selection import cross_val_score, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Standardise the inputs before feeding them to the network, because the input variables are on very different scales
estimators = []
estimators.append(('standardise', StandardScaler()))
# Keras 2 names the training-length argument "epochs" ("nb_epoch" was the Keras 1 name)
estimators.append(('multiLayerPerceptron', KerasRegressor(build_fn=build_nn, epochs=100, batch_size=32, verbose=1)))

pipeline = Pipeline(estimators)

# 10 folds; the number of samples is taken from X when the folds are generated
kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, X, Y, cv=kfold)

print()
print("Mean: ", results.mean())
print("StdDev: ", results.std())
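
The "standardise" step in the pipeline rescales every input column to zero mean and unit variance. As a rough sketch of what that means, here is the same arithmetic with the standard library only. The function `standardise` is a hypothetical stand-in, not scikit-learn's implementation. It uses the population standard deviation, which is what StandardScaler uses, and the sample values are the TAX and NOX columns of the five rows shown earlier.

```python
from statistics import mean, pstdev


def standardise(column):
    """Rescale a list of numbers to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (ddof=0)
    if sigma == 0:
        # A constant column has no spread; every value maps to zero
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


# TAX and NOX from the five sample rows: very different raw scales
tax = [296.0, 242.0, 242.0, 222.0, 222.0]
nox = [0.538, 0.469, 0.469, 0.458, 0.458]

print([round(v, 3) for v in standardise(tax)])
print([round(v, 3) for v in standardise(nox)])
```

After rescaling, both columns are on a comparable scale, so no single input dominates just because of its units. Because the scaler sits inside the Pipeline, cross_val_score refits it on the training part of each fold.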

Run this command:
$ sudo python3.5 boston.py

Training starts on the data from housing.csv. In this run, the script ended by printing:
Mean: 478.48
StdDev: 258.5499
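
These two numbers summarise the 10 scores returned by cross_val_score, one per fold. The sketch below shows, under stated assumptions, how unshuffled folds are sized and how the summary is computed. `kfold_indices` is a hypothetical helper that mirrors the fold-sizing rule of scikit-learn's KFold, the row count of 506 is an assumption about the file (it is the usual size of this dataset), and the per-fold scores are made-up placeholders, not results.

```python
from statistics import mean, pstdev


def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds.

    The first n_samples % n_splits folds get one extra sample.
    """
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Assumption: the file has 506 rows
sizes = [len(test) for _, test in kfold_indices(506, 10)]
print(sizes)

# Placeholder scores, only to show how the two summary lines are derived
fold_scores = [20.0, 30.0, 25.0]
print("Mean:", mean(fold_scores))
print("StdDev:", pstdev(fold_scores))  # population std, like numpy's default std()
```

A large standard deviation relative to the mean, as in the output above, indicates that the score changes a lot from one fold to the next.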
