Data Preparation

In this post, we will go over some of the operations commonly needed for data preparation.

Let's say df is a pandas DataFrame loaded from train.csv.

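  • df.dtypes (an attribute, not a method; it lists the dtype of every column)
  • df.dtypes[df.dtypes == 'O'] (the columns with object dtype, typically strings)
  • Limit to numeric data: is that enough?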

# Limit to numeric data (select_dtypes is a public alternative to the private _get_numeric_data())
df = df.select_dtypes(include='number')
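
To make the dtype checks above concrete, here is a minimal sketch. The small DataFrame and its column names are made up for illustration and stand in for train.csv. Note that newer pandas versions may store strings in a dedicated string dtype rather than object, so selecting by "not numeric" is likely the more robust check.

```python
import pandas as pd

# Hypothetical stand-in for train.csv; the column names are illustrative only.
df = pd.DataFrame({
    "age": [22, 38, 26],
    "fare": [7.25, 71.28, 7.92],
    "sex": ["male", "female", "female"],
})

# dtypes is an attribute holding one dtype per column.
print(df.dtypes)

# Columns that are not numeric (strings, objects, and so on).
non_numeric_columns = df.select_dtypes(exclude="number").columns.tolist()

# Keep only the numeric columns, using the public API.
numeric_df = df.select_dtypes(include="number")
print(numeric_df.columns.tolist())
```

Dropping every non-numeric column is quick, but it throws away whatever signal those columns carry, which is why the encoders below are usually worth the extra step.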

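  • Label Encoder
from sklearn.preprocessing import LabelEncoder

# Replace the categories in each column with integer codes
for column in X:
  le = LabelEncoder()
  X[column] = le.fit_transform(X[column])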
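  • OneHotEncoder
from sklearn.preprocessing import OneHotEncoder
# One binary column per category; the result is a sparse matrix by default
enc = OneHotEncoder()
Xt = enc.fit_transform(X)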
  • Data splitting
from sklearn.model_selection import train_test_split

# Hold out 40% of the rows for testing; random_state makes the split reproducible
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.4, random_state=0)
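
The encoding and splitting steps above can be put together roughly as follows. This is only a sketch under assumptions: the toy data, the column names, and the choice to label-encode the target while one-hot encoding the features are mine, not taken from train.csv.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical categorical features and target, for illustration only.
X = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red"],
    "size": ["S", "M", "L", "M", "S"],
})
labels = ["no", "yes", "yes", "no", "yes"]

# LabelEncoder maps each class to an integer, in sorted order of the classes.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

# OneHotEncoder creates one binary column per category of each feature.
encoder = OneHotEncoder(handle_unknown="ignore")
X_encoded = encoder.fit_transform(X)  # sparse matrix by default

# Hold out 40% of the rows for testing.
features_train, features_test, labels_train, labels_test = train_test_split(
    X_encoded, y, test_size=0.4, random_state=0
)
print(X_encoded.shape, features_train.shape, features_test.shape)
```

The scikit-learn documentation describes LabelEncoder as intended for target values; for unordered feature columns, one-hot encoding avoids implying an ordering between categories that integer codes would suggest.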






