A Simple Guide to Creating Predictive Models in Python, Part-2b

Therefore, the method below is easier and more scalable.

```python
# first just take a look at all the columns
list(deep_feat.columns)
```

Output:

['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']

Make a list of the columns (deep_feat already excludes ‘Exited’) in which the number of unique values is 2 (e.g., 0 and 1) or the data type is ‘object’ (represented by ‘O’), and store it as categorical_columns:

```python
categorical_columns = [col for col in deep_feat.columns
                       if len(deep_feat[col].unique()) == 2 or deep_feat[col].dtype == 'O']
```

Make a list of the columns in which the number of unique values is greater than 2 and the data type is either ‘int64’ or ‘float64’, and store it as continuous_columns:

```python
continuous_columns = [col for col in deep_feat.columns
                      if len(deep_feat[col].unique()) > 2
                      and (deep_feat[col].dtype == 'int64' or deep_feat[col].dtype == 'float64')]
```

See what the lists look like:

```python
print("categorical columns : ", categorical_columns)
print("continuous columns : ", continuous_columns)
```

Output:

categorical columns : ['Geography', 'Gender', 'HasCrCard', 'IsActiveMember']
continuous columns : ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']

We include the ‘Age’ column in continuous_columns for now, but we will later bucketize it (using the TensorFlow API) and turn it into a categorical column.

Remember all the manual preprocessing we did previously? We don't have to repeat most of it, because TensorFlow will take care of encoding the categorical columns for us.

```python
from sklearn.model_selection import train_test_split

# making a train/test split (30% of the rows go to the test set)
X_T, X_t, y_T, y_t = train_test_split(deep_feat, deep_label, test_size=0.3)
```

Let's now scale the data. First, create a list of columns to scale that contains every column in continuous_columns except ‘Age’, because we want to convert ‘Age’ into a bucketized column:

```python
# copy the list so that removing 'Age' leaves continuous_columns unchanged
cols_to_scale = continuous_columns[:]
cols_to_scale.remove("Age")
```

Apply the scaling. The scaler is fitted on the training set only, and the same fitted scaler is then used to transform the test set. Calling fit_transform on the test set would scale it with its own mean and standard deviation, so the two sets would no longer be on the same scale.

```python
from sklearn.preprocessing import StandardScaler

# scaling the listed columns
scaler = StandardScaler()
# learn the mean and standard deviation from the training data only;
# assigning whole columns lets the int64 columns become float64
X_T[cols_to_scale] = scaler.fit_transform(X_T[cols_to_scale])
# reuse the training statistics on the test data (transform, not fit_transform)
X_t[cols_to_scale] = scaler.transform(X_t[cols_to_scale])
```

The code below is a little confusing at first, so let's break it down. We are creating TensorFlow feature columns corresponding to the columns in the dataframe.

“tf.feature_column.categorical_column_with_hash_bucket()” takes a string column such as ‘Gender’ and maps each value to an integer ID by hashing it into one of hash_bucket_size buckets.

“hash_bucket_size” is the number of buckets. It should comfortably exceed the number of categories in the column so that hash collisions stay rare; ‘Gender’ has only 2 categories (Male, Female), far fewer than 1000.

The result is passed into “tf.feature_column.embedding_column()”, which turns each ID into a dense vector. The “dimension” parameter sets the length of that vector; here it is simply set to the number of distinct categories in the column (2 for ‘Gender’). The embedding step is needed only for the dense neural network, whose inputs must be dense; a simple linear model can use the categorical column directly.

```python
import tensorflow as tf

# string (object dtype) categorical columns: hash each value to an ID, then embed it
categorical_object_feat_cols = [
    tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_hash_bucket(key=col, hash_bucket_size=1000),
        dimension=len(deep_df[col].unique()))
    for col in categorical_columns if deep_df[col].dtype == 'O']
```

“tf.feature_column.categorical_column_with_identity()” is used for categorical columns with integer values, which it uses directly as IDs; the values must lie in the range [0, num_buckets). ‘HasCrCard’ and ‘IsActiveMember’ contain only 0 and 1, so num_buckets=2.

```python
# integer categorical columns (0/1 flags): use each value as its own ID, then embed it
categorical_integer_feat_cols = [
    tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_identity(key=col, num_buckets=2),
        dimension=len(deep_df[col].unique()))
    for col in categorical_columns if deep_df[col].dtype == 'int64']
```

Because each list comprehension is wrapped in brackets, Python lets it span several lines without backslash line-continuation characters.
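The two list comprehensions that build categorical_columns and continuous_columns encode a simple rule. As a hedged, stdlib-only sketch of that rule (not the article's code), the hypothetical helper classify_columns below applies the same two tests to a tiny made-up table stored as a dict of lists. "Holds strings" stands in for pandas' ‘O’ dtype and "all int or float" stands in for the int64/float64 check; both stand-ins are assumptions.

```python
def classify_columns(table):
    """Split columns into categorical and continuous using the rule described above.

    Hypothetical helper: a column is categorical if it has exactly 2 distinct
    values or holds strings (stand-in for pandas' 'O' dtype); it is continuous
    if it has more than 2 distinct values and is purely numeric.
    """
    categorical, continuous = [], []
    for name, values in table.items():
        n_unique = len(set(values))
        is_text = any(isinstance(v, str) for v in values)
        is_numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                         for v in values)
        if n_unique == 2 or is_text:
            categorical.append(name)
        elif n_unique > 2 and is_numeric:
            continuous.append(name)
    return categorical, continuous


# made-up rows for illustration only
toy = {
    "CreditScore": [600, 650, 700, 750],
    "Geography": ["France", "Spain", "France", "Germany"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "HasCrCard": [1, 0, 1, 0],
    "NumOfProducts": [1, 1, 3, 2],
}
cat, cont = classify_columns(toy)
print(cat)   # ['Geography', 'Gender', 'HasCrCard']
print(cont)  # ['CreditScore', 'NumOfProducts']
```

Note that a numeric column with only two values (like HasCrCard) lands in the categorical list, while NumOfProducts, with more than two integer values, is treated as continuous, which matches the article's output.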
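Why fit the scaler on the training set only? Here is a minimal stdlib sketch, assuming StandardScaler's default behaviour of subtracting the mean and dividing by the population standard deviation (ddof=0). The numbers and the helper names fit_scaler and transform are hypothetical. It shows that reusing the training statistics keeps a raw value of 4.0 at the same scaled value in both sets, while refitting on the test set maps it somewhere else.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Learn the mean and population standard deviation (ddof=0)."""
    return fmean(values), pstdev(values)


def transform(values, mean, std):
    """Standardize values with statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [4.0, 6.0]

mean, std = fit_scaler(train)              # learned from the training data only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)   # same statistics reused on the test data

# refitting on the test set gives different numbers for the same raw values
test_mean, test_std = fit_scaler(test)
test_refit = transform(test, test_mean, test_std)

print(mean, std)     # 5.0 2.0
print(test_scaled)   # [-0.5, 0.5]
print(test_refit)    # [-1.0, 1.0]
```

With the reused statistics, 4.0 maps to -0.5 in both train_scaled and test_scaled; after refitting, the same 4.0 in the test set becomes -1.0, so the model would see inconsistent inputs.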
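The article plans to bucketize ‘Age’ later with the TensorFlow API (tf.feature_column.bucketized_column takes a numeric column and a sorted list of boundaries). The stdlib sketch below only mimics how such boundaries assign a value to a bucket: with boundaries [b0, b1, ..., bn] the buckets are (-inf, b0), [b0, b1), ..., [bn, +inf), so a value equal to a boundary falls into the bucket that starts there. The boundary values and the helper names age_bucket and one_hot are hypothetical; the article does not say which boundaries it uses.

```python
from bisect import bisect_right

# hypothetical age boundaries; the article does not specify its own
AGE_BOUNDARIES = [25, 35, 45, 55, 65]


def age_bucket(age, boundaries=AGE_BOUNDARIES):
    """Return the bucket index of age.

    Buckets: (-inf, 25), [25, 35), [35, 45), [45, 55), [55, 65), [65, +inf).
    bisect_right counts the boundaries <= age, which is exactly that index.
    """
    return bisect_right(boundaries, age)


def one_hot(index, size):
    """Encode a bucket index as a one-hot list."""
    return [1 if i == index else 0 for i in range(size)]


n_buckets = len(AGE_BOUNDARIES) + 1
for age in [18, 25, 42, 70]:
    print(age, age_bucket(age), one_hot(age_bucket(age), n_buckets))
```

Five boundaries produce six buckets, which is why n_buckets is len(AGE_BOUNDARIES) + 1. Turning a continuous age into a bucket index lets a model learn a separate weight for each age range instead of a single slope.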
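To make the hash-bucket, identity and embedding steps less abstract, here is a conceptual stdlib sketch of what those feature columns do to a single value. It is not TensorFlow code: the helper names are hypothetical, MD5 is swapped in as the hash (TensorFlow uses its own fingerprint function, so the IDs will differ), the embedding table is filled with seeded random numbers rather than trained weights, and the out-of-range check in identity_id is this sketch's own choice.

```python
import hashlib
import random

HASH_BUCKET_SIZE = 1000  # same bucket count as in the article


def hash_bucket_id(value, num_buckets=HASH_BUCKET_SIZE):
    """Map a string to an ID in [0, num_buckets).

    Stand-in only: MD5 replaces TensorFlow's fingerprint hash, so these IDs
    will not match what TensorFlow produces. Distinct strings may collide.
    """
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets


def identity_id(value, num_buckets=2):
    """Use an integer value directly as its ID (the idea behind the identity column)."""
    if not 0 <= value < num_buckets:
        raise ValueError(f"{value} is outside [0, {num_buckets})")
    return value


def make_embedding_table(num_ids, dimension, seed=0):
    """One row of length `dimension` per ID (random here; learned during training in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dimension)] for _ in range(num_ids)]


# 'Gender': hash the string to an ID, then look up its 2-dimensional vector
gender_table = make_embedding_table(HASH_BUCKET_SIZE, dimension=2)
for g in ["Male", "Female"]:
    gid = hash_bucket_id(g)
    print(g, gid, gender_table[gid])

# 'HasCrCard': the 0/1 value is already the ID
card_table = make_embedding_table(2, dimension=2)
print(1, identity_id(1), card_table[identity_id(1)])
```

The table for the hashed column has hash_bucket_size rows even though only two are ever used, which is the price of not needing a vocabulary up front; the identity column needs only num_buckets rows.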
