23 great Pandas codes for Data Scientists

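George Seif, Aug 22, 2018

Here are 23 Pandas code snippets for data scientists to help you better understand your data! The snippets assume `import pandas as pd` and a data frame named `df`.

Basic Dataset Information

(1) Read in a CSV dataset

```python
import pandas as pd

df = pd.read_csv("csv_file")
```

The older `DataFrame.from_csv` alternative was removed in pandas 1.0, so `read_csv` is the function to use.

(2) Read in an Excel dataset

```python
df = pd.read_excel("excel_file")
```

(3) Write your data frame directly to CSV

Comma-separated and without the index:

```python
df.to_csv("data.csv", sep=",", index=False)
```

(4) Basic dataset feature info

```python
df.info()
```

(5) Basic dataset statistics

```python
print(df.describe())
```

(6) Print data frame in a table

```python
from tabulate import tabulate

print(tabulate(print_table, headers=headers))
```

where `print_table` is a list of lists and `headers` is a list of the string headers. `tabulate` is a separate package (`pip install tabulate`), not part of pandas.

(7) List the column names

```python
df.columns
```

Basic Data Handling

(8) Drop missing data

```python
df.dropna(axis=0, how='any')
```

Returns an object with the labels on the given axis omitted where any (`how='any'`) or all (`how='all'`) of the data are missing. Like most pandas methods, it returns a new data frame rather than modifying `df`, so assign the result (`df = df.dropna()`) to keep it.

(9) Replace missing data

```python
import numpy as np

df.replace(to_replace=np.nan, value=0)
```

Replaces the values given in `to_replace` (here NaN) with `value`. For missing data specifically, `df.fillna(0)` does the same thing.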
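The reading, inspection, and missing-data steps above can be tried together on a tiny in-memory dataset. This is a minimal sketch, assuming a made-up CSV with hypothetical columns `name`, `height` and `size`; `io.StringIO` stands in for a real file path, since `pd.read_csv` and `to_csv` both accept file-like objects.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents standing in for "csv_file"; the empty field becomes NaN.
csv_text = """name,height,size
alice,1.6,5
bob,,3
carol,1.8,5
"""

# (1) read_csv accepts any file-like object, not just a path.
df = pd.read_csv(io.StringIO(csv_text))

# (7), (4), (5): column names, dtypes/non-null counts, numeric summary.
print(list(df.columns))
df.info()
print(df.describe())

# (8) Drop every row that has at least one missing value.
complete_rows = df.dropna(axis=0, how="any")

# (9) Two equivalent ways to fill the missing height with 0.0.
filled = df.fillna(value=0.0)
replaced = df.replace(to_replace=np.nan, value=0.0)

# (3) Write back out as comma-separated text without the index column.
out = io.StringIO()
df.to_csv(out, sep=",", index=False)
```

Only the `bob` row has a missing value, so `dropna` keeps two rows, and both filling approaches leave no NaN behind.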

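(10) Check for NaNs

```python
pd.isnull(df)
```

Detects missing values (NaN in numeric arrays, None/NaN in object arrays) and returns a boolean object of the same shape. `pd.isnull(df).sum()` counts the missing values in each column.

(11) Drop a feature

```python
df.drop('feature_variable_name', axis=1)
```

`axis` is 0 for rows and 1 for columns.

(12) Convert object type to numeric

```python
df["feature_name"] = pd.to_numeric(df["feature_name"], errors='coerce')
```

Converts object columns (for example, numbers stored as strings) to a numeric type so you can perform computations on them. With `errors='coerce'`, values that cannot be parsed become NaN.

(13) Convert data frame to NumPy array

```python
df.to_numpy()
```

`as_matrix()` was removed in pandas 1.0; `to_numpy()` (or the older `.values` attribute) replaces it.

(14) Get first n rows of a data frame

```python
df.head(n)
```

(15) Get data by feature name

```python
df["feature_name"]
# Equivalent, with label-based indexing
df.loc[:, "feature_name"]
```

Note that `df.loc[feature_name]` selects a row by its index label, not a column.

Operating on data frames

(16) Apply a function to a data frame

This one will multiply all values in the "height" column of the data frame by 2:

```python
df["height"].apply(lambda height: 2 * height)

# Equivalent, with a named function
def multiply(x):
    return x * 2

df["height"].apply(multiply)
```

For simple arithmetic like this, the vectorized `df["height"] * 2` gives the same result and is faster.

(17) Renaming a column

Here we will rename the 3rd column of the data frame to "size":

```python
df.rename(columns={df.columns[2]: 'size'}, inplace=True)
```

(18) Get the unique entries of a column

Here we will get the unique entries of the "name" column:

```python
df["name"].unique()
```

(19) Accessing sub-data frames

Here we'll grab a selection of the columns, "name" and "size", from the data frame:

```python
new_df = df[["name", "size"]]
```

(20) Summary information about your data

```python
# Sum of values in each column
df.sum()
# Lowest value in each column
df.min()
# Highest value in each column
df.max()
# Index label of the lowest value in each column
df.idxmin()
# Index label of the highest value in each column
df.idxmax()
# Statistical summary of the data frame, with quartiles, median, etc.
df.describe()
# Average values
df.mean()
# Median values
df.median()
# Correlation between columns
df.corr()
# To get these values for only one column, just select it like this:
df["size"].median()
```

If the data frame contains text columns, pass `numeric_only=True` to `mean()`, `median()` and `corr()`; since pandas 2.0 they raise an error on non-numeric columns otherwise.

(21) Sorting your data

```python
df.sort_values(by="size", ascending=False)
```

A data frame needs to be told which column(s) to sort by through `by`; here the rows are ordered by "size", largest first.

(22) Boolean indexing

Here we'll filter on the "size" column to show only the rows where it equals 5:

```python
df[df["size"] == 5]
```

(23) Selecting values

Let's select the first row of the "size" column:

```python
# By label (assumes the default 0, 1, 2, ... index)
df.loc[0, "size"]
# By position, whatever the index labels are
df["size"].iloc[0]
```

`loc` is indexed with square brackets, not called like a function, and works on labels; `iloc` works on integer positions.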
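Many of the operations above can be exercised on one small frame. This is a minimal sketch with invented values, assuming hypothetical columns `name`, `height` (deliberately stored as strings so `pd.to_numeric` has something to do) and `shoe_size`, which gets renamed to `size` to mirror the examples; the numbers in the comments refer to the items above.

```python
import pandas as pd

# Invented data; "n/a" cannot be parsed as a number.
df = pd.DataFrame({
    "name": ["alice", "bob", "carol", "bob"],
    "height": ["1.6", "n/a", "1.8", "1.7"],
    "shoe_size": [5, 3, 5, 4],
})

# (12) Strings to numbers; errors="coerce" turns "n/a" into NaN.
df["height"] = pd.to_numeric(df["height"], errors="coerce")

# (10) Count missing values per column.
missing_per_column = pd.isnull(df).sum()

# (16) Double every height; NaN stays NaN.
doubled = df["height"].apply(lambda height: 2 * height)

# (17) Rename the 3rd column to "size" (returns a new frame here instead of inplace=True).
df = df.rename(columns={df.columns[2]: "size"})

# (18) Unique names, in order of first appearance.
unique_names = df["name"].unique()

# (19) A sub-data frame with two columns.
new_df = df[["name", "size"]]

# (20) Single-column summaries; idxmax skips NaN by default.
size_median = df["size"].median()
tallest_index = df["height"].idxmax()

# (21) Sort rows by "size", largest first.
by_size = df.sort_values(by="size", ascending=False)

# (22) Keep only rows whose size equals 5.
matches = df[df["size"] == 5]

# (23) Label-based vs position-based selection of one value.
first_size_by_label = df.loc[0, "size"]
first_size_by_position = df["size"].iloc[0]

# (13) NumPy array of the numeric columns (ints are upcast to float).
numeric_array = df[["height", "size"]].to_numpy()

# (11) Drop a column.
without_height = df.drop("height", axis=1)
```

Because the frame keeps its default integer index, `loc[0, "size"]` and `iloc[0]` pick the same value; they would differ after sorting or filtering if you selected from `by_size` or `matches` instead.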
