It won't filter out rows or columns but will show NaN for the cells that don't meet the requirements.

We can also filter on a variation of a column's value.

What's a lambda function?

Lambda functions can be used wherever function objects are required.
A lambda is anonymous, but you can assign it to a variable. For example, you can set f = lambda x: max(x) - min(x).
Here we filter the regions where SizeRank is an even number.
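To make the lambda idea concrete, here is a minimal sketch. The tiny raw_df and its RegionName column are made-up stand-ins for the real rental data; only SizeRank and the f lambda come from the text above.

```python
import pandas as pd

# Hypothetical stand-in for raw_df; only SizeRank is named in the article.
raw_df = pd.DataFrame({
    "RegionName": ["A", "B", "C", "D"],
    "SizeRank": [1, 2, 3, 4],
})

# A lambda has no name of its own, but it can be assigned to a variable.
f = lambda x: max(x) - min(x)
size_range = f(raw_df["SizeRank"])  # largest rank minus smallest rank

# Build one Boolean per row, then keep the rows where SizeRank is even.
is_even = raw_df["SizeRank"].apply(lambda rank: rank % 2 == 0)
even_df = raw_df[is_even]
```

The same filter could be written without a lambda as raw_df[raw_df["SizeRank"] % 2 == 0]; the lambda form is shown because it extends to rules that are harder to express as a single arithmetic expression.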
Use lambda to apply a rule on more than one column.

Examples of filtering both columns and rows

It gives an error if we run raw_df[raw_df.loc>450000] because there are non-numeric columns like state or city.
Using what we learned from my last article, we select numerical columns only.
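One way to do this, sketched on made-up data, is select_dtypes. The column names City, State, 2018-03 and the values are assumptions for illustration; the real raw_df has many more columns.

```python
import pandas as pd

# Hypothetical miniature of raw_df with text and numeric columns mixed.
raw_df = pd.DataFrame({
    "City": ["A", "B", "C"],
    "State": ["NY", "CA", "IL"],
    "SizeRank": [1, 2, 3],
    "2018-03": [460000, 300000, 440000],
    "2018-04": [470000, 310000, 455000],
})

# Keep only the numeric columns so a numeric comparison is valid everywhere.
num_df = raw_df.select_dtypes(include="number")

# Comparing the whole frame keeps its shape; cells that fail become NaN.
masked = num_df[num_df > 450000]
```

Note that masked has the same rows and columns as num_df. Nothing is dropped; the cells at or below 450,000 are simply replaced with NaN.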
Suppose we want to select the data ranked top 5 in size, and only keep the months when the rent is greater than 450,000 for the first row (index == 0).

Now we go back to raw_df with all the columns, select the data ranked top 5 in size, and keep only the string columns this time.
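Both selections can be sketched with .loc and two Boolean masks, one for rows and one for columns. The data below is invented, and is_text_col is just one possible way to flag the string columns, not necessarily how the original notebook did it.

```python
import pandas as pd

# Hypothetical miniature of raw_df (six regions, two months).
raw_df = pd.DataFrame({
    "City": list("ABCDEF"),
    "SizeRank": [1, 2, 3, 4, 5, 6],
    "2018-03": [440000, 300000, 310000, 320000, 330000, 340000],
    "2018-04": [470000, 305000, 315000, 325000, 335000, 345000],
})
num_df = raw_df.select_dtypes(include="number")

# Row mask: one Boolean per row. Column mask: one Boolean per column.
rows_top5 = num_df["SizeRank"] <= 5
cols_high_in_row0 = num_df.loc[0] > 450000
top5_high_months = num_df.loc[rows_top5, cols_high_in_row0]

# Same row rule on the full frame, this time keeping the non-numeric columns.
is_text_col = raw_df.dtypes.apply(lambda t: not pd.api.types.is_numeric_dtype(t))
top5_strings = raw_df.loc[raw_df["SizeRank"] <= 5, is_text_col]
```

In this toy data only 2018-04 exceeds 450,000 in the first row, so top5_high_months has five rows and a single column.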
For this type of filtering to work, the two elements inside the brackets each have to yield a Series of Boolean results (True or False) on their own.
Otherwise it won't work.
…0] will fail, because num_df.…0 doesn't give a Series of Booleans; it's an array of Booleans.
A format like df.loc['index']>0] will work because it only deals with one row and one column, so it's selecting by two Series of Booleans.
Be careful with the syntax! It gives an error because this format assumes the condition is about rows, while the command is actually selecting columns.
.loc needs a : on the left side if the condition is about columns.
If the condition is about rows, you can omit the : on the right side.
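Here is a small sketch of the two forms on made-up data, so the position of the : is easy to see. The frame and the 450,000 threshold are only illustrative.

```python
import pandas as pd

# Hypothetical two-row, two-month frame.
df = pd.DataFrame(
    {"2018-03": [440000, 300000], "2018-04": [470000, 310000]},
    index=[0, 1],
)

col_mask = df.loc[0] > 450000      # one Boolean per column
row_mask = df["2018-04"] > 450000  # one Boolean per row

# Column condition: ":" on the left keeps every row.
cols_kept = df.loc[:, col_mask]

# Row condition: the trailing ":" is optional.
rows_kept = df.loc[row_mask]       # same result as df.loc[row_mask, :]

# df.loc[col_mask] is read as a row filter, so passing a column mask
# there raises an error instead of selecting columns.
```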
Assign, map and transform data to the ideal scale

2.1 Assign Values

Use .copy() if you want to copy the data for some transformation while still keeping the original data untouched.
We are going to use this copied dataframe to practice assigning values.
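As a rough sketch of the copy step and the assignment patterns in this section, the example below works on an invented three-row frame. The Source and Tier column names and the replacement values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of raw_df.
raw_df = pd.DataFrame({
    "City": ["A", "B", "C"],
    "SizeRank": [1, 2, 3],
    "2018-04": [470000, 310000, 455000],
})

df = raw_df.copy()  # independent copy; raw_df is not modified below

# Assign to rows: by label with .loc, by position with .iloc.
df.loc[0, "2018-04"] = 480000
df.iloc[1, 2] = 320000  # row position 1, column position 2 ("2018-04")

# Assign a whole column at once.
df["Source"] = "sample"

# Create a new column from a condition on an existing one.
df["Tier"] = np.where(df["2018-04"] > 450000, "high", "low")
```

Because df came from .copy(), raw_df still holds the original 470,000 and 310,000 after these assignments.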
Assign values to rows: use .loc or .iloc

Assign values to columns

Create a new column by assigning values by condition

Create a new column by using existing columns: Map or Apply

Map: if too many columns need to change values, do it by creating a dictionary

2.2 Map: it iterates over each element of a Series, but only one Series.
We can use map to change values in one column.
For example, when we index a column like raw_df['2018-04'], the result is a Series, so we can use map to change the unit of the values in 2018-04 to thousands by multiplying each element of this Series by 0.001.
If we want to change more than one column to thousands, use applymap.
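For the single-column case, a minimal map sketch looks like this. The data and the state_names dictionary are invented for illustration; the dictionary form is included because map also accepts a lookup table instead of a function.

```python
import pandas as pd

# Hypothetical miniature of raw_df.
raw_df = pd.DataFrame({
    "State": ["NY", "CA"],
    "2018-04": [470000, 310000],
})

# map visits one Series element by element.
in_thousands = raw_df["2018-04"].map(lambda v: v * 0.001)

# map with a dictionary replaces each value with its lookup result.
state_names = {"NY": "New York", "CA": "California"}  # hypothetical lookup
full_names = raw_df["State"].map(state_names)
```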
2.3 ApplyMap: it applies a function to each element of the DataFrame.
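A short sketch of the element-wise version on two invented month columns follows. One caveat: recent pandas releases rename DataFrame.applymap to DataFrame.map, so the snippet uses whichever one is available in your installation.

```python
import pandas as pd

# Hypothetical two-month frame.
raw_df = pd.DataFrame({
    "2018-03": [460000, 300000],
    "2018-04": [470000, 310000],
})

to_thousands = lambda v: v * 0.001

# Apply the same function to every cell of the frame.
if hasattr(raw_df, "map"):      # newer pandas: DataFrame.map
    thousands_df = raw_df.map(to_thousands)
else:                           # older pandas: DataFrame.applymap
    thousands_df = raw_df.applymap(to_thousands)
```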
2.4 Apply: use it when we need to apply a function to one or more columns more specifically.
As the name suggests, it applies a function along any axis of the DataFrame.
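To show what "along an axis" means, here is a small sketch on made-up numbers. With the default axis=0 the function receives one column at a time; with axis=1 it receives one row at a time and can combine several columns.

```python
import pandas as pd

# Hypothetical numeric frame with two months.
num_df = pd.DataFrame({
    "2018-03": [460000, 300000, 440000],
    "2018-04": [470000, 310000, 455000],
})

# axis=0 (default): the lambda sees a whole column, one result per column.
per_column_range = num_df.apply(lambda col: col.max() - col.min())

# axis=1: the lambda sees a whole row, so it can use several columns.
monthly_change = num_df.apply(
    lambda row: row["2018-04"] - row["2018-03"], axis=1
)
```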
Review: what's the difference between map, applymap and apply?

map: an operation on every element in one Series, or one column of a DataFrame
applymap: an operation on every element in a DataFrame (the same operation for the elements in all the columns and rows)
apply: an operation that takes multiple columns from a DataFrame

Special form of apply: df[['col1','col2']].apply(sum) returns the sum of each column, one total for col1 and one for col2.
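Here is that column-wise sum on a toy frame, next to agg for comparison. The numbers are invented, and the list of statistics passed to agg is only an example.

```python
import pandas as pd

# Hypothetical frame using the col1 / col2 names from the text.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})

# apply(sum) runs sum on each column and returns one total per column.
totals = df[["col1", "col2"]].apply(sum)

# agg accepts several statistics at once: one row per statistic.
stats = df[["col1", "col2"]].agg(["sum", "mean", "max"])
```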
A special form of apply in pandas returns an aggregated value. Or use agg to get more types of descriptive statistics.

2.4 Use apply to rescale data for machine learning

Normalize and standardize data in Python (you can use StandardScaler from sklearn, but this is the concept).
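The concept can be sketched with two lambdas, assuming "normalize" means min-max scaling to [0, 1] and "standardize" means the z-score. The sample numbers are invented.

```python
import pandas as pd

# Hypothetical numeric frame on two different scales.
num_df = pd.DataFrame({
    "2018-03": [100.0, 200.0, 300.0],
    "2018-04": [10.0, 20.0, 40.0],
})

# Min-max normalization: each column is rescaled to the range [0, 1].
normalized = num_df.apply(
    lambda col: (col - col.min()) / (col.max() - col.min())
)

# Standardization (z-score): each column gets mean 0 and std 1.
# ddof=0 is the population standard deviation, which StandardScaler uses;
# pandas defaults to ddof=1, so it is set explicitly here.
standardized = num_df.apply(
    lambda col: (col - col.mean()) / col.std(ddof=0)
)
```

After this, every column of normalized runs from 0 to 1, and every column of standardized has a mean of about 0, regardless of the original units.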
That’s it for the second part of my series on building muscle memory for data science in Python.
The first part is linked at the end.
Stay tuned! My next tutorial will show you how to ‘curl the data science muscles’ for joining and pivoting data.
Follow me and give me a few claps if you find this helpful :)

You might also be interested in my analysis on rental seasonality: How to Analyze Rental Seasonality and Trend to Save Money on Your Lease