Housing Sales Prices & Venues Data Analysis of Istanbul

Sercan Yıldız
Jan 24

City of Istanbul

A. Description & Discussion of the Background

Istanbul is one of the largest metropolises in the world, home to over 15 million people, with a population density of 2,813 people per square kilometer.
As a resident of this city, I decided to use Istanbul in my project.
The city is divided into 39 districts in total.
However, the fact that the districts are squeezed into an area of approximately 5,300 square kilometers causes the city to have a very intertwined and mixed structure.
As you can see from the figures, Istanbul is a city with a high population and population density.
Being such a crowded city leads the owners of shops and social venues to set up where the population is dense.
From an investor's point of view, we would expect them to prefer districts where real estate costs are lower and the type of business they want to open is less concentrated.
City residents, too, may want to choose districts where real estate values are lower.
At the same time, they may want to choose a district according to its density of social venues.
However, it is currently difficult to obtain information that would guide investors in this direction.
Considering all these problems, we can create a map and information chart in which the real estate index is overlaid on Istanbul and each district is clustered according to its venue density.
Data Description

To address the problem, we can list the data as below:

I found the Second-level Administrative Divisions of Turkey in the Spatial Data Repository of NYU.
The .json file has the coordinates of all the cities of Turkey.
I cleaned the data and reduced it to the city of Istanbul, then used it to create a choropleth map of the Housing Sales Price Index of Istanbul.

I used the Foursquare API to get the most common venues of each borough of Istanbul.

There is not much public data on demographic and social parameters for the city of Istanbul.
Therefore, you must set up your own data tables in most cases.
In this case, I collected the latest average housing sales price per square meter for each borough of Istanbul from a housing retail web page.

I used the Google Maps ‘Search Nearby’ option to get the center coordinates of each borough.
Methodology

I used a GitHub repository as the database for my study.
My master data has the main components Borough, Average House Price, Latitude and Longitude for the city.

Master Data

I used the Python folium library to visualize the geographic details of Istanbul and its boroughs, and I created a map of Istanbul with the boroughs superimposed on top.
I used the latitude and longitude values to produce the visual below:

Borough of Istanbul

I utilized the Foursquare API to explore the boroughs and segment them.
I set the limit to 100 venues and the radius to 750 meters around each borough's given latitude and longitude.
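As a rough illustration of such a request, the sketch below builds the query URL and flattens a response into rows. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its usual response layout. The credentials, the version date, the coordinates and the sample response are placeholders, and the function names are hypothetical. No network call is made here.

```python
from urllib.parse import urlencode

# Assumed: the legacy Foursquare v2 "explore" endpoint. Check the current API docs.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=750, limit=100):
    """Build the request URL for one borough's center coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (placeholder value)
        "ll": f"{lat},{lng}",
        "radius": radius,      # meters, as in the text
        "limit": limit,        # maximum venues per request, as in the text
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(borough, payload):
    """Flatten an explore-style response into one row per venue."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            location = venue.get("location", {})
            rows.append({
                "Borough": borough,
                "Venue": venue.get("name"),
                "Venue Category": categories[0].get("name"),
                "Venue Latitude": location.get("lat"),
                "Venue Longitude": location.get("lng"),
            })
    return rows


# Illustrative coordinates and a hand-written response fragment (not real data).
url = build_explore_url(40.99, 29.03, "YOUR_ID", "YOUR_SECRET")
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 40.991, "lng": 29.031},
               "categories": [{"name": "Café"}]}}
]}]}}
rows = parse_venues("Kadikoy", sample_payload)
print(url)
print(rows)
```

In a real run, the URL would be fetched once per borough and the parsed rows concatenated into a single venues table.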
Here is the head of the list of venue names, categories, latitudes and longitudes from the Foursquare API.

List of Venues

In summary of this data 43 venues were returned by Foursquare.
Here is a merged table of boroughs and venues.

Table of Boroughs and Venues

We can see that Kadikoy, Maltepe, Beyoglu, Besiktas, Sisli and Fatih reached the 100-venue limit.
On the other hand, Pendik, Arnavutkoy, Tuzla, Adalar, Buyukcekmece, Sultangazi, Cekmekoy and Beylikduzu are below 20 venues for the given coordinates, as shown in the graph below.
This result doesn’t mean that the query returned every possible venue in each borough.
It depends on the given latitude and longitude, and here we ran just a single latitude and longitude pair for each borough.
We can increase the coverage by adding neighborhood-level information with more latitude and longitude pairs.

Number of venues for each borough

In summary of this graph, 256 unique categories were returned by Foursquare. I then created a table listing the top 10 venue categories for each borough, shown below.
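One way to derive such a ranking is to count categories per borough and keep the most frequent ones. The sketch below uses only the standard library. The function name and the sample rows are hypothetical, and the original table may have been built differently, for example from one-hot encoded frequencies.

```python
from collections import Counter, defaultdict


def top_categories(venues, n=10):
    """Return the n most frequent venue categories for each borough.

    `venues` is an iterable of (borough, category) pairs.
    """
    counts = defaultdict(Counter)
    for borough, category in venues:
        counts[borough][category] += 1
    return {
        borough: [category for category, _ in counter.most_common(n)]
        for borough, counter in counts.items()
    }


# Made-up rows for illustration only.
venues = [
    ("Kadikoy", "Café"), ("Kadikoy", "Café"), ("Kadikoy", "Bar"),
    ("Adalar", "Seafood Restaurant"),
]
ranking = top_categories(venues, n=2)
print(ranking)
```

Ranking by raw counts and ranking by relative frequency give the same order within a borough, so either could feed the top-10 table.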
List of top 10 venue category

The boroughs have some venue categories in common.
For this reason, I used the unsupervised K-Means algorithm to cluster the boroughs.
K-Means is one of the most common clustering methods in unsupervised learning.
First, I ran K-Means to cluster the boroughs into 3 clusters, because when I analyzed K-Means with the elbow method it indicated 3 as the optimum k.
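To make the elbow idea concrete, here is a minimal, dependency-free sketch of K-Means that reports the within-cluster sum of squares (inertia) for several values of k. It is not the code used in this study. The 2-D points are toy data, the function names are hypothetical, and the deterministic farthest-point initialization is a simplification chosen so the example is reproducible. In practice a library implementation such as scikit-learn's would normally be used.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point initialization (a simplification)."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: returns (labels, centroids, inertia)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Toy data: three well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
inertias = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On this toy data the inertia drops sharply up to k = 3 and only marginally afterwards. That bend is the "elbow" used to pick k.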
Elbow method to specify the best k value

Here is my merged table with cluster labels for each borough.

Merged Table with clusters

We can also count the 1st Most Common Venue in each cluster.
From this we can create a bar chart, which may help us find proper label names for each cluster.
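The counting behind such a chart can be sketched as follows. The rows are invented for illustration, the function name is hypothetical, and a text bar stands in for the plotted chart.

```python
from collections import Counter, defaultdict


def first_venue_counts(rows):
    """Count how often each 1st Most Common Venue occurs per cluster.

    `rows` is an iterable of (cluster_label, first_most_common_venue) pairs.
    """
    counts = defaultdict(Counter)
    for cluster, first_venue in rows:
        counts[cluster][first_venue] += 1
    return counts


# Made-up (cluster, venue) pairs, not the study's results.
rows = [
    (0, "Café"), (0, "Café"), (0, "Restaurant"),
    (1, "Park"),
    (2, "Hotel"), (2, "Café"),
]
counts = first_venue_counts(rows)
for cluster in sorted(counts):
    for venue, n in counts[cluster].most_common():
        print(f"Cluster {cluster} | {venue:<12} {'#' * n} ({n})")
```

Reading the dominant categories per cluster off these counts is what suggests a descriptive label for each cluster.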
Number of venues in each cluster

When we examine the graph above, we can label each cluster as follows:

Cluster 0 : “Cafe Venues”
Cluster 1 : “Multiple Social Venues”
Cluster 2 : “Accommodation & Intensive Cafe Venues”

We can also examine the frequency of average housing sales prices in different ranges.
A histogram helps to visualize this:

Average Housing Sales Prices in Ranges

As the histogram above shows, we can define the ranges as below:

< 4000 AHP : “Low Level HSP”
4000–6000 AHP : “Mid-1 Level HSP”
6000–8000 AHP : “Mid-2 Level HSP”
8000–10000 AHP : “High-1 Level HSP”
> 10000 AHP : “High-2 Level HSP”

One of my aims was also to show the number of top 3 venues for each borough on the map.
Thus, I grouped each borough by the number of top 3 venues and combined that information in the Join column.
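Both derived columns, the price level and the Join text, can be sketched in a few lines. This assumes the first range means "below 4000", that each shared boundary (4000, 6000, 8000) belongs to the higher range, and that Join is a comma-separated "count category" string. The function names and the sample venue list are hypothetical.

```python
from collections import Counter


def price_level(ahp):
    """Map an average housing price (AHP) to its HSP level label."""
    if ahp < 4000:
        return "Low Level HSP"
    if ahp < 6000:
        return "Mid-1 Level HSP"
    if ahp < 8000:
        return "Mid-2 Level HSP"
    if ahp <= 10000:
        return "High-1 Level HSP"
    return "High-2 Level HSP"   # strictly above 10000


def join_top3(categories):
    """Summarize a borough's three most frequent venue categories as text."""
    top3 = Counter(categories).most_common(3)
    return ", ".join(f"{count} {name}" for name, count in top3)


# Made-up venue categories for a single borough.
example = ["Café", "Café", "Café", "Bar", "Bar", "Hotel"]
print(price_level(5200))
print(join_top3(example))
```

Storing the summary as a plain string keeps it easy to show later in a map popup or tooltip.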
Number of top 3 venues for each borough

C. Results

Let’s merge those new variables with the related cluster information in our main master table.

Master table

You can now see the Join, Labels and Level_labels columns as the last three in the table above.
You can also see a clustered map of the boroughs of Istanbul.

Clustered map boroughs of Istanbul

In the summary section, one of my aims was also to visualize the average housing sales price per square meter with a choropleth-style map.
To do so, I first downloaded a json file of the Second-level Administrative Divisions of Turkey from the Spatial Data Repository of NYU.
I cleaned the json file and pulled out only the city of Istanbul.
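Reducing a country-wide GeoJSON file to one city amounts to filtering its features by a property. A minimal sketch follows. The property key `province`, the function name and the tiny sample collection are assumptions, since the actual file's property names are not given here, and real features would carry polygon geometries rather than points.

```python
import json


def keep_province(geojson, province, key="province"):
    """Return a new FeatureCollection with only one province's features."""
    kept = [
        feature for feature in geojson["features"]
        if feature["properties"].get(key) == province
    ]
    return {"type": "FeatureCollection", "features": kept}


# Tiny hand-made collection standing in for the Turkey-wide file.
turkey = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"province": "Istanbul", "name": "Kadikoy"},
         "geometry": {"type": "Point", "coordinates": [29.03, 40.99]}},
        {"type": "Feature",
         "properties": {"province": "Ankara", "name": "Cankaya"},
         "geometry": {"type": "Point", "coordinates": [32.86, 39.92]}},
    ],
}

istanbul = keep_province(turkey, "Istanbul")
names = [feature["properties"]["name"] for feature in istanbul["features"]]
print(json.dumps(names))
```

The filtered collection can then be saved with `json.dump` and passed to a mapping library as the choropleth geometry.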
In the final section, I created a choropleth map which also shows the following information for each borough:

Borough name
Cluster name
Housing Sales Price (HSP) Levels
Top 3 number of venue

Choropleth map of Istanbul with final data

D. Discussion

As I mentioned before, Istanbul is a big city with a high population density in a narrow area.
The total area and population density of each of the 39 districts can vary.
Given this complexity, very different approaches can be tried in clustering and classification studies.
Moreover, it is obvious that not every classification method will yield the same high-quality results for this metropolis.
I used the K-Means algorithm as part of this clustering study.
When I tested the elbow method, I set the optimum k value to 3.
However, only 39 district coordinates were used.
For more detailed and accurate guidance, the data set can be expanded and drilled down to the neighborhood or street level.
I also performed the data analysis by adding the district coordinates and average home sales prices as static data on GitHub.
In future studies, these data can also be accessed dynamically from specific platforms or packages.
I ended the study by visualizing the data and clustering information on the Istanbul map.
In future studies, web or mobile applications can be developed to guide investors.
Conclusion

As a result, people are turning to big cities to start a business or work.
For this reason, people can achieve better outcomes through access to platforms where such information is provided.
Not only investors but also city managers can manage the city more systematically by using similar data analysis types or platforms.
References:
Istanbul, Wikipedia
Second-level Administrative Divisions of Turkey, Spatial Data Repository of NYU
Foursquare API
Housing Sales Prices of Each Borough from “Hurriyet Retail Index for 2018”
Google Maps