The Battle of Neighborhoods in Italian Dishes!

An IBM Coursera Applied Data Science Capstone Project.

Sebastian Carmona A.
Nov 30, 2020

Toronto is the capital city of the Canadian province of Ontario. With a recorded population of 2,731,571 in 2016, it is the most populous city in Canada and the fourth most populous city in North America. The city is the anchor of the Golden Horseshoe, an urban agglomeration of 9,245,438 people (as of 2016) surrounding the western end of Lake Ontario, while the Greater Toronto Area (GTA) proper had a 2016 population of 6,417,516. Toronto is an international center of business, finance, arts, and culture, and is recognized as one of the most multicultural and cosmopolitan cities in the world. According to the United Nations Development Programme, Toronto has the second-highest percentage of foreign-born population among world cities, after Miami, Florida.

According to Thrillist, food is a big deal in Toronto. Not only do we need it to live, but the city has become a foodie destination. New restaurants are constantly opening, and chefs continue to push culinary boundaries with new and innovative ideas. It’s a bit dizzying just how much good food there is to eat here. But don’t be daunted.

In this project I analyzed all the Italian restaurants currently present in Toronto and examined the top-rated ones in more detail.

Business Problem

This project looked at all the Italian restaurants in Toronto. It analyzed the data by Toronto postal code to find where the greatest number of Italian restaurants are. The main purpose of the project was to show what the best Italian restaurants offer and where they are located, as guidance for future Italian restaurants.

Target Audience

The people who will benefit from this project are future chefs and business people who would like to open a new Italian restaurant in Toronto and want to know the best location for it. It also helps them identify their strongest competitors and what customers expect and like from them.

Logo found at https://es.foursquare.com/

This project used the Foursquare API as its primary data source, since it has a database of millions of places. In particular, its Places API supports location search, location sharing and the restaurant details that I needed. Using Foursquare API credentials, I retrieved the Italian restaurants around Toronto, picked the top 5, and looked at the attributes of each one.

Libraries Used to Develop the Project

  • Pandas: For creating and manipulating data frames.
  • Folium: Python visualization library used to show the neighborhood cluster distribution on an interactive Leaflet map.
  • Scikit-Learn: For importing k-means clustering.
  • JSON: Library to handle JSON files.
  • XML: To handle XML data, which is stored in plain text format and kept separate from presentation.
  • Geocoder: To retrieve Location Data.
  • Matplotlib: Python Plotting Module.

Data Description

Toronto Postal Codes data Link:

This is a list of postal codes in Canada where the first letter is M. Postal codes beginning with M are located within the city of Toronto in the province of Ontario. Only the first three characters are listed, corresponding to the Forward Sortation Area. This table consists of “Postal Code”, “Borough” and “Neighborhood”.

Latitude and Longitude:

https://cocl.us/Geospatial_data

I wrangled and cleaned the data to make it usable for the project. Rows with “Not Assigned” in the Borough column were removed, and where the Neighborhood column was “Not Assigned” I copied the value from the Borough column. Where several neighborhoods shared the same postal code and borough, I grouped them into a single row.
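The three cleaning steps can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual notebook code: the toy rows, the `NOT_ASSIGNED` constant and the `clean_postal_codes` helper are assumptions made for the example, and the exact spelling of the placeholder value in the real table may differ.

```python
import pandas as pd

# Placeholder text used in the source table (assumed spelling).
NOT_ASSIGNED = "Not Assigned"

# Toy rows standing in for the scraped postal code table (illustrative only).
raw = pd.DataFrame(
    {
        "Postal Code": ["M1A", "M3A", "M5A", "M5A", "M7A"],
        "Borough": [NOT_ASSIGNED, "North York", "Downtown Toronto",
                    "Downtown Toronto", "Queen's Park"],
        "Neighborhood": [NOT_ASSIGNED, "Parkwoods", "Regent Park",
                         "Harbourfront", NOT_ASSIGNED],
    }
)


def clean_postal_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the three cleaning rules described above."""
    # 1. Drop rows whose borough is not assigned.
    df = df[df["Borough"] != NOT_ASSIGNED].copy()
    # 2. Where the neighborhood is not assigned, reuse the borough name.
    missing = df["Neighborhood"] == NOT_ASSIGNED
    df.loc[missing, "Neighborhood"] = df.loc[missing, "Borough"]
    # 3. Join neighborhoods that share a postal code and borough into one row.
    return df.groupby(["Postal Code", "Borough"], as_index=False)["Neighborhood"].agg(
        ", ".join
    )


cleaned = clean_postal_codes(raw)
print(cleaned)
```

In this toy input, the two M5A rows collapse into a single row whose neighborhood is a comma-separated string, and the M7A row takes its borough name as its neighborhood.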

Foursquare API: With the data frame cleaned, the project used Foursquare location data to get the venues of all Italian restaurants in Toronto, along with their rating, likes and tips. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus and even photos. The Foursquare location platform was used as the sole source of venue data, since all the required information can be obtained through the API.
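To give an idea of what such a lookup might look like, here is a hedged sketch. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its nested `response → groups → items → venue` layout, neither of which is confirmed by this article. The helper names, the mock response and the venue names are all hypothetical, and no network call is made.

```python
from typing import Any

# Assumed legacy endpoint; the article does not state which endpoint was used.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_params(client_id: str, client_secret: str, lat: float, lng: float,
                         radius: int = 1000, limit: int = 100,
                         version: str = "20180605") -> dict[str, Any]:
    """Hypothetical helper: query parameters for one neighborhood lookup."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date string
        "ll": f"{lat},{lng}",    # latitude,longitude of the neighborhood
        "radius": radius,        # search radius in meters
        "limit": limit,
    }


def italian_restaurants(response_json: dict[str, Any]) -> list[dict[str, str]]:
    """Keep only venues whose category name is 'Italian Restaurant'."""
    items = response_json["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = [c["name"] for c in venue.get("categories", [])]
        if "Italian Restaurant" in categories:
            rows.append({"id": venue["id"], "name": venue["name"]})
    return rows


# Mock response in the assumed layout (venue names are made up).
sample = {"response": {"groups": [{"items": [
    {"venue": {"id": "a1", "name": "Example Trattoria",
               "categories": [{"name": "Italian Restaurant"}]}},
    {"venue": {"id": "b2", "name": "Example Cafe",
               "categories": [{"name": "Coffee Shop"}]}},
]}]}}

params = build_explore_params("YOUR_ID", "YOUR_SECRET", 43.65, -79.38)
found = italian_restaurants(sample)
print(params["ll"], found)
# A real call would look roughly like:
#   requests.get(EXPLORE_URL, params=params).json()
```

Filtering on the category name rather than a hard-coded category ID keeps the sketch independent of identifiers that cannot be verified from the article.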

Folium package: Used to visualize the data as a boundary map. Folium is a Python library that can create interactive Leaflet maps from coordinate data.

Methodology

The project started by installing and importing all the necessary libraries: Pandas, NumPy, requests, geopy, matplotlib, folium, yellowbrick and sklearn.

The postal code table from Wikipedia was extracted as a Pandas dataframe.

Later, a “.csv” (Comma Separated Values) file was imported with the Latitude and Longitude data for each postal code, and was merged with the first dataframe.

After this, the main dataframe used throughout the project was clean and ready to use. It had 5 columns (Postal Code, Borough, Neighborhood, Latitude and Longitude) and 103 rows.
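The merge step can be sketched as below. The coordinates and rows are made-up stand-ins, and the CSV is read from an in-memory string instead of the real file, so the column names in the actual geospatial file are an assumption here.

```python
import io

import pandas as pd

# Cleaned postal code table (toy rows).
postal = pd.DataFrame(
    {
        "Postal Code": ["M3A", "M5A"],
        "Borough": ["North York", "Downtown Toronto"],
        "Neighborhood": ["Parkwoods", "Regent Park, Harbourfront"],
    }
)

# Stand-in for the downloaded .csv file; the values are illustrative only.
csv_text = "Postal Code,Latitude,Longitude\nM3A,43.75,-79.33\nM5A,43.65,-79.36\n"
coords = pd.read_csv(io.StringIO(csv_text))

# A left join keeps every postal code even if its coordinates are missing.
merged = postal.merge(coords, on="Postal Code", how="left")
print(merged)
```

With a left join, any postal code that has no match in the coordinates file would show up with empty Latitude and Longitude values, which makes gaps easy to spot.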

Exploratory Data Analysis

The data analysis began by exploring the number of neighborhoods that each borough had.

North York has the highest number of neighborhoods in Toronto, around 25. Next, the Foursquare data was used to build a bar plot of the number of Italian restaurants in each borough of Toronto.

According to the bar plot, Central Toronto and Downtown Toronto have the greatest number of Italian restaurants in the city. To see whether there is any difference between them, and to make the number of restaurants per borough easier to read, a table was created.

The table shows that Central Toronto has 34 Italian restaurants, the same as Downtown Toronto, followed by West Toronto with 14. North York, the borough with the greatest number of neighborhoods, has only 8.
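A per-borough count table like this one can be produced with a simple group-by. The sketch below uses a handful of made-up venues, so its numbers do not match the real results.

```python
import pandas as pd

# Toy venue list; the names and counts are illustrative only.
venues = pd.DataFrame(
    {
        "Borough": ["Downtown Toronto", "Downtown Toronto", "Central Toronto",
                    "West Toronto", "Downtown Toronto", "Central Toronto"],
        "Name": ["Venue A", "Venue B", "Venue C", "Venue D", "Venue E", "Venue F"],
    }
)

# Count venues per borough and sort from most to fewest.
per_borough = (
    venues.groupby("Borough")
    .size()
    .reset_index(name="Italian Restaurants")
    .sort_values("Italian Restaurants", ascending=False)
    .reset_index(drop=True)
)
print(per_borough)
```

The same frame can be passed to a bar plot, for example with `per_borough.plot.bar(x="Borough", y="Italian Restaurants")`.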

Next, the top 5 neighborhoods with the greatest number of Italian restaurants were checked.

In the chart and the Top 5 table, the Davisville neighborhood comes first with 8 restaurants, followed by Davisville North with 6. Both fall in the Central Toronto borough, which has the maximum number of Italian restaurants in the city.

Furthermore, using the Foursquare API, a list of 116 rows was created, one per Italian restaurant, with its borough, neighborhood, name, likes, rating and tips.

The Foursquare data showed that Terroni has the most likes and tips, and that its rating is not far from Noce’s. It is in the Downtown Toronto borough, in the Garden District, Ryerson neighborhood. Next, we visualized the neighborhoods with the highest average ratings for Italian restaurants.

The Commerce Court, Victoria Hotel neighborhood has the highest average rating for Italian restaurants in Toronto, at 8.6.
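The average rating per neighborhood comes down to a group-by followed by a mean. Here is a small sketch with invented neighborhoods and ratings, which are not the project's data.

```python
import pandas as pd

# Toy ratings; neighborhoods and values are illustrative only.
ratings = pd.DataFrame(
    {
        "Borough": ["Downtown Toronto", "Downtown Toronto", "Central Toronto",
                    "West Toronto", "West Toronto"],
        "Neighborhood": ["N1", "N1", "N2", "N3", "N3"],
        "Rating": [8.0, 9.0, 7.0, 6.0, 7.0],
    }
)

# Mean rating per neighborhood, highest first.
avg_rating = (
    ratings.groupby(["Borough", "Neighborhood"], as_index=False)["Rating"]
    .mean()
    .sort_values("Rating", ascending=False)
    .reset_index(drop=True)
)
print(avg_rating)
```

A neighborhood with only one or two rated restaurants can top such a ranking, so it can help to look at the restaurant count next to the average.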

Top 10 Italian Restaurants in Toronto City with their average rating, name, neighborhood and borough.

The table shows that Terroni has 2 restaurants in the same borough, each with the highest rating in its neighborhood. Mangia and Bevi Resto-Bar and Fusaro’s also have 2 restaurants each in Downtown Toronto, in 2 different neighborhoods and with the same rating.

The Top 10 table contains only 7 distinct restaurants, because some of them appear more than once.

Clustering

In the clustering part of the project, k-means was used as the clustering method, with the elbow point used to choose the best number of clusters K.

Using the KElbowVisualizer imported from yellowbrick.cluster, it was possible to see that the best number of clusters K was 5.
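The idea behind the elbow method can be shown without Yellowbrick by computing the k-means inertia (within-cluster sum of squares) for several values of K with scikit-learn directly. The sketch below is a manual substitute for the KElbowVisualizer, not the project's code, and it runs on synthetic points with three obvious groups instead of the Toronto data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data: three well-separated blobs (not the Toronto data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit k-means for K = 1..6 and record the inertia of each fit.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

The elbow is the value of K after which the inertia stops dropping sharply. For these synthetic blobs that happens at K = 3. In the project the same reasoning, applied through the visualizer, gave K = 5.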

The clusters were labeled with neighborhoods according to the number of Italian restaurants.

With Toronto split into 5 clusters, it was possible to see that cluster 2 has the greatest number of Italian restaurants, followed by clusters 5 and 4.

Results

After analyzing the data, different results were gathered as follows:

  • Central Toronto and Downtown Toronto have the greatest number of Italian restaurants in the city, according to the bar plot.
  • Central Toronto has 34 Italian restaurants, the same as Downtown Toronto, followed by West Toronto with 14. North York, the borough with the greatest number of neighborhoods, has only 8.
  • The Davisville neighborhood has the greatest number of restaurants, with 8, followed by Davisville North with 6.
  • Terroni has the most likes and tips, and its rating is not far from Noce’s. It is in the Downtown Toronto borough, in the Garden District, Ryerson neighborhood.
  • The Commerce Court, Victoria Hotel neighborhood has the two Italian restaurants with the highest average rating in Toronto, at 8.6.
  • Terroni has 2 restaurants in the same borough, each with the highest rating in its respective neighborhood. Mangia and Bevi Resto-Bar and Fusaro’s also have 2 restaurants each in Downtown Toronto, in 2 different neighborhoods and with the same rating.
  • Cluster 2 has most of the Italian restaurants in the city, followed by Cluster 5 and Cluster 4.

Discussion

According to the analysis, to taste the best Italian food and visit the best Italian restaurant in Toronto we have to visit Terroni, which currently has the highest average rating, 8.6, in the Commerce Court, Victoria Hotel neighborhood of Downtown Toronto. For business purposes, it is recommended to start an Italian food business in the York borough, which has the fewest Italian restaurants, so demand is expected to be high. If you want to open an Italian restaurant in an area of middling demand, consider the East Toronto or East York boroughs. For better engagement with customers, you could set up a restaurant near Little Italy, close to the Christie neighborhood in Downtown Toronto.

The competition will be strongest in the Central Toronto and Downtown Toronto boroughs, as these are the top two areas where Italian restaurants are found.

Conclusion

Finally, this is a small glimpse of what real-life data science projects look like. In this project I imported different Python libraries such as pandas, numpy and matplotlib, and used scikit-learn for cluster modeling. I used the Geopy client to get the latitude and longitude of Toronto and the Foursquare API to get the venue data. I explored the different boroughs and neighborhoods of Toronto and analyzed the data to reach different findings about Italian restaurants in different parts of the city. This project gave me the knowledge and confidence to take part in future data science projects as a junior data scientist without fear. I had the opportunity to work on data wrangling, data cleansing, data analytics and data visualization, and to bring these together into results and conclusions that could be developed further.

Nevertheless,

I AM STILL LEARNING A LOT!

Sebastian Carmona A.

I like coffee and Data, Data and Coffee, simple! OH! And Python, and SQL! www.sebascarmona.com