Geopy: Getting Geo Localization From Addresses

Image from Author

Motivation

I started working with geolocation coordinates because of a project request at my current job. The request consisted of getting the coordinates (latitude and longitude) of every element in a data lake of nearly 500,000 records, and then looking up each of them in another database in order to get the address of each record.

That was precisely where the fun started, and it was when I met Geopy, which its official documentation describes as:

…a Python client for several popular geocoding web services…makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources.

It was certainly more efficient than looking one-by-one at each of the cities in Google Maps. So, let’s start getting our hands dirty with geographical data!

Locating coordinates of cities and countries

Geocoding in Geopy is provided by a number of different services, none of which is affiliated with Geopy. These services expose APIs that anyone can implement a client for, and Geopy is simply a library that bundles such implementations for many different services in a single package.

Image from geopy

So, Geopy is just a bridge between these services and the users of the Python library. The following is the long list of services it can use to extract geo data, of which we will use Nominatim for our analysis:

Image from geopy

Each of the services you might decide to use has its own class in geopy.geocoders abstracting the service’s API. Geocoders each define at least a geocode method for resolving a location from a string, and may define a reverse method, which resolves a pair of coordinates to an address. Each Geocoder accepts any credentials or settings needed to interact with its service, e.g., an API key or locale, during its initialization.
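As a minimal sketch of those two methods, the helper below runs a forward lookup and then reverses the resulting coordinates. The name `round_trip` is hypothetical (it is not part of Geopy or of my Notebook), and it accepts any object exposing geopy-style `geocode` and `reverse` methods, so the Nominatim setup in the trailing comment is just one possible choice.

```python
def round_trip(geolocator, query):
    """Geocode a free-text query, then reverse-geocode the resulting point.

    `geolocator` is any object with geopy-style `geocode` and `reverse`
    methods (for example geopy.geocoders.Nominatim).
    Returns None when the service finds no match for the query.
    """
    location = geolocator.geocode(query)
    if location is None:  # geopy returns None when nothing matches
        return None

    point = (location.latitude, location.longitude)
    nearby = geolocator.reverse(point)  # (latitude, longitude) pair -> address
    return {
        "query": query,
        "address": location.address,
        "latitude": location.latitude,
        "longitude": location.longitude,
        "reverse_address": nearby.address if nearby is not None else None,
    }


# Possible usage (needs geopy installed and network access):
# from geopy.geocoders import Nominatim
# geolocator = Nominatim(user_agent="my_geopy_tutorial")  # hypothetical name
# print(round_trip(geolocator, "Buenos Aires"))
```

Keeping the geocoder as a parameter means the same helper could be tried with other services from geopy.geocoders, provided they implement `reverse`.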

I have made a Notebook available on my GitHub that you can use in order to follow along with this useful library.

The first thing to do is import the libraries that we will use:

Image from Author

If you have already cloned the repo to your local drive, you will notice that I dropped a CSV file called cities.csv inside the data folder of the project. It contains 50 cities from different parts of the world:

Image from Author

Then I read that file with pandas and convert it to a list, which we will iterate over in a for loop using the Nominatim geolocator:

Image from Author
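For readers following along without the images, this step could look roughly like the sketch below. The column name `city` is an assumption on my part, and a small in-memory CSV stands in for data/cities.csv so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for data/cities.csv; the column name "city" is an assumption.
sample_csv = io.StringIO("city\nBuenos Aires\nLima\nMadrid\n")

# In the project this would be: pd.read_csv("data/cities.csv")
cities_df = pd.read_csv(sample_csv)

# Drop empty rows, trim stray whitespace, and convert the column to a list.
cities = cities_df["city"].dropna().str.strip().tolist()
print(cities)
```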

As described in its documentation (which, by the way, is pretty comprehensive), we need to set a user_agent parameter for the API calls. Using the default one is strongly discouraged:

Image from geopy

This is why in the Notebook I have set my own user agent, different from the default (you can keep it or rename it as you like). After that, I created some empty lists to store the extracted data. The loop then gets the location, address, latitude, and longitude of each of the 50 cities from the geolocator and appends them to those lists.

Image from Author
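The loop described above can be sketched as follows. Both function names are hypothetical and may differ from what the Notebook actually defines, and the user agent string is a placeholder. The rate limiting uses geopy's `RateLimiter` helper, which is my addition here, because the public Nominatim server asks for no more than one request per second.

```python
def make_nominatim_geocode(user_agent="my_geopy_tutorial", min_delay_seconds=1):
    """Build a rate-limited Nominatim geocode callable.

    geopy is imported lazily so the loop below can be tried without it.
    The default user_agent is a placeholder; pick your own.
    """
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Wait at least `min_delay_seconds` between consecutive requests.
    return RateLimiter(geolocator.geocode, min_delay_seconds=min_delay_seconds)


def locate_cities_sketch(cities, geocode):
    """Geocode each city and collect the results in parallel lists."""
    locations, addresses, latitudes, longitudes = [], [], [], []
    for city in cities:
        location = geocode(city)
        locations.append(location)
        if location is None:
            # Keep the lists aligned even when a city is not found.
            addresses.append(None)
            latitudes.append(None)
            longitudes.append(None)
        else:
            addresses.append(location.address)
            latitudes.append(location.latitude)
            longitudes.append(location.longitude)
    return locations, addresses, latitudes, longitudes


# Possible usage (needs geopy installed and network access):
# geocode = make_nominatim_geocode(user_agent="my_geopy_tutorial")
# locations, addresses, latitudes, longitudes = locate_cities_sketch(
#     ["Lima", "Madrid"], geocode
# )
```

Appending `None` for cities that are not found keeps all four lists the same length, which matters when they are later combined into a single table.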

After that, I combined all the lists, which have the same length, into a pandas DataFrame. You will notice that the geocoder has returned some data in its original language, which is the geolocator's default behavior (the Nominatim documentation describes the parameters that control the results of the calls). For that reason, I kept the list I used as the source as the first column of the DataFrame:

Image from Author
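A rough sketch of that step is shown below. The rows are hand-written placeholders that mimic local-language results, not actual Nominatim output, and the column names are my own choice.

```python
import pandas as pd

# Hand-written placeholder rows (approximate coordinates), not real API output.
cities = ["Munich", "Lima"]
addresses = ["München, Bayern, Deutschland", "Lima, Perú"]
latitudes = [48.14, -12.05]
longitudes = [11.58, -77.04]

# The source list goes first so every row can be matched to the original query,
# even when the returned address is in another language.
df = pd.DataFrame(
    {
        "city": cities,
        "address": addresses,
        "latitude": latitudes,
        "longitude": longitudes,
    }
)
print(df)
```

If English results are preferred, Nominatim's `geocode` method accepts a `language` argument, for example `geolocator.geocode("München", language="en")`.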

Identifying cities that have the same name

But what happens if we need to look up the coordinates of cities that share the same name, whether they are in different states of the same country or in different countries?

Image from Google

Image from Google

In that case, we have to include in our CSV file the country and state where each city is located, so that the geolocator identifies the cities properly:

Image from Author
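One way to turn such rows into unambiguous queries is to join the parts into a single string, as in the sketch below. The helper name `build_query` and the example rows are hypothetical and are not taken from the Notebook.

```python
def build_query(city, state=None, country=None):
    """Join the available parts into a 'City, State, Country' query string."""
    parts = [city, state, country]
    # Skip missing or blank parts so we never emit dangling commas.
    return ", ".join(part.strip() for part in parts if part and part.strip())


# Hypothetical rows: (city, state, country)
rows = [
    ("Springfield", "Illinois", "United States"),
    ("Springfield", "Missouri", "United States"),
    ("Córdoba", None, "Spain"),
    ("Córdoba", None, "Argentina"),
]
queries = [build_query(*row) for row in rows]
print(queries)
```

Nominatim's `geocode` also accepts a dictionary with keys such as `city`, `state`, and `country` as a structured query, which is an alternative to building the string by hand.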

The result of passing the above list to the locate_cities function that is defined in the notebook is as follows:

Image from Author

You will notice that, in each latitude/longitude pair, the longitudes of locations in the Americas are negative, while those of European locations east of Greenwich, UK, are positive. (The Greenwich meridian is the historic prime meridian, the geographical reference line that passes through the Royal Observatory, Greenwich.) That is an easy way to validate whether the results returned by the geocoding service match the locations we passed to the geolocator.

Another way is to use www.findlatitudeandlongitude.com, but there we would need to enter the geographical points one by one, which is not efficient if we have thousands of points.
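This sign check is easy to automate. The sketch below is a hypothetical helper, not part of Geopy or of my Notebook, that verifies a coordinate pair is within valid ranges and, optionally, that the longitude falls on the expected side of the Greenwich meridian.

```python
def check_coordinates(latitude, longitude, expected_side=None):
    """Return True if the pair is plausible.

    expected_side: "west" or "east" of the Greenwich meridian,
    or None to only check the valid ranges.
    """
    in_range = -90 <= latitude <= 90 and -180 <= longitude <= 180
    if not in_range:
        return False
    if expected_side == "west":
        return longitude < 0  # e.g. the Americas
    if expected_side == "east":
        return longitude > 0  # e.g. most of continental Europe
    return True


# Approximate coordinates typed by hand for illustration.
print(check_coordinates(-12.05, -77.04, expected_side="west"))  # Lima
print(check_coordinates(48.14, 11.58, expected_side="west"))    # Munich
```

The first call returns True and the second returns False, since Munich lies east of Greenwich; a False on real data would be a hint to review that row.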

Image from https://www.findlatitudeandlongitude.com/

Visualizing geographical data

Another way to validate or even visualize the geographical data we extracted is by reading the csv files in Tableau Public. In the linked article I showed a little bit of that:

[Subway Data ETL Pipeline: Part II](https://heartbeat.comet.ml/subways-data-etl-pipeline-part-ii-bbaff17b4ee4 "heartbeat.comet.ml/subways-data-etl-pipelin.."): a brief tutorial on how to extract, transform, and load data from Wikipedia with web scraping and pandas.

In order to visualize the geo data you can upload one of the csv files to Tableau Public, like this:

Image from Author
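If the results are still in a DataFrame, they can be written to a CSV file before uploading. The sketch below assumes column names `City`, `Latitude`, and `Longitude`, uses hand-typed approximate coordinates, and writes to a temporary folder; the file name is a placeholder.

```python
import os
import tempfile

import pandas as pd

# Hand-typed approximate coordinates, for illustration only.
df = pd.DataFrame(
    {
        "City": ["Lima", "Madrid"],
        "Latitude": [-12.05, 40.42],
        "Longitude": [-77.04, -3.70],
    }
)

# Placeholder location; in a project you might write into the data folder.
output_path = os.path.join(tempfile.gettempdir(), "cities_located.csv")

# index=False keeps the pandas row index out of the file.
df.to_csv(output_path, index=False)
print(output_path)
```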

Once you have connected your file and Tableau has read it, go to the first sheet of your workbook and drag and drop the Longitude and Latitude dimensions into Columns and Rows respectively. Then go to Analysis and untick Aggregate Measures. This will show all your points on a highly customizable map.

Image from Author

Each coordinate shown on your Tableau map corresponds to a point in the CSV you imported. Very easy, isn’t it?

Conclusion

We have learned how to use Geopy, a useful Python library, for research purposes or to integrate into our own projects, along with a little bit of geocoding. I hope it has been useful for you. Remember that you can clone the repo from my GitHub (medium_notebooks repo):

[fvgm-spec - Repositories](https://github.com/fvgm-spec?tab=repositories "github.com/fvgm-spec?tab=repositories")

Without further ado, happy coding and visualization!!

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.
