How do I check for geospatial similarity in Python? Is geopandas or pandas better for this?

by Rolando   Last Updated August 14, 2019 02:22 AM - source

Say I have two pandas dataframes, one per day, in a format similar to the one below, where each 'Point' records where a person was at a given time. Assume there are multiple line strings, each representing a different person walking a specific path. I want to compare foot traffic between the two days and determine whether the paths from one day are 'similar' to those from the other (e.g., within a deviation parameter that can be set).

Name, Latitude, Longitude, Date
Jordan,<lat>,<lon>, 2017-08-01T00:00:05
Jordan,<lat>,<lon>, 2017-08-01T00:00:08
Jordan,<lat>,<lon>, 2017-08-01T00:00:10
Jordan,<lat>,<lon>, 2017-08-01T00:00:16 
Sarah,<lat>,<lon>, 2017-08-01T00:00:20
Sarah,<lat>,<lon>, 2017-08-01T00:00:30
Jordan,<lat>,<lon>, 2017-08-01T00:00:32

I use shapely to construct the paths/lines that represent where each person was on a given day and time.

How I generate the lines...

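from geopandas import GeoDataFrame
from shapely.geometry import Point

# shapely expects (x, y), i.e. (longitude, latitude); build the points for each day from that day's own frame
dayonegeom = [Point(ab) for ab in zip(dayonedataframe.longitude, dayonedataframe.latitude)]
daytwogeom = [Point(ab) for ab in zip(daytwodataframe.longitude, daytwodataframe.latitude)]
dayonegeodataframe = GeoDataFrame(dayonedataframe, geometry=dayonegeom)
daytwogeodataframe = GeoDataFrame(daytwodataframe, geometry=daytwogeom)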
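The snippet above produces one Point per row, not a line per person. A minimal sketch of turning one day's rows into per-person paths might look like the following. The helper name build_paths and the sample coordinates are hypothetical, and the column names are assumed to match the sample header (Name, Latitude, Longitude, Date); adjust them if your frame uses lowercase names.

```python
import pandas as pd
from shapely.geometry import LineString

def build_paths(df):
    """Return a dict mapping each name to a LineString of that person's fixes.

    Hypothetical helper; assumes columns Name, Latitude, Longitude, Date.
    """
    paths = {}
    # Sort by timestamp so the vertices follow the order of travel
    for name, group in df.sort_values("Date").groupby("Name"):
        coords = list(zip(group["Longitude"], group["Latitude"]))
        if len(coords) >= 2:  # a LineString needs at least two vertices
            paths[name] = LineString(coords)
    return paths

# Made-up sample data for illustration only
sample = pd.DataFrame({
    "Name": ["Jordan", "Jordan", "Sarah", "Jordan", "Sarah"],
    "Latitude": [40.0, 40.001, 41.0, 40.002, 41.001],
    "Longitude": [-75.0, -75.001, -76.0, -75.002, -76.001],
    "Date": pd.to_datetime([
        "2017-08-01T00:00:05", "2017-08-01T00:00:08",
        "2017-08-01T00:00:20", "2017-08-01T00:00:32",
        "2017-08-01T00:00:40",
    ]),
})
paths = build_paths(sample)
```

Calling build_paths once per day gives two dictionaries of paths that can then be compared pair by pair.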

What is the best way to filter the dataframe or GeoDataFrame so that only the paths most 'similar' to each other are kept and the rest are eliminated?

I'm looking for the best way to do this, whether in pandas before the data is converted to a GeoDataFrame or in geopandas after the conversion.
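One possible reading of 'similarity with a deviation parameter' is the Hausdorff distance between two paths, which shapely exposes as the hausdorff_distance method on geometries. The sketch below is only an illustration under that assumption, not an established answer: the function name similar_pairs, the max_deviation parameter, and the toy coordinates are all hypothetical. The distance is in the units of the coordinates, so with raw latitude/longitude it is in degrees; projecting to a metric CRS first would give a threshold in metres.

```python
from shapely.geometry import LineString

def similar_pairs(paths_a, paths_b, max_deviation):
    """Return (key_a, key_b, distance) for every pair of paths whose
    Hausdorff distance is at most max_deviation (hypothetical helper)."""
    matches = []
    for key_a, line_a in paths_a.items():
        for key_b, line_b in paths_b.items():
            d = line_a.hausdorff_distance(line_b)
            if d <= max_deviation:
                matches.append((key_a, key_b, d))
    return matches

# Toy paths in arbitrary planar units, for illustration only
day_one = {
    "Jordan": LineString([(0, 0), (1, 0), (2, 0)]),
    "Sarah": LineString([(0, 5), (1, 5), (2, 5)]),
}
day_two = {
    "Jordan": LineString([(0, 0.1), (1, 0.1), (2, 0.1)]),
    "Sarah": LineString([(10, 10), (11, 10)]),
}
matches = similar_pairs(day_one, day_two, max_deviation=0.5)
```

Here only the two Jordan paths fall within the threshold. This compares every path from one day against every path from the other, which is quadratic in the number of paths, so a large dataset would likely need a spatial index or a bounding-box prefilter first.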
