Say I have two pandas DataFrames, one per day, in a format similar to the one below, where each row (a 'Point') records a person and where they were at a given time. Assume there are multiple line strings, each representing a different person walking along a specific path. Based on foot traffic, I want to determine whether the paths on one day are 'similar' to those on the other day (e.g., within a deviation parameter that can be set).
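One possible way to make "similar within a deviation parameter" precise, offered as an assumption rather than the only choice, is the discrete Hausdorff distance between the vertex sets of two paths. A day-one path $P$ is kept if some day-two path $Q$ in the set $\mathcal{D}_2$ lies within the deviation $\varepsilon$:

```latex
d_H(P, Q) = \max\Big\{ \max_{p \in P} \min_{q \in Q} \lVert p - q \rVert,\;
                       \max_{q \in Q} \min_{p \in P} \lVert p - q \rVert \Big\}

\text{keep } P \iff \min_{Q \in \mathcal{D}_2} d_H(P, Q) \le \varepsilon
```

Hausdorff distance ignores the order and timing of the points; if walking direction matters, the Fréchet distance is a common alternative.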
Name, Latitude, Longitude, Date
Jordan,<lat>,<lon>, 2017-08-01T00:00:05
Jordan,<lat>,<lon>, 2017-08-01T00:00:08
Jordan,<lat>,<lon>, 2017-08-01T00:00:10
Jordan,<lat>,<lon>, 2017-08-01T00:00:16
Sarah,<lat>,<lon>, 2017-08-01T00:00:20
Sarah,<lat>,<lon>, 2017-08-01T00:00:30
Jordan,<lat>,<lon>, 2017-08-01T00:00:32
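A minimal sketch of loading this format with pandas, assuming the file has exactly the header shown (with a space after each comma). The coordinates below are made-up placeholders standing in for `<lat>`/`<lon>`. `skipinitialspace=True` strips the leading spaces so the columns are named `Latitude` rather than ` Latitude`, and `parse_dates` turns the timestamps into datetimes.

```python
import io

import pandas as pd

# Hypothetical sample: coordinates are made up, replacing <lat>/<lon>
SAMPLE = """Name, Latitude, Longitude, Date
Jordan,40.0,-73.0, 2017-08-01T00:00:05
Jordan,40.1,-73.1, 2017-08-01T00:00:08
Sarah,41.0,-74.0, 2017-08-01T00:00:20
"""

# skipinitialspace drops the space after each comma (header and rows alike)
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True, parse_dates=["Date"])
print(df.dtypes)
```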
I use shapely to construct paths (lines) representing where each person was on a given day and at a given time.
This is how I generate the points (one per row) that make up the lines:
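from geopandas import GeoDataFrame
from shapely.geometry import Point

# One Point per row, built from that day's own (longitude, latitude) columns
dayonegeom = [Point(xy) for xy in zip(dayonedataframe.longitude, dayonedataframe.latitude)]
daytwogeom = [Point(xy) for xy in zip(daytwodataframe.longitude, daytwodataframe.latitude)]
dayonegeodataframe = GeoDataFrame(dayonedataframe, geometry=dayonegeom)
daytwogeodataframe = GeoDataFrame(daytwodataframe, geometry=daytwogeom)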
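The snippet above produces one Point per row rather than one line per person. A minimal sketch of turning each person's points into a single shapely `LineString` per day, assuming the column names from the sample header (`Name`, `Latitude`, `Longitude`, `Date`; adjust if yours are lowercase). The helper `build_paths` is hypothetical. Rows are sorted by time, grouped by person, and joined in order. People with fewer than two fixes are skipped because a `LineString` needs at least two coordinates.

```python
import pandas as pd
from shapely.geometry import LineString


def build_paths(df: pd.DataFrame) -> dict:
    """Return {name: LineString}, vertices ordered by timestamp (hypothetical helper)."""
    ordered = df.sort_values("Date")
    paths = {}
    for name, group in ordered.groupby("Name"):
        # A LineString needs at least two points; skip single-fix people
        if len(group) < 2:
            continue
        # shapely expects (x, y), i.e. (longitude, latitude)
        coords = list(zip(group["Longitude"], group["Latitude"]))
        paths[name] = LineString(coords)
    return paths


# Usage with made-up coordinates (hypothetical values, not real data)
day_one = pd.DataFrame({
    "Name": ["Jordan", "Jordan", "Sarah", "Sarah", "Jordan"],
    "Latitude": [40.0, 40.1, 41.0, 41.1, 40.2],
    "Longitude": [-73.0, -73.1, -74.0, -74.1, -73.2],
    "Date": pd.to_datetime([
        "2017-08-01T00:00:05", "2017-08-01T00:00:08",
        "2017-08-01T00:00:20", "2017-08-01T00:00:30",
        "2017-08-01T00:00:32",
    ]),
})
day_one_paths = build_paths(day_one)
```

A plain dict keeps the sketch independent of geopandas; it can be wrapped in a `GeoDataFrame` afterwards if needed.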
What is the best way to filter the DataFrame or GeoDataFrame so that only the paths most 'similar' to each other are kept and the dissimilar ones are eliminated?
I'm looking for the best way to do this, whether in pandas before the data is converted to a GeoDataFrame or in geopandas afterwards.
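One possible filter, sketched under stated assumptions rather than offered as a definitive method, uses the discrete Hausdorff distance as the similarity measure (a swapped-in choice, not something from the question). Each path is assumed to be an `(n, 2)` NumPy array of coordinates (for a shapely line, `np.asarray(line.coords)`). A day-one path is kept if its closest day-two path is within `max_deviation`. The names `hausdorff`, `keep_similar`, and `max_deviation` are hypothetical. Coordinates should be in a projected, metric CRS (e.g. via `GeoDataFrame.to_crs`) so the threshold is in metres rather than degrees.

```python
import numpy as np


def hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """Discrete Hausdorff distance between two (n, 2) vertex arrays."""
    # Pairwise Euclidean distances between every vertex of a and every vertex of b
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Worst-case distance from a vertex on one path to the nearest vertex on the other
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


def keep_similar(day_one: dict, day_two: dict, max_deviation: float) -> dict:
    """Keep day-one paths whose nearest day-two path is within max_deviation.

    Returns {name: (best_match_name, distance)} for the paths that pass.
    """
    kept = {}
    for name, path in day_one.items():
        scores = {other: hausdorff(path, q) for other, q in day_two.items()}
        if not scores:
            continue
        best = min(scores, key=scores.get)
        if scores[best] <= max_deviation:
            kept[name] = (best, scores[best])
    return kept


# Hypothetical paths in projected (metre) coordinates
day_one = {
    "Jordan": np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]]),
    "Sarah": np.array([[0.0, 50.0], [0.0, 60.0]]),
}
day_two = {
    "Alex": np.array([[0.0, 1.0], [10.0, 2.0], [20.0, 1.0]]),
}
result = keep_similar(day_one, day_two, max_deviation=5.0)
print(result)
```

This matches each path to its nearest counterpart independently, so several day-one paths may match the same day-two path. shapely geometries also provide a `hausdorff_distance` method if you prefer to stay with `LineString` objects.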