Veterans Day Reflections: Open source software and evacuation operations, a remarkable combination.
Quickly and safely aggregate geolocation data for location density analysis using a hexagonal grid system.
Ever since leaving the military, I’ve been fortunate to work with and run into many other veterans in the tech and startup industry, including founding companies with them, Gretel among them. This Veterans Day, I’d like to reflect on a specific set of events where I had the opportunity to work with a lot of other veterans and benefit from the booming set of tools being built by the open source software community.
In mid-to-late August of 2021, the United States Air Force led the largest Noncombatant Evacuation Operation (NEO) in history. Approximately 124,000 people were evacuated by airlift, from Afghanistan, in just over a two-week period. I spent over 12 years in the Air Force, and despite the confusion and chaos during this entire process,
I felt a sense of pride knowing the Air Force was leading this effort. I was not an aviator during my time in the Air Force, but I sure as hell took a lot of rides in C-17s and C-130s and I think to this day I’d almost rather fly via the USAF than some other airlines out there. Your bags fly free, actually show up, and the loadmasters don’t tolerate unruly passengers.
My wife, a US Army Veteran and highly proficient linguist, and I were glued to the television switching between channels airing updates about the evacuation. Combined, we spent over 24 months in Afghanistan. Watching this unfold triggered a lot of memories and storytelling, but worst of all, a sense of helplessness.
Then, all of a sudden, the messages started flowing in like crazy. Our Signal app notifications were blowing up as we quickly learned that various groups of veterans and NGOs were organizing teams to help put together information and details to assist people who were most likely not going to get out via airlift.
The mission was to put together as much open-source intelligence as possible for potential ground exfiltration routes and provide it to the various group leads so they could pass it along to the people still on the ground. For the next 72 hours or so we converted our dining room and kitchen into a mini operations center and dove in.
We called our various efforts Operation GTFO.
Requests were pouring in: scour any and all publicly available information to identify Taliban checkpoints along roads; utilize procurable satellite imagery to look for potential evacuation routes; analyze a myriad of location-based information to find densities of people and traffic to avoid, all with the goal of finding safe evacuation routes with a lower likelihood of detection by threats.
At this point, I did what any normal startup co-founder would do: rope my other co-founder into the madness! I pinged Alex, a veteran of the Intelligence Community and several deployments, and gave him about a 30-second overview; that’s all it took. Then he told one of his close friends, a former Army Intelligence Officer, what was up, and he dove right in as well.
As these events unfolded, we constantly remarked on how, when we all served, this type of work would have been impossible for the everyday technologist. Now you can solve the hardest problems in a matter of days thanks to the open source community.
Alex and I were spending most of our time trying to make sense of various location-based data; what we really needed to do was aggregate location data into various densities so we could overlay the activity onto our main Common Operational Picture (COP), which was basically a shared Google Map with all sorts of layers enabled. One other group we were talking to suggested using Uber’s Hexagonal Hierarchical Spatial Index to do it and another group suggested using Folium for visualization.
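The heart of that aggregation is simple enough to sketch in plain Python. This stand-in snaps coordinates to a grid by rounding them; the real workflow assigns each point to a hexagonal cell with the h3 library instead, but the unique-counting logic is the same. All names and coordinates here are illustrative:

```python
from collections import defaultdict

def density_by_cell(points, precision=1):
    """Count unique identifiers per grid cell.

    points: iterable of (identifier, lat, lng) tuples.
    precision: decimal places to round to, a crude stand-in for an
    H3 resolution; the real code calls into the h3 library instead.
    """
    cells = defaultdict(set)
    for ident, lat, lng in points:
        cell = (round(lat, precision), round(lng, precision))
        cells[cell].add(ident)
    return {cell: len(ids) for cell, ids in cells.items()}

points = [
    ("device-a", 34.52, 69.18),  # illustrative coordinates
    ("device-b", 34.53, 69.17),
    ("device-a", 34.53, 69.17),  # same device twice in one cell counts once
    ("device-c", 31.62, 65.71),
]
print(density_by_cell(points))
```

Swapping the rounding for an H3 cell assignment is what turns this from square-ish bins into the hexagonal grid described below.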
Once we were able to analyze some of the location data, we ported the GPS locations back into our Google Map. Finally, we were able to pull updated satellite imagery to better understand potential border crossing threats and overlay cell phone coverage with our other location data to better identify “go” and “no go” areas for ground teams. Everything was added to our map, and then we could link to specific slide decks with more detailed analysis.
Uber obviously built H3 for a different purpose, as no one was really taking an Uber out of Afghanistan, but their decision to open source this technology was a huge enabler for us. H3 also serves as a pretty neat way to do privacy-preserving transforms on location data while keeping it useful: win-win. There are tons of other use cases for this kind of location analysis, so I packaged up some of the jumbled code we had into a little module that automates some of what we were doing. You can check out the code here, and we’ll do a little tour of the features below.
First, follow the Getting Started section in the GitHub repo. The module expects a Pandas DataFrame with at least three types of columns present:
- A general “identifier.” This could be something like a customer ID, device ID, etc. This is the field that we’ll do unique counting of for density analysis.
- A latitude column.
- A longitude column.
For this walkthrough, we’ll use some public rental scooter data. There’s already a sample dataset in the repo, and there’s also some sample code in the `utils.py` module in case you would like to create your own dataset.
Here’s a snippet of the dataset we are using:
The identifier column is `bike_id` which identifies a unique bike or scooter at any given point in time.
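The actual snippet lives in the repo; a mocked-up DataFrame with the same shape (made-up values, with column names matching the module’s defaults) looks something like this:

```python
import pandas as pd

# Illustrative stand-in for the public scooter dataset; the real
# sample lives in the repo. Values here are made up.
df = pd.DataFrame(
    {
        "bike_id": ["a1", "a1", "b2", "c3"],
        "lat": [33.45, 33.46, 33.44, 33.47],
        "lng": [-112.07, -112.06, -112.08, -112.05],
    }
)
print(df.head())
```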
First, we’ll import and create a `DensityTransform` instance.
When creating the instance, you’ll want to provide four parameters:
- The source DataFrame
- The column name that is your “identifier”
- The latitude and longitude column names, which default to `lat` and `lng`
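The real class lives in the repo, but a hypothetical, stripped-down sketch of how such a transformer might be implemented, using coordinate rounding in place of actual H3 calls so the example stays self-contained, could look like this:

```python
import pandas as pd

class DensityTransform:
    """Hypothetical sketch; the real class lives in the repo.

    Rounding stands in for H3 cell assignment here so the example
    runs on its own; the real module calls into the h3 library.
    """

    def __init__(self, df, identifier, lat="lat", lng="lng"):
        self.df = df
        self.identifier = identifier
        self.lat = lat
        self.lng = lng
        self.resolution = None

    def fit(self, resolution=6):
        self.resolution = resolution
        return self  # returning self enables method chaining

    def transform(self):
        # Snap each point to a cell, then count unique identifiers per cell.
        out = self.df.copy()
        digits = max(0, self.resolution - 5)  # crude stand-in for H3 levels
        out["cell_lat"] = out[self.lat].round(digits)
        out["cell_lng"] = out[self.lng].round(digits)
        return (
            out.groupby(["cell_lat", "cell_lng"])[self.identifier]
            .nunique()
            .reset_index(name="unique_count")
        )

# Made-up data: three sightings clustered together, one far away.
df = pd.DataFrame(
    {
        "bike_id": ["a1", "a1", "b2", "c3"],
        "lat": [33.41, 33.42, 33.43, 34.11],
        "lng": [-112.01, -112.02, -112.03, -111.51],
    }
)
dense = DensityTransform(df, identifier="bike_id").fit(resolution=6).transform()
print(dense)
```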
Next, we’ll fit the model. Here you choose the desired H3 resolution, a value between 0 and 15, inclusive. The higher the number, the smaller the area each hexagon covers. Once the model is fit, you can inspect the details of your hex resolution using the `resolution` property.
In this case, each hexagon on the map covers approximately 36 square kilometers.
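That figure is easy to sanity-check. Each step up in H3 resolution subdivides a hexagon into roughly seven children, so average cell area shrinks by about 7x per level. Starting from H3’s published resolution-0 average of roughly 4,250,547 square kilometers, a quick back-of-the-envelope calculation recovers the ~36 km² figure at resolution 6:

```python
# Each H3 resolution step subdivides a cell into ~7 children, so the
# average area drops ~7x per level from the resolution-0 baseline.
RES0_AVG_AREA_KM2 = 4_250_547  # published average hexagon area at resolution 0

def approx_hex_area_km2(resolution):
    return RES0_AVG_AREA_KM2 / 7 ** resolution

print(approx_hex_area_km2(6))  # roughly 36 square kilometers
```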
Now, we can transform our original data into location densities using the resolution specified. When doing this, you can optionally transform the name of the “identifier” field as well.
This will create a new DataFrame that looks like this:
In this resulting DataFrame, the `lat` and `lon` columns are now the centers of each hexagon, and for each hexagon we are given the unique count of Bike IDs.
Finally, we have another transform function that will run the same computation but display a map that shows the hexagons, color-coded by their various densities:
This will group the unique counts into 8 ranges of unique `bike_id` counts and plot a map:
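The bucketing behind that color coding can be sketched with pandas: `pd.cut` splits the observed unique counts into eight equal-width ranges, one per color. The real module’s binning may differ, and the counts below are made up:

```python
import pandas as pd

# Unique bike_id counts per hexagon (illustrative numbers).
counts = pd.Series([1, 3, 5, 8, 13, 21, 34, 55])

# Split the observed range into 8 equal-width bins, one per map color.
bins = pd.cut(counts, bins=8, labels=range(8))
print(bins.tolist())
```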
Finally, if we wanted to make the resolution a little higher, we could re-run this whole process. Here we’ll just use method chaining to do everything in one command:
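Method chaining works because each intermediate method returns the instance itself, so calls can be strung together in one command. A stripped-down, hypothetical illustration of the pattern:

```python
class Pipeline:
    """Minimal illustration of method chaining: setup steps return self."""

    def __init__(self, value):
        self.value = value

    def fit(self, factor):
        self.factor = factor
        return self  # returning self is what makes .fit(...).transform() work

    def transform(self):
        return self.value * self.factor

# Everything in one command, mirroring fit().transform() on the real class.
print(Pipeline(3).fit(2).transform())
```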
Here we see that our hexagons are a lot smaller and more specific:
Zooming in a bit, we can see the varying densities more closely:
There are a lot of improvements and feature additions that could be made as next steps, for instance:
- Filtering out hexagons below a certain density, to preserve privacy.
- Implementing bucketed counts instead of exact unique identifier counts.
- Utilizing custom ranges for the densities; in this example each color represents a uniform range of unique bike IDs, and these ranges could be automated to account for non-linear distributions of unique counts.
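The first of those ideas, dropping sparse hexagons so individuals can’t be singled out, is essentially a k-anonymity-style threshold. A sketch with pandas, using illustrative column names and values:

```python
import pandas as pd

# A unique-count table like the one produced by the transform step
# (values are made up for illustration).
densities = pd.DataFrame(
    {
        "lat": [33.4, 33.5, 34.1],
        "lon": [-112.0, -112.1, -111.5],
        "unique_count": [12, 2, 1],
    }
)

K = 5  # minimum unique identifiers required to keep a hexagon
filtered = densities[densities["unique_count"] >= K].reset_index(drop=True)
print(filtered)
```

Hexagons covering only one or two unique identifiers are dropped entirely, so the published map never pinpoints a small number of people.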
It is remarkable how many open source tools and capabilities have been developed in the last several years. This kind of open and free technology was nascent, if not absent, a decade ago, and we are fortunate for these tools that allowed veterans and other supporters to band together years after hanging up our uniforms for good. This evacuation effort continues today, many displaced individuals still need help, and this probably will not be the last time we harness open technology for these purposes.