All posts by Amy Smith

Eric Fischer: “There may yet be an objective measure of the goodness of places, but I haven’t found it yet”

Eric Fischer
Eric Fischer works on data visualization and analysis tools at Mapbox. He was previously an artist in residence at the Exploratorium and before that was on the Android team at Google. He is best known for "big data" projects using geotagged photos and tweets, but has also spent a lot of time in libraries over the years searching through old plans and reports trying to understand how the world got to be the way it is.

Q: You’re coming up on four years at Mapbox, is that right? What do you do there?

A: I still feel like I must be pretty new there, but it actually has been a long time, and the company has grown tremendously since I started. My most important work at Mapbox has been Tippecanoe, an open-source tool whose goal is to be able to ingest just about any kind of geographic data, from continents to parcels to individual GPS readings, numbering into the hundreds of millions of features, and to create appropriate vector tiles from them for visualization and analysis at any scale. (The name is a joke on “Tippecanoe and Tyler Too,” the 1840 US Presidential campaign song, because it makes tiles, so it’s a Tyler.)
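For readers who have never run it, a typical Tippecanoe invocation looks something like the sketch below. The filenames and flags are illustrative rather than Eric’s own workflow: `-zg` asks the tool to guess a reasonable maximum zoom, and `--drop-densest-as-needed` thins features where tiles would otherwise get too full.

```python
# Illustrative sketch only: shell out to Tippecanoe to turn a GeoJSON file
# into an MBTiles tileset of vector tiles. Assumes tippecanoe is installed
# and on the PATH; "parcels.geojson" is a hypothetical input file.
import subprocess

subprocess.run(
    [
        "tippecanoe",
        "-o", "parcels.mbtiles",      # output tileset
        "-zg",                        # guess an appropriate maximum zoom
        "--drop-densest-as-needed",   # thin features where tiles get too full
        "parcels.geojson",            # input features
    ],
    check=True,
)
```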

Q: I read that you’re working on improving the accuracy of the OpenStreetMap base map. Can you describe that process? I’m guessing one would need to figure out how accurate it is in the first place?

A: I should probably update my bio, because that was originally a reference to a project from long ago: to figure out whether it would be possible to automatically apply all the changes that the US Census had made to their TIGER/Line base map of the United States since it was imported into OpenStreetMap in 2006, without overriding or creating conflicts with any of the millions of edits that had already been made directly to OpenStreetMap. Automated updates proved to be too ambitious, and the project was scaled back to identifying areas where TIGER and OpenStreetMap differed substantially so they could be reconciled manually.

But the work continues. These days, TIGER is valuable to OpenStreetMap mostly as a source of street names and political boundaries, while missing and misaligned streets are now identified mostly through anonymized GPS data. Tile-count is an open source tool that I wrote a few months ago for accumulating, normalizing, and visualizing the density of these GPS tracks so they can be used to find streets and trails that are missing from OpenStreetMap.

Q: In the professional mapping world, I’ve noticed there’s a nervousness around datasets that aren’t time-tested, clearly documented, and from an authoritative source such as the US Census. These official datasets are great resources of course, but there’s a growing amount of data at our fingertips that’s not always so clean or complete. You’ve been successful at getting others to see that there’s a lot to learn about cities and people with dynamic (and sometimes messy) data that comes from many different sources. Do you have any advice on warming people up to thinking creatively and constructively with unconventional datasets?

A: I think the key thing to be aware of is that all data has errors, just varying in type and degree. I don’t think you can spend very much time working with Census data from before 2010 without discovering that a lot of features on the TIGER base map were missing or don’t really exist or are tagged with the wrong name or mapped at the wrong location. TIGER is much better now, but a lot of cases still stand out where Census counts are assigned to the wrong block, either by mistake or for privacy reasons. The big difference isn’t that the Census is necessarily correct, but that it tries to be comprehensive and systematic. With other data sets whose compilers don’t or can’t make that effort, the accuracy might be better or it might be worse, but you have to figure out for yourself where the gaps and biases are and how much noise there is mixed in with the signal. If you learn something interesting from it, it’s worth putting in that extra effort.

Q: Speaking of unconventional data: you maintain a GitHub repository with traffic count data scraped from old planning documents. For those who may not be familiar, traffic counts are usually collected for specific studies or benchmarks, put into a model or summarized in a report… and then rarely revisited. But you’ve brought them back from the grave for many cities and put them in handy, easy-to-use formats, such as these from San Francisco. Are you using them for a particular project? How do you anticipate/hope that others will use them?

A: The traffic count repository began as a way of working through my own anxieties about what unconventional datasets really represent. I could refer to clusters of geotagged photos as “interesting” and clusters of geotagged tweets as “popular” without being challenged, but the lack of rigor made it hard to draw any solid conclusions about these places.

And I wanted solid conclusions because I wasn’t making these maps in a vacuum for their own sake. I wanted to know what places were interesting and popular so that I could ask the follow-up questions: What do these places have in common? What are the necessary and sufficient characteristics of their surroundings? What existing regulations prevent, and what different regulations would encourage, making more places like them? What else would be sacrificed if we made these changes? Or is the concentration of all sparks of life into a handful of neighborhoods in a handful of metro areas the inevitable consequence of a 150-year-long cycle of adoption of transportation technology?

So it was a relief to discover Toronto’s traffic count data and that the tweet counts near intersections correlated reasonably well with the pedestrian counts. Instead of handwaving about “popularity” I could relate the tweet counts to a directly observable phenomenon.

And in fact the pedestrian counts seemed to be closer than tweet counts to what I was really looking for in the first place: an indicator of where people prefer to spend time and where they prefer to avoid. Tweets are reflective of this, but also capture lots of places where people are enduring long waits (airport terminals being the most blatant case) rather than choosing to be present. Not every pedestrian street crossing is by choice either, but even when people don’t control the origin and destination of their trips, they do generally have flexibility to choose the most pleasant route in between.

That was enough to get me fixated on the idea that high pedestrian volume was the key to everything and that I should find as many public sources of pedestrian counts as possible so I could understand what the numbers look like and where they come from. Ironically, a lot of these reports that I downloaded were collecting pedestrian counts so they could calculate Pedestrian Level of Service, which assumes that high crossing volumes are bad, because if volumes are very high, people are crowded. But the numbers are still valid even if the conclusions being drawn from them are the opposite.

What I got out of it was, first of all, basic numeracy about the typical magnitudes of pedestrian volumes in different contexts and over the course of each day. Second, I was able to make a model to predict pedestrian volumes from surrounding residential and employment density, convincing myself that proximity to retail and restaurants is almost solely responsible for the number, and that streetscape design and traffic engineering are secondary concerns. Third, I disproved my original premise, because the data showed me that there are places with very similar pedestrian volumes that I feel very differently about.

If “revealed preference” measured by people crossing the street doesn’t actually reveal my own preferences, what does? The ratio of pedestrians to vehicles is still a kind of revealed preference, of mode choice, but the best fit between that and my “stated preference” opinions, while better than pedestrian volume alone, requires an exponent of 1.5 on the vehicle count, which puts it back into the realm of modeling, not measuring. There may yet be an objective measure of the goodness of places, but I haven’t found it yet.
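As a toy illustration of the score described above: the intersection names and counts below are invented, and only the pedestrians-over-vehicles-to-the-1.5 form comes from the interview.

```python
# Toy illustration of the "ratio with an exponent" score described above:
# pedestrians divided by vehicles raised to the 1.5 power. The intersection
# names and daily counts are invented purely to show the arithmetic.
def preference_score(pedestrians: float, vehicles: float, exponent: float = 1.5) -> float:
    return pedestrians / vehicles ** exponent

intersections = {
    "busy retail corner": (4000, 8000),   # (pedestrians/day, vehicles/day)
    "arterial crossing": (400, 8000),
    "quiet side street": (400, 800),
}

for name, (peds, vehs) in intersections.items():
    print(f"{name}: {preference_score(peds, vehs):.5f}")
```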

Why did I put the data on GitHub? Because of a general hope that if data is useful to me, it might also be useful to someone else. The National Bicycle and Pedestrian Documentation Project is supposedly collecting this same sort of data for general benefit, but as far as I can tell has not made any of it available. Portland State University has another pedestrian data collection project with no public data. Someday someone may come up with the perfect data portal and maybe even release some data into it, but in the meantime, pushing out CSVs gets the data that actually exists but has previously been scattered across hundreds of unrelated reports into a form that is accessible and usable.

Q: What tools do you use the most these days to work with spatial data (including any tools you’ve created — by the way, thanks for sharing your geotools on GitHub)?

A: My current processes are usually very Mapbox-centric: Turf.js or ad hoc scripts for data analysis, Tippecanoe for simplification and tiling, MBView for previewing, and Mapbox Studio for styling. Sometimes I still generate PostScript files instead of web maps. The tool from outside the Mapbox world that I use most frequently is ogr2ogr for reprojection and file format conversion. It is still a constant struggle to try to make myself use GeoJSON for everything instead of inventing new file formats all the time, and to use Node and standard packages instead of writing one-of-a-kind tools in Perl or C++.
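For readers unfamiliar with ogr2ogr, here is a sketch of the kind of reprojection and format conversion Eric mentions. The filenames are placeholders; `-f` sets the output format and `-t_srs` sets the target projection.

```python
# Illustrative sketch only: shell out to ogr2ogr (part of GDAL) to convert a
# shapefile to GeoJSON while reprojecting it to WGS84. Assumes GDAL is
# installed; "roads.shp" is a hypothetical input file.
import subprocess

subprocess.run(
    [
        "ogr2ogr",
        "-f", "GeoJSON",         # output format
        "-t_srs", "EPSG:4326",   # target spatial reference (WGS84)
        "roads.geojson",         # destination comes first...
        "roads.shp",             # ...then the source
    ],
    check=True,
)
```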

Q: You’re prolific on Twitter. What do you like about it, and what do you wish was better?

A: I was an early enough adopter of Twitter to get a three-letter username, but it wasn’t until the start of 2011 that I started really using it. Now it is my main source of news and conversation about maps, data, housing policy, transportation planning, history, and the latest catastrophes of national politics, and a place to share discoveries and things to read. I’ve also used long reply-to-myself Twitter threads as a way of taking notes in public as I’ve read through the scientific literature on colorblindness and then a century of San Francisco Chronicle articles revealing the shifting power structures of city planning.

That said, the Twitter timeline interface has become increasingly unusable as they have pulled tweets out of sequence into “in case you missed it” sections and polluted the remainder of the feed with a barrage of tweets that other people marked as favorites. I recently gave up entirely on the timeline and started reading Twitter only through a list, the interface for which still keeps the old promise that it will show you exactly what you subscribed to, in order.

Q: If you could go back in time, what data would you collect, from when, and where?

A: I would love to have pedestrian (and animal) intersection crossing volume data from the days before cars took over. Was the median pedestrian trip length substantially longer then, or can the changes in pedestrian volumes since motorization all be attributed to changes in population and employment density?

Speaking of which, I wish comprehensive block-level or even tract-level population and employment data went back more than a few decades, and had been collected more frequently. So much of the story of 20th century suburbanization, urban and small-town decline, and reconsolidation can only be told through infrequent, coarse snapshots.

And I wish I had been carrying a GPS receiver around with me (or that it had even been possible to do so) for longer, so that I could understand my own historic travel patterns better. I faintly remember walking to school as a kid and wondering, if I don’t remember this walk, did it really happen? Now my perspective is, if there is no GPS track, did it really happen?

Q: Are you a geohipster? Why or why not?

A: I think the most hipster thing I’ve got going on is a conviction that I’m going to find a hidden gem in a pile of forgotten old songs, except that I’m doing my searching in promo copies of 70-year-old sheet music instead of in the used record stores.

Nate Smith: “Visit a new place in the world; reach out to the OSM communities there”

Nate Smith is technical project manager for the Humanitarian OpenStreetMap Team. He leads the OpenAerialMap project and dives into all things technical across HOT’s operations. Originally from Nebraska, he is now based in Lisbon, Portugal, slowly learning Portuguese and attempting to learn to surf.

Q: We met at State of the Map Asia in Manila! What was it that brought you to the conference?

A: I came to State of the Map Asia through my role in two projects with the Humanitarian OpenStreetMap Team: OpenAerialMap and a new project called Healthsites. I had the chance to give short presentations about both, and I wanted to connect with the OpenStreetMap community in Asia to get feedback and input on the direction of the projects.

Q: Tell us about the Humanitarian OpenStreetMap Team (HOT) and how you got involved.

A: I’ve been involved in HOT in one way or another since 2011. At the time I had just joined Development Seed in Washington DC. I began to get involved with HOT in any way I could; it started with trainings on Mapbox tools and collaborating on projects, and initially it mostly revolved around helping identify data that could be useful in an activation or joining in on tracing. Over the years I gradually got more involved in the working groups, which are the best place to get involved beyond contributing mapping time. I’ve since joined HOT as a technical project manager to help build and manage projects around some of our core tools like OpenAerialMap and OSM Analytics.

Q: For those who may not be familiar with HOT, “activation” is kind of like bringing people together to participate in disaster mapping or a similarly geographically-focused humanitarian mapping effort, did I get that right?

A: Right, a HOT activation in the traditional sense is exactly that. It is an official declaration that the community is coming together to aggressively map an area for a disaster response. The Activation Working Group is one of several working groups where anyone can get involved, and they define the protocols, monitor situations, and are in contact with many OSM communities and humanitarian partners around the world.

Disaster mapping is a core part of the work HOT does. It’s not everything, but it’s still a big part. If you’re interested in helping think about activation protocols or want to help organize during an activation, come join and volunteer your time to support the work.

Q: What are some interesting projects you’re working on?

A: I’ve been actively working on two interesting projects: OpenAerialMap, and for lack of a better name at the moment, the Field Campaigner app. OpenAerialMap launched two years ago and we’ve been slowly rolling out new features and working with partners on integrating new data since. What’s interesting is the work we’re doing this summer — we’re rolling out user accounts, provider pages, and better data management tools. This is exciting as it lowers the barrier to start collecting imagery and contributing to the commons.

The second project is our new Field Campaigner app. It has a generic name at the moment, but it’s part of a move for us to have better tools to manage data collection in the field. A majority of the work the global HOT community does is remote mapping. While that is super critical work and extremely helpful for people on the ground, there is a gap in how work is organized on the ground. This project aims to improve the way data collection is organized and coordinated on the ground — we want field mapping in OpenStreetMap to be distributed and well organized. It also overlaps with similar work happening across the board in this area — Mapbox is working on analyzing changesets for vandalism, and a team from Development Seed and Digital Democracy, through a World Bank project, is working on an improved mobile OSM data collection app.

Q: How easy/hard is it to build these tools? Once they’re out in the world, what are some ways that people find and learn how to use them?

A: It’s not easy to build tools that meet a lot of needs. A core ingredient of success, much of the time, is dogfooding your own work. We’re building tools that serve a wider audience, but at the core we’re testing them and helping spread the word because we use them ourselves.

But just because it’s not easy doesn’t mean people shouldn’t be trying. The more we experiment building tools to do better and faster mapping, whether it is remote or in the field, the more information we will have to improve and address the challenges many communities face.

Q: It looks like your job is fairly technical, but also involves outreach. Is there a particular aspect of your work that you enjoy the most?

A: I think the mix of technical work and outreach is what I love most. Spending part of my day diving into code and the other part talking and strategizing with organizations is what I’ve had the chance to do over the last six years, first at Development Seed and now at HOT. I enjoy trying to be that translation person — connecting tools or ways of using data to solve real-world problems. One of the things I enjoy the most is the chance to help build products or use data with real-world impact. Being able to support MSF staff responding to an Ebola outbreak while working with world-class designers and developers is pretty great.

Q: Looking at your Twitter feed, you seem to travel a lot. What’s your favorite / least favorite thing about traveling? Favorite place you’ve been? Any pro travel tips?

A: I traveled a bit while living in DC, but now that I’m living in Lisbon, Portugal, I’ve had the chance to do more personal travel throughout Europe, which has been great. This past year I’ve had a chance to travel through Asia a bit more for HOT-related projects. My favorite part of traveling is the chance to meet people and experience new cultures and places. There are some incredible geo and OSM communities around the world, and it’s been awesome to meet and work with many of them. Least favorite: awkwardly long layovers, because you can’t get out.

I think my favorite spots have been Bangkok and Jakarta. I find that I enjoy big cities that have great food options. As for tips, I would say pack light and do laundry when you’re traveling, and always make time for good local food.

Q: Would you consider yourself a geohipster? If so, why, and if not, why not?

A: Heh, that is a great question. I think I’ve become less of a geohipster since moving to Portugal. I drink light European beer, I don’t bike because there are too many hills, and I drink too much Nespresso. But I’m still a Mapbox junkie, I work at a coworking space in my neighborhood, and I love open source, so maybe I still lean geohipster. 🙂

Q: In closing, any words of wisdom for our global readership?

A: Get out and visit a new place in the world if you can. And while you’re at it, reach out to the OSM communities there and meet them in person. You’ll meet some incredible and passionate people.

Stephen Mather: “The best way to predict the future is to stake a claim in it and make it happen”

Stephen Mather

Stephen Mather has been working in GIS, planning, and related fields since 1998, working for the last 7 years as the GIS Manager for Cleveland Metroparks. He has been interested in the application of computer vision to geospatial analyses since 2004, and has recently initiated the OpenDroneMap project — a project to bring together and extend a suite of open source computer vision software for use with UAS (drone) and street level images. He is also coauthor of the PostGIS Cookbook.

Stephen was interviewed for GeoHipster by Amy Smith at the recent FOSS4GNA conference in San Francisco, California.

Q: How have you been enjoying the conference so far?

A: It’s been consistently good! There were sometimes two or three sessions that I wanted to be in at a time, so I had to figure out if I could clone myself.

Q: Clone yourself?

A: Yeah, well it would make it so much easier (well, probably the easier thing is to watch the video afterwards).

Q: Let me know if you figure out the cloning thing.

A: Oh, I’ll share it. It’ll be on Github.

Q: Awesome. Have you been to this conference before?

A: I went to variants on FOSS4G in DC, Denver, Portland, and Seoul.

Q: Wow, what was Seoul like?

A: That was FOSS4G Korea. It was awesome. The hospitality was amazing, the conference was really interesting. It’s a beautiful city, and it was lots of fun.

Q: Do you speak Korean?

A: Not adequately, no. (*laughs*). Not at all.

Q: You presented at this year’s conference. How did it go?

A: It was really fun. It was similar to a presentation I gave at North Carolina GIS a couple of weeks ago. The slides were already there, but it never ends up being the same presentation. I presented on OpenDroneMap, which started off as a GeoHipster joke but then started to become a thing! People are excited about it and are trying it out with their drones.

Amy and Steve at FOSS4GNA 2015

Q: Who started the joke?

A: Well, there were the GeoHipster artisanal vertices, and at the time I was thinking about computer vision and drones and where all that’s going, and the absence of an open source project that addresses that. When I made my prediction about 2014, I said it would be all about the artisanal pixel. We’d effectively go from these global satellite images to handcrafted satellite images. Then I started thinking, actually, that’s not a bad idea. The best way to predict the future is to stake a claim in it and make it happen.

Q: I definitely want to pick your brain about that later on in the interview. But before we get there, I wanted to ask you how you got started in the geospatial world.

A: I came from the biology side of things. As an undergrad I actually took a lot of music classes, and a lot of biology classes. At the time, a lot of biologists weren’t really thinking spatially. Everything was about static statistics, which assumes some normality that doesn’t really exist. There were people starting to pull on that thread, but they were the minority. My interest in GIS and the geospatial world was in applying it to understanding biology and ecology better, and then I never really got out of that rabbit hole.

Q: But you haven’t really left music either. You make custom guitars.

A: Very, very slowly. I’ve been making them for 12 or 13 years. I’m on guitar #2.

Q: That’s a really cool hobby.

A: It’s one of those things that seems like it should be harder than it really is. A lot of people think, “Oh, I couldn’t do that”, but actually it’s not that hard of a hobby, and for a woodworking hobby, it doesn’t require many tools. If you want to become a furniture maker, you need to invest a lot in tools just to start. The total cost for guitar-making is much smaller with a minimum viable set of tools, which is kind of cool. In that way, it’s kind of like open source. The barrier to entry for open source is just a laptop, which you may already have.

Q: Totally. Let’s go back to drones for a minute. For those who might not be familiar with it, what is OpenDroneMap?

A: OpenDroneMap is an open source project for taking unreferenced images and turning them into geographic data. Maybe you have a balloon, kite, or drone, and you’ve taken some overlapping photos of an area, and you want to turn that into an orthophoto as a TIFF or PNG, or into a point cloud. It’s basically an extension of photogrammetric techniques. Back in the day, you’d fly with a nice camera that was well parameterized so that you could correct for all of the optical distortion. You’d have a plane flying a known route with inertial navigation and GPS to help you know exactly where the plane was at any given point in time, and then you’d construct three-dimensional data from that, with contours and orthophotos.

If you extend that concept, and instead of having two overlaps with lots of knowledge about your position you have three overlaps, then you can write an equation that back-calculates where all of your camera positions are. In the process of doing that, you generate a point cloud of all of the features that match, which is something you can derive other products from. You could create a mesh from that point cloud, then paint the photos back onto the mesh. Now you’ve got the geospatial information you need, and it can be turned into an orthophoto.

When I first proposed the project, I thought, well, we could license something like this, or we could start an open source project. I had a hunch there was enough existing computer vision code out there to get 50, 60, or even 70% of the way there, just with the existing code. Fortunately my hunch was right. This leverages years of computer vision work done by people all over the world.
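As a much-reduced illustration of the idea Stephen describes (and not OpenDroneMap’s actual code), here is a sketch of a two-image structure-from-motion step using OpenCV. It assumes two overlapping photos and a rough guess at the camera intrinsics, and it stops at a sparse point cloud rather than a mesh or orthophoto; the filenames and numbers are placeholders.

```python
# Minimal two-view structure-from-motion sketch using OpenCV. This is NOT
# OpenDroneMap's implementation: ODM handles many images, lens distortion,
# georeferencing, meshing, and texturing. Filenames and the camera matrix K
# are placeholders.
import cv2
import numpy as np

img1 = cv2.imread("photo1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("photo2.jpg", cv2.IMREAD_GRAYSCALE)

# Approximate pinhole camera intrinsics (focal length and principal point).
K = np.array([[2000.0, 0.0, img1.shape[1] / 2],
              [0.0, 2000.0, img1.shape[0] / 2],
              [0.0, 0.0, 1.0]])

# 1. Find and match features that appear in both photos.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# 2. "Back-calculate" the relative camera pose from the matched points.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulate the matches into a sparse 3D point cloud.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
points4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points3d = (points4d[:3] / points4d[3]).T
print(f"Recovered {len(points3d)} sparse 3D points from {len(good)} matches")
```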

Q: It sounds like it was worthwhile to see what other people were doing, and build off of it.

A: Yeah, the stuff that people had been doing was absolutely brilliant, and it allowed me to go whole hog and jump into the parts I was interested in.

Q: When I was in college I took some courses in remote sensing and did work with Synthetic Aperture Radar. I’m a little familiar with working with imagery. I’m guessing that working with imagery from drones is pretty different from working with aerial and satellite imagery. What are some of the differences you noticed in working with drone imagery versus something from an airplane or satellite?

A: A plane or a satellite gives you a nice synoptic view. There’s a usefulness, not in the specificity, but in the synopsis. If you think of the world as you view it from the ground, you can observe and make sense of it; it’s what we’re most familiar with. There’s a wide gap between what’s happening in the plane or the satellite and the first-person view. Drones, balloons, and kites fill that gap. Drones fill it particularly well because they can cover large areas. That’s what brought me into working with them in the first place.

Q: Speaking of working, you work for the government. Could you tell us more about that?

A: I work for Cleveland Metroparks. We manage about 23,000 acres, which includes forests, wetlands, open areas for people to picnic, a zoo, lakefront parks, and really a whole range of interesting cultural and natural resources. We provide access for passive uses such as picnicking and hiking, and active uses such as events that draw people into those spaces. It’s a really cool park system with a lot of energy and a great history, as well as an amazing staff and a good vision for where we are now and where we’re going.

Q: How long have you worked there?

A: Seven years.

Q: I did some LinkedIn stalking, and I saw that you are a manager there. I’m sure that GIS manager can mean lots of different things depending on whether you’re with the government or a private company, and what industry you’re in. What are the things you think are common descriptors of GIS managers?

A: I’m relatively hands-on. I’ll hack on code, I’ll work on data when I get the opportunity, but I also make sure to give a lot of freedom to the people who work with me, because they’re brilliant, and I don’t have to worry much.

Q: You sound like a great manager!

A: I’ve got great employees! There’s coordination and advocating for resources, ensuring that my employees have what they need. There’s also the aspect of ensuring that folks within the organization, as well as outside of it, understand what we do, so that they can value it and take advantage of it. In addition to giving people the degrees of freedom they need in order to grow, we make sure they have educational opportunities and challenges. There’s a lot of autonomy, which again links back to the open source community.

Q: You’ve written a book on PostGIS. Can you tell us about the book and how it came about?

A: A couple years ago a publishing company discovered my blog and asked if I’d write an outline on PostGIS. I wrote them the outline, and they said “This is great, when can you start?” And I said, “I can’t, my daughter’s due in a few months, and there’s no way I can write a book.” They said, “Well, you could get a co-author”, and I said, “I can’t even write half a book!” Their response was “Well, you could do 60/40!”, and I said “Alright, but you’ve got to find the co-author”. They found Paolo Corti, who’s an excellent writer and knows his PostGIS stuff, and also knows the middleware level of that, and how to get it out to the web. That adds a nice element. Paolo and I started on that and we realized between the two of us, we weren’t going to get it all done. We found Bborie at the Boston code sprint, and Tom works with me and wrote a chapter. [Interviewer note: Bborie, Tom, and Paolo co-authored the book with Stephen.]

Q: Thanks so much! It’s been a lot of fun talking with you. I have one last question for you. Do you consider yourself a geohipster?

A: I’m a geohipster, absolutely! I’m the guy who predicted artisanal pixels. I don’t ride a fixie, but I do ride an e-bike. When I’m in sound health, I bicycle 2-3 days a week, so I think I qualify.

Q: I think so, too.

Postscript: Steve gave me a signed copy of his book!
Post Postscript: Steve and I geeked out for a while about Synthetic Aperture Radar. We’ll spare you the nitty gritty details, but tweet at us if you ever want to talk SAR. We’ll talk your ear off. :)