Towards the Next Generation Road Survey

Over the past few weeks, I’ve managed to escape the office and get back to the field. With an impending change, it’s been a very refreshing time to get back into the mix – especially out onto the roads of Zanzibar.

Alongside scaling out Ramani Huria, and working with (awesome!) colleagues on the signing of a Memorandum of Understanding between Ardhi University, the World Bank and DfID to support the development of curriculum (with ITC Twente) and the sustainability of community mapping in Tanzania for the next five years, I’ve been working on a side project looking at how machine learning can be used to assess road quality.

To do this, the N/LAB team at the University of Nottingham, Spatial Info (the spin-out of my team that helped build Ramani Huria/Tanzania Open Data Initiative) and I are working with the Zanzibari Department of Roads, under the Ministry of Infrastructure, Communications and Transport, to survey all roads on Unguja Island, Zanzibar.

The Department of Roads & Uni Nottingham Team

So far, we’ve worked on getting a surveying vehicle back on the road, initial back and forth with government stakeholders, and pulling together the various road data sources (such as those from the Government and OpenStreetMap) to work out where to drive and how to sequence the survey. All of this will support a data collection protocol that merges traditional surveying techniques with novel ones such as RoadLab Pro.

All of these data streams will then be used as a training dataset to see how machine learning can inform on road quality. But first, we’re getting the traditional survey underway. It’s going to be a long road ahead – as long as all the roads in Zanzibar!

Watch this space, the project’s Medium page, and the N/LAB’s blog on using machine learning for automated feature detection from imagery. Get in contact below in the comments as well.

Written in the Al-Minaar Hotel, Stone Town, Zanzibar (-6.16349,39.18690)

OSM: Going Back in Time

I’ve been playing around with the full planet file to look at going back in time in OSM. Mainly, this is to look at how Ramani Huria’s data has evolved over time, and it is all part of extracting more value from Ramani Huria’s data. The process wasn’t as straightforward as I had hoped, but I eventually got there. This isn’t to say that it’s the only or best way; it’s the one that worked for me!

To do this, you’ll need a pretty hefty machine. I’ve used a Lenovo x230 (Intel i5 quad-core, 2.6 GHz) with 16 GB of RAM and over 500 GB of free space to deal with the large files you’ll be downloading. This is all running on Ubuntu 16.04.

Firstly, download the OSM Full History file. I used the uGet download manager to deal with the 10-hour download of a 60 GB+ file over a 10 Mb UK broadband connection. Leaving it overnight, I had the full file downloaded and ready for use. Now to set up the machine environment.

The stack is a combination of OSMIUM and OSMconvert. On paper, the OSMIUM tool should be the only tool needed. However, for reasons that I’ll come to, it didn’t work, so I found a workaround.

OSMconvert is easily installed:

sudo apt-get install osmctools

This installs OSMconvert along with other useful OSM manipulation tools. Installing OSMIUM is slightly more complicated, as it needs to be compiled from source.

First, install LibOSMIUM; I found that without the header files installed, compiling OSMIUM proper would fail. Then use the OSMIUM docs to install OSMIUM. While Ubuntu does include a package for OSMIUM, it’s an older version that doesn’t support filtering data by time. Now things should be set up and ready for pulling data out.

With Dar es Salaam as the city of interest, the bounding box is (38.9813,-7.2,39.65,-6.45), given as west, south, east, north: the longitude and latitude of the south-west corner followed by those of the north-east corner. Replace these with the coordinates of your place of interest and use OSMconvert in the form:

$ osmconvert history_filename -b=bounding_box -o=output_filename

osmconvert history-170206.osm.pbf -b=38.9813,-7.2,39.65,-6.45 -o=clipped_dar_history-170206.pbf

This clips the full history file to that bounding box. It will take a bit of time. Now we can use OSMIUM to pull out the data from a date of our choice in the form:

$ osmium time-filter clipped_history_filename timestamp -o output_filename

osmium time-filter clipped_dar_history-170206.pbf 2011-09-06T00:00:00Z -o clipped_dar_history-170206-06092011.pbf 
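If you want to repeat this for several dates, the two steps can be chained from Python. This is a minimal sketch rather than a tested pipeline: it assumes `osmconvert` and `osmium` are installed, on your PATH, and accept the arguments exactly as in the commands above, and the helper names (`bbox_arg`, `build_commands`, `run`) are hypothetical, not part of either tool. It also checks that the bounding box is given as the south-west corner followed by the north-east corner, the order osmconvert’s `-b` option expects.

```python
import shlex
import subprocess
from datetime import datetime, timezone


def bbox_arg(west, south, east, north):
    """Format osmconvert's -b option: south-west corner first, then north-east."""
    if not (west < east and south < north):
        raise ValueError("bounding box must be west,south,east,north")
    return f"-b={west},{south},{east},{north}"


def build_commands(history_file, bbox, when, clipped_file, snapshot_file):
    """Return the clip and time-filter commands as argument lists.

    `when` must be a timezone-aware datetime; it is converted to the
    ISO 8601 UTC form (e.g. 2011-09-06T00:00:00Z) used above.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    clip = ["osmconvert", history_file, bbox_arg(*bbox), f"-o={clipped_file}"]
    snapshot = ["osmium", "time-filter", clipped_file, stamp, "-o", snapshot_file]
    return clip, snapshot


def run(commands):
    """Execute each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    commands = build_commands(
        "history-170206.osm.pbf",
        (38.9813, -7.2, 39.65, -6.45),  # Dar es Salaam
        datetime(2011, 9, 6, tzinfo=timezone.utc),
        "clipped_dar_history-170206.pbf",
        "clipped_dar_history-170206-06092011.pbf",
    )
    # Dry run: print the commands; call run(commands) to actually execute them.
    for cmd in commands:
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps them easy to inspect before committing to a long clipping run.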

This gives a nicely formatted .pbf file that can be used in QGIS (drag and drop), PostGIS or anything else, as the contrast below illustrates!

Tandale, Dar es Salaam, Tanzania – 1st August 2011
Tandale, Dar es Salaam, Tanzania – 13th February 2017

Enjoy travelling back in time!

All map data © OpenStreetMap contributors.

Building Heights in Dar es Salaam

When I first went to Dar es Salaam in 2011, there were a few skyscrapers adorning the city’s skyline; now they’re everywhere! Sitting on a rooftop bar in the centre of the city, it’s a mass of cranes and pristine new buildings.

Alongside this rapid growth, Ramani Huria has been collecting a lot of data, but much of it doesn’t get rendered by the default OSM styles… so I’ve dug into the data and created a map of the number of floors of buildings across the city.

This interactive map allows you to explore where the tallest buildings are in the city, but displaying the data this way also makes the densest, unplanned and informal areas of the city very clear.

There is still some way to go, though: in Dar es Salaam there are around 750,000 buildings, with roughly 220,000 (~30%) having been surveyed by the Ramani Huria team and given an appropriate attribute. Ramani Huria has focused its efforts on the urban centres of Dar es Salaam, where most of the multi-storey buildings are to be found. But there is still a lot more to cover towards Bagamoyo and Morogoro.
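The ~30% figure is simply 220,000 / 750,000 ≈ 29.3%. For a map like the one above, floor counts also need cleaning and grouping before they can be styled. The sketch below assumes floors are stored in the standard OSM `building:levels` tag; the band boundaries are arbitrary illustrative choices, not the ones used for the map, and the helper names are hypothetical.

```python
import math
from collections import Counter


def parse_levels(value):
    """Return a whole number of floors from a building:levels value, or None."""
    if value is None:
        return None
    try:
        levels = float(value.strip())
    except ValueError:  # e.g. "", "3;4" or free text
        return None
    if not math.isfinite(levels) or levels < 1:
        return None
    return int(levels)  # truncates fractional values such as "2.5"


def level_band(levels):
    """Group floor counts into illustrative bands for styling."""
    if levels is None:
        return "no data"
    if levels == 1:
        return "1"
    if levels <= 3:
        return "2-3"
    if levels <= 9:
        return "4-9"
    return "10+"


def coverage_percent(tagged, total):
    """Share of buildings carrying a floor count, as a percentage."""
    return 100 * tagged / total


if __name__ == "__main__":
    print(round(coverage_percent(220_000, 750_000), 1))  # 29.3
    sample = ["1", "2", "12", "3;4", None, "5"]  # hypothetical tag values
    print(Counter(level_band(parse_levels(v)) for v in sample))
```

Values that can’t be parsed are kept as "no data" rather than dropped, so gaps in tagging stay visible on the map.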

Hat tip to Harry Wood, whose advice and guidance pointed me in the right direction. A more technical write-up, with details of other challenges around the correctness of tagging, is for another post; now to look at Floor Space Indices…!

Putting Crowdsourcing In Action

Crowdsourcing is increasing in popularity as a form of distributed problem solving enabled by digital technologies. “The crowd” is invited to contribute towards projects, potentially in the form of knowledge or design skills. On the 3rd of June this year an interdisciplinary workshop investigating crowdsourcing and citizen science convened. It brought together experts and practitioners from the many disciplines that apply a crowdsourcing approach, who presented their outputs and showed how crowdsourcing aids projects such as GalaxyZoo (an interactive project for volunteer classification of galaxies), Artmaps (an application for crowdsourcing information on Tate digital artworks) and Taarifa (a platform and community supporting the crowdsourcing of public service issues in the developing and developed world).

My own presentation was on Taarifa, a project I started in 2011 to support community-based public service delivery. Since then I’ve worked in collaboration with the World Bank in Uganda to support the Education and Local Government Ministries with reporting across the country; what started as a pilot was quickly rolled out to cover 111 districts, and over a year of this at-scale pilot 14,000 reports were received and acted upon. This led to wider research into the use of public participatory service delivery in developing countries (FOSS4G Taarifa paper). What makes Taarifa unique is that it has been developed and maintained wholly by volunteer contributors, creating free and open source software. The contributors to Taarifa are as diverse as the problems Taarifa addresses, ranging from PhD candidates like myself to physicists, bankers and community organisers. Consequently, Taarifa doesn’t just look after a software platform; it acts as a forum to share knowledge, experiments and innovation.

Taarifa was conceived at the London Water Hackathon, as an innovation around water access and quality in Tanzania. Access to water in Tanzania currently covers less than 50% of the country’s population, and 38% of the water infrastructure, such as taps, is graded as non-functional. Currently the Ministry of Water has a WPMS (Water Point Mapping System), developed after a countrywide survey. However, the system has no functionality to update the status of a water point or to view a history of service problems. This is compounded by poor repair performance nationally, where water points are repaired at all; citizens are disenfranchised with current methods of reporting water faults, if they can report at all. The ecosystem supporting the repair of water points is non-existent; consequently, millions of Tanzanians have no access to publicly delivered water.

It is important to stress that there isn’t ‘one’ solution to the problem of water access, nor is there ‘one’ platform or piece of software to ‘fix’ it. No single discipline, be it Cartography, Economics or Engineering, can wholly resolve the issue of water access, nor can it be researched and resolved through the lens of one discipline; a multidisciplinary problem needs a multidisciplinary approach. The societal side of technology needs not just to be taken into account but to be integrated into the core of the design, with the people who face the issue and who will use the technology to resolve it. It is imperialistic and deterministic to assume that technology can simply ‘fix’ a complex issue like water access, especially as the technology is, in effect, broadly imposed by outsiders on the community in which it is intended to take root. Hence, an understanding of the community is needed: who the users are, how water access is currently dealt with and the general state of affairs. From this we have created two streams of Taarifa, one that is currently implemented and one that is being designed, incorporating lessons learned from initial deployments.

The first iteration of Taarifa’s design story and user actions assumed that mobile connectivity wasn’t an issue and that there was an active organisation, be it government or an NGO, wanting to resolve water access issues; a further assumption was that the water infrastructure was adequately mapped. This led to the following reporting process for a water issue. When a report is made, for example by a Community Water Officer or a concerned citizen, it goes into the Taarifa workflow, which identifies the specific water point from the database. The reporter is then notified, thanked for making the report and given an estimate of how long the repair will take, based on the time previously taken to repair broken water points in that district. An engineer is informed of what is wrong with the water source. Once an engineer has been selected and has carried out the repair, a verifier can confirm that it has been completed satisfactorily. Importantly, at each stage the initial reporter is informed of the progress of the repair. This was the version trialled in Uganda.
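The reporting flow described above is essentially a small state machine in which the reporter is notified at every transition. The sketch below models it under stated assumptions: the stage names, the `Report` class and the use of the median of past repair times for the estimate are all hypothetical, not Taarifa’s actual code, and notifications are simply appended to a list in place of SMS or email.

```python
from dataclasses import dataclass, field
from statistics import median

# Hypothetical stage names for the first Taarifa workflow described above.
STAGES = ["reported", "water_point_identified", "engineer_assigned", "repaired", "verified"]


@dataclass
class Report:
    water_point_id: str
    district: str
    reporter: str
    stage: str = "reported"
    notifications: list = field(default_factory=list)

    def notify(self, message):
        # Stand-in for SMS/email to the person who made the report.
        self.notifications.append(f"to {self.reporter}: {message}")

    def advance(self):
        """Move to the next stage and tell the reporter about it."""
        i = STAGES.index(self.stage)
        if i == len(STAGES) - 1:
            raise ValueError("report is already verified")
        self.stage = STAGES[i + 1]
        self.notify(f"{self.water_point_id} is now '{self.stage}'")


def estimate_repair_days(past_repairs, district):
    """Median of past repair times (in days) in the same district, if any."""
    times = past_repairs.get(district)
    return median(times) if times else None


def submit(report, past_repairs):
    """Thank the reporter and give an estimate based on the district's history."""
    days = estimate_repair_days(past_repairs, report.district)
    estimate = f"about {days} days" if days is not None else "no estimate yet"
    report.notify(f"thank you for your report; expected repair time: {estimate}")
```

For example, with hypothetical past repair times of 4, 10 and 6 days in a district, `submit` quotes about 6 days, and four calls to `advance` take the report through to `verified`, with the reporter told about each step.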

Subsequently, learning from how Taarifa was deployed and used, the design now aims to incorporate offline capability and ‘marketplace’ functionality. The offline capability stems from experience in Uganda, where connectivity wasn’t universal (this was not a surprise; however, improvements to the paradigm should be incremental). The marketplace stems from the limited capacity of local government and organisations. If a district has no capacity to repair a broken water point, a number of engineers could receive information about the problem, estimate the cost and bid for the work using their phones. A micropayment is taken to support the system, providing a surplus that could be reinvested into creating new capacity. Micropayments are ubiquitous in the developing world, effectively replacing a formal banking infrastructure, and hence are familiar to the communities who will use this method. Consequently, this should hopefully be viewed as an extension of what already exists, not something completely new.
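The marketplace idea can be sketched as a simple reverse auction. This is a hypothetical illustration, not a description of Taarifa’s design: it assumes the lowest bid wins, that the micropayment is a fixed percentage deducted from the winning bid, and that amounts are whole units of local currency; the 2% fee and all names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    engineer: str
    amount: int  # whole currency units, avoiding floating-point rounding


def award(bids, fee_percent=2):
    """Pick the lowest bid and split off the micropayment that supports the system.

    Returns (winning bid, fee retained, payout to the engineer).
    Ties go to the earliest bid in the list.
    """
    if not bids:
        raise ValueError("no bids received")
    winner = min(bids, key=lambda bid: bid.amount)
    fee = winner.amount * fee_percent // 100  # rounded down to whole units
    return winner, fee, winner.amount - fee


if __name__ == "__main__":
    bids = [Bid("engineer-a", 50_000), Bid("engineer-b", 42_000), Bid("engineer-c", 45_000)]
    winner, fee, payout = award(bids)
    print(winner.engineer, fee, payout)  # engineer-b 840 41160
```

Lowest-price selection is only the simplest possible rule; a real marketplace might also weigh an engineer’s track record.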

What does all this mean within the context of the Crowdsourcing in Action workshop? Broadly, it allows us as academic researchers to typify crowdsourcing and understand it better. Taarifa acts as a community crowdsourcing code and, by extension, curating community reports of issues in developing countries. Artmaps develops applications for smartphones that allow people to relate artworks to the places, sites and environments they encounter in daily life. GalaxyZoo leverages the many eyes of the crowd to process space imagery data. Thematically, all the projects presented used volunteers to provide information, process it and return it to the user and other interested actors.

After the initial presentations we formed groups with other experts and practitioners to build a common model of what crowdsourcing means to our projects and work, coalescing periodically to feed back practice and information learned from other participants. In doing so, we learned from the successes and failures of others, understanding common themes for collaboration.

Identifying these common themes will hopefully set an agenda to focus on specific factors and communities within crowdsourcing. Jeremy Morley and I are planning a future “How to Hackathon” event, building on “Crowdsourcing in Action”. Hackathons allow volunteers (generally) to co-create ‘hacks’ to solve problems. In its truest sense, a hackathon accelerates innovation by combining a random mix of people and skills, providing a set of previously unsolved problems, then observing what happens. As was identified in Crowdsourcing in Action, we can observe the state before crowdsourcing; we can help provide a process for participants; and we can observe and process the result. However, our understanding of how participants use the tools to conduct crowdsourcing is scant. By focusing on hackathons, we hope to discover more about how the design and development of crowdsourcing works.


From the 3rd to the 5th of April I attended GISRUK (Geospatial Information Research in the United Kingdom) to give a paper on Community Mapping as a Socio-Technical Work Domain. In keeping with Christoph Kinkeldey’s love of 1990s pop stars, Vanilla Ice made a second slide appearance, offsetting what is a very technical academic title. In short, I’m using Cognitive Work Analysis (CWA) to create a structural framework to assess the quality of volunteered geographic data (as currently defined by ISO 19113: Geographic information – Quality principles, well worth a read…) where there is no comparative dataset.

CWA is used to assess the design space in which a system exists, not the system itself. By taking a holistic view and not enforcing constraints on the system, you can understand which components and physical objects you would need to achieve the values of the system, and vice versa. In future iterations I’m going to get past first base and look at decision trees and strategic trees to work out how to establish the quality of volunteered geographic data without a comparative dataset, building quality analysis in from day one rather than as an afterthought.

Written and submitted from Home (52.962339,-1.173566)


A Manifesto for the OSM Academic Working Group

A fellow member of the OSM Foundation replied to a conversation on the mailing list: “As a guerrilla academic…”. The context was a suggestion for increased academic cooperation within OSM. To this end I proposed a new working group for the OSMF: the Academic Working Group. Its aim would be to improve the academic quality and communication of those using OSM in their research, and to facilitate collaboration.

Below is the start of the manifesto. It’s not complete, but it’s a start.


Academic institutions use OSM data, be it as part of their published research or for testing hypotheses. Some of the publications are listed on the wiki. However, within OSM and the OSMF this research is undertaken on the researchers’ own initiative. Researchers come to OSM through recommendation (supervision) or self-interest within their own academic structures. Given the growth of OSM and the research into it, it seems likely that academic interest will widen and grow.


Support academic research in OSM, encouraging best practices and acting as a forum for researchers. This aims to support researchers starting out with OSM but also to unify a community of existing researchers; collaborations and knowledge sharing will hopefully follow. It would also identify areas of research for the community as a whole, with usability and business models as potential themes to start with.


  • Unite existing researchers, whether at existing institutions or following independent academic study.
  • Provide documentation (à la LearnOSM) focused on researchers.
  • Provide a forum for researchers to discuss their research and bridge into the community.
  • Support the academic corpus and provide it with problems.
  • Communicate potential collaborations, needs and wants.
  • More TBD.

Working Group vs. Community

I think this fills a gap that currently exists in the community, and I don’t see potential areas for conflict. That said, do we have enough members within OSM (or the OSMF?) to create and steer the working group?

AWG vs. other WGs

There is a small amount of overlap in interest between this proposed AWG and other Working Groups, potentially with the Communication and Strategic Working Groups: Communication, as the AWG would aim to build up the OSM academic community; Strategic, as they may wish to commission, or at least support, studies into critical areas of OSM.

Next steps 

Again, I’ll throw this to the OSMF. Where should we go from here?

Written and submitted from the London St. Pancras to Nottingham Train.

Research Impact

The Digital Economy programme encompasses five universities (Nottingham, Cambridge, Reading, Exeter and Brunel) with numerous Doctoral Training Centres (DTCs) training ‘the next generation of researchers’. I’m quite fortunate to be at the Nottingham Digital Economy DTC, which is rare in that it has a combined research hub and DTC. From time to time the hubs and DTCs get together in conference, where the collective research efforts and outputs are demonstrated. However, every so often the research council – i.e. the people who write the cheques – wishes to see the results of all this labour.

Due to the nature of the PhD programmes – in a cohort, as opposed to individual working – they wished to understand value for money, more specifically the research impact that the programmes have had. The format for this was a poster/live demonstration of gadgets, followed by an interview session with some direct, searching questions. It quickly became apparent that the role of the DTC staff was multifaceted, focusing not just on research and supervision but also deftly dealing with the work associated with the DTC. Having a window into this world offered a very different perspective on the process of research councils, from the council’s expectations to the reality of research output and the seemingly intangible process of ascertaining ‘impact’ – impact seemingly being whether you’ve done something useful and interesting.

Meeting the other DTCs with the twist of funders and assessors was a good break from the usual, made special by the fact that it only happens every few years. The only downside of the process was that the EPSRC (Engineering and Physical Sciences Research Council) is based in the tragic town of Swindon. However, the bods have pulled a bit of a wheeze by placing the building next to the train station and adding a dedicated footbridge. This means that, if day tripping, you don’t physically have to enter the town. Instead you walk across the footbridge (straight from the set of Threads) directly to the centre, unfortunately without seeing the famous Swindon vistas! All in all, a good day!

Fear and Loathing in Las PhD

This post, I guess, has been a long time coming; the feeling basically hit its zenith and then subsided. About a month ago I had serious doubts about the PhD, around whether I was ‘good enough’ to complete it. Most of the doubt centred on what I had perceived to be an easy task: implementing a ‘simple’ algorithm. This turned into three weeks of nothing. The breakthrough occurred on what was supposedly a three-day break in Marseille, before a 12-day conference schedule in Avignon (AGILE) and Amsterdam (WhereCampEU).

It would be fair to say I’d hit the lowest point of the PhD then. It was a sequential thought process: if I can’t do this simple thing, how can I be prepared for the harder things later? Doubt set in, and the analysis concluded that I should quit. Then the breakthrough came and all was good, confidence restored. I then read ‘The Valley of Shit’, a blog about going through the same thing: “Valleys lead to somewhere else – if you can but walk for long enough. Unfortunately the Valley of Shit can feel endless because you are surrounded by towering walls of brown stuff which block your view of the beautiful landscape beyond.”

Anyhow, I feel out of the valley now. All is good.

Written and submitted from Coffee Company, Amsterdam, Netherlands (52.371554,4.896772)

Wernigerode PhD Winter School

Another hectic few weeks draws to a close. The literature review is more grounded than before, nearing 5,000 words, and pretty much all of them are ‘good’. The past week has been spent in Wernigerode, in the Harz region of Germany, at an AGILE (Association of Geographic Information Laboratories Europe) PhD Winter School.

Mixing PhD candidates from research centres across Europe, from GIS, Geomatics and other disciplines, the event started with the customary ice-breaker. Even over (some very, very good) local beer, it was clear that the participants came from very diverse backgrounds, most in the process of doing interdisciplinary research, with projects looking at conflating ontologies, predicting the location(s) of serious criminals, visualising change and crowdsourcing 3D building models, among many others.

The first day started with introductions, followed by 10-15 presentations, with questions, on our topics. Taking all day, it was good to see how other geospatial PhDs evolve in differing subjects and countries. During a very German (schnitzel) lunch break we wandered in the forest surrounding Wernigerode. Though the place is quite off the beaten track, it really is worth a visit if you want to chill out.

The second day started with a very good talk from Bénédicte Bucher of IGN about the differing research groups of the French National Mapping Agency, concluding with her thoughts on the PhD process. She noted that when you first start it’s like being in a bazaar, where you can see the different pathways; eventually, however, you’ll be forging your own path in the wilderness over tough terrain.

This was followed by breakout sessions with other participants to start either a paper or an initiative on our subjects. Being in the Volunteered Geographic Information group, we went back to basics. Though we were from differing subsections of VGI (crowdsourcing 3D indoor models, policy of VGI in government and, in my case, Community Mapping), our common ground was the lack of definitions in VGI, so we proposed an AGILE initiative to fix this.

After putting together a 10-minute presentation (where we managed to get MC Hammer into a slide, under the rather tenuous headline of “Break It Down”), we formally ended the winter school with our proceedings in hand. Then a spot of further networking in the only club in Wernigerode…!

Written and submitted from the DB RegioBahn Magdeburg – Berlin train.

The Role Of Ethics

All academic research should be ethical; guidance comes from the research councils and, generally, from individual departments and faculty ethics boards. Unfortunately, the responsibility to produce ethical research lies with the individual researcher, not with boards and faculty. My first-year viva contained some analysis to counter questions on data and what is possible, knowing it would be a consideration.

In it I produced a time series analysis of the 2010 Egyptian elections with data from U-Shahid, an Ushahidi instance. This hadn’t gone to an ethics board; it had only been seen by my supervision team, to allay fears about whether data could be found and analysed (the question of how useful the data is for my research is still being answered). On the day, the viva went well, with a few discussion points on the data and what it means for future analysis. At this juncture, I’d like to point out that this isn’t a mea culpa admitting that I do unethical research!

Because I have decided to use an ‘ethnographically informed’ methodology/action research, a lot of the research is based upon my experiences and personal narrative. This is then supported by interviews with experts or people working within the domain. This part of the research isn’t controversial; the ethics around interviews and ethnography are well structured, with clear pathways, dos and don’ts. These ethics seem to be based around not doing harm and ensuring informed consent, where the subject is made fully aware of the aims and goals of the experiment before participating. This isn’t set in stone, especially when trust/deception are part of the experiment, but considerations need to be made. Effectively, don’t do the Milgram Experiment, or kill anyone.

At first glance it doesn’t appear that any of that has anything to do with my research. As my research revolves around community mapping, public participatory GIS (PPGIS) and volunteered geographic information (VGI), the ethics are a lot more obscure. Something as simple as data collection and its nature can be questioned. Primarily I will use OpenStreetMap as my VGI source, following Muki Haklay’s work on comparing VGI with an authoritative dataset, to potentially answer a question around data quality in slum mapping. Informed consent for mapping parties or the mapping process shouldn’t be too difficult, but what about the pre-existing data?

It’s not realistic to get consent from every person who has contributed to the map. When OSM started, blank spots were common, not just in developing nations but in developed nations too. Thanks to satellite imagery and organisations like HOT (the Humanitarian OpenStreetMap Team), mappers have also been tracing satellite imagery where a physical survey was impractical. As we want to analyse this information, how do we gain consent? Should a clause be put into OSM’s licence (this is a bad idea)? Is there a differentiation between mappers who trace and those who physically survey; are all maps created equal?

There is scant research out there, namely “Research Ethics for Studying Open Source Projects” and “Internet Research: Privacy, Ethics and Alienation – An Open Source Approach”. Specifically relating to geospatial data and OSM there is none. The only mention relating to OSM is SK53’s post on research relating to OSM being behind a paywall, which is a fair point but still doesn’t answer whether we should be using data without consent. I sense a PhD chapter forming.

Written and submitted from the Nottingham Geospatial Building (52.953, -1.18405)