Wednesday, September 16, 2015

The Sorrows of Young OSM User

OpenStreetMap (OSM for short) is a web site that not only displays maps but also provides access to the database from which these maps are rendered. So far, so good.

Our company came up with the following idea: given a street name and the region in which the street is located (around the globe there are tons of streets named "Main Street"), we use the Overpass API to query the ways that make up these streets.

The plural "streets" reflects an important issue: even within a single city, several streets may share the same name.
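One way to tell same-named streets apart is to group the returned ways into connected components. This rests on an assumption: ways that belong to one street share at least one node, while distinct streets with the same name do not touch. The sketch below applies a union-find structure to node IDs. The input format (way ID mapped to its list of node IDs, as in the `nodes` list of Overpass JSON output) and the function name `group_ways_into_streets` are hypothetical choices for illustration:

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_ways_into_streets(ways: Dict[int, List[int]]) -> List[Set[int]]:
    """Group way IDs so that ways sharing a node (directly or transitively)
    end up in the same group. Assumption: one group == one physical street."""
    parent = {way_id: way_id for way_id in ways}

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first way seen at each node; later ways at that node are merged with it.
    first_way_at_node: Dict[int, int] = {}
    for way_id, node_ids in ways.items():
        for node_id in node_ids:
            if node_id in first_way_at_node:
                union(first_way_at_node[node_id], way_id)
            else:
                first_way_at_node[node_id] = way_id

    components: Dict[int, Set[int]] = defaultdict(set)
    for way_id in ways:
        components[find(way_id)].add(way_id)
    return list(components.values())
```

For example, `{1: [10, 11, 12], 2: [12, 13], 3: [50, 51], 4: [13, 14]}` yields two groups, `{1, 2, 4}` and `{3}`. Two different streets with the same name that happen to meet at a junction would be merged, so real data may need an additional check.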

In principle, querying these ways through Overpass works, provided that the user is young enough not to die of old age before the query returns a result.
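The query itself is not shown here. A minimal sketch of the kind of Overpass QL involved might look like the following. It assumes the region can be identified by the `name` tag of an administrative boundary (a name that may well match several areas) and that the street is any `highway` way with the given `name`; the helper `build_street_query` is hypothetical:

```python
def build_street_query(street_name: str, region_name: str, timeout_s: int = 25) -> str:
    """Return Overpass QL selecting all highway ways named street_name
    inside an administrative area named region_name."""

    def ql_string(value: str) -> str:
        # Escape backslashes and double quotes so the value is a valid QL string literal.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'area["boundary"="administrative"]["name"={ql_string(region_name)}]->.searchArea;\n'
        f'way["highway"]["name"={ql_string(street_name)}](area.searchArea);\n'
        "out geom;"
    )
```

The resulting text would be sent as the `data` parameter to an Overpass endpoint such as `https://overpass-api.de/api/interpreter`. `[timeout:…]` caps the server-side run time, and `out geom;` returns each way together with the coordinates of its nodes.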

We looked into alternative ways of fetching data directly from OSM, but all of them turned out to be either too slow or simply not up to the task.

It seems that the only feasible approach is to have two server tasks:
  1. One that fetches the data from OSM.
  2. One that provides the full path via an API.
That still leaves three possible ways of implementing the first server task:
  1. To have a job that is triggered at certain times and imports the data.
  2. To let the server operate as a kind of caching proxy for the data.
  3. To combine the two previous solutions.
My idea is as follows:
  1. Initially, the server's database is populated using an appropriate import feature.
  2. When a request arrives, the server checks whether the requested data is in its database and not outdated (using a reasonable definition of "outdated" that does not require a request to OSM).
  3. If the server's data is outdated, an updated version is requested from OSM. There are two possible outcomes:
    1. The query returns the current information quickly. In this case, the data is stored in the server's database and delivered to the user.
    2. The query does not return a result within a reasonable time. In this case, the server returns the locally stored data. Chances are very good that the outdated data is still a reasonable approximation of the most recent data. This case has two subcases:
      1. The query simply takes too long but eventually returns a result. In this case, the returned data is used to update the database.
      2. The query times out and does not return a result. In this case, the outdated data is kept.
  4. Every now and then, the server checks which data is the most outdated and tries to obtain a fresh version of it.
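The scheme above is essentially a stale-while-revalidate cache with a deadline. Here is a minimal Python sketch under several assumptions: an in-memory dictionary stands in for the server's database, a fixed maximum age serves as the local definition of "outdated", and `fetch` is a hypothetical callable wrapping the actual OSM query. The class and parameter names are illustrative only, and the comments map each branch to the numbered steps:

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional, Tuple


class StreetCache:
    """Serve local data while it is fresh; otherwise ask OSM, but wait at most wait_s."""

    def __init__(self, fetch: Callable[[str], Any], max_age_s: float, wait_s: float,
                 clock: Callable[[], float] = time.monotonic, workers: int = 4) -> None:
        self._fetch = fetch          # hypothetical: key -> data, e.g. wraps an Overpass query
        self._max_age_s = max_age_s  # local notion of "outdated"; checking it needs no OSM request
        self._wait_s = wait_s        # how long a user request may wait for fresh data
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (data, fetched_at)
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def _save(self, key: str, data: Any) -> None:
        with self._lock:
            self._store[key] = (data, self._clock())

    def populate(self, items: Dict[str, Any]) -> None:
        # Step 1: initial bulk import.
        for key, data in items.items():
            self._save(key, data)

    def _start_refresh(self, key: str) -> "Future[Any]":
        future = self._pool.submit(self._fetch, key)

        def on_done(f: "Future[Any]") -> None:
            # Step 3.2.1: a late but successful result still updates the store.
            # Step 3.2.2: if the fetch failed (e.g. remote timeout), the old entry is kept.
            if not f.cancelled() and f.exception() is None:
                self._save(key, f.result())

        future.add_done_callback(on_done)
        return future

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            entry = self._store.get(key)
        # Step 2: fresh local data is served without contacting OSM.
        if entry is not None and self._clock() - entry[1] <= self._max_age_s:
            return entry[0]
        # Step 3: data is missing or outdated, so ask OSM but wait at most wait_s seconds.
        future = self._start_refresh(key)
        try:
            data = future.result(timeout=self._wait_s)
        except Exception:  # concurrent.futures.TimeoutError or an error raised by fetch
            # Step 3.2: fall back to the stale copy (None if nothing was ever stored).
            return entry[0] if entry is not None else None
        self._save(key, data)  # Step 3.1: store and deliver the fresh result
        return data

    def refresh_most_outdated(self, n: int) -> None:
        # Step 4: re-fetch the n oldest entries in the background; call this periodically.
        with self._lock:
            oldest = sorted(self._store, key=lambda k: self._store[k][1])[:n]
        for key in oldest:
            self._start_refresh(key)

    def close(self) -> None:
        self._pool.shutdown(wait=True)  # waits for pending fetches and their callbacks
```

In this sketch, concurrent requests for the same outdated key each start their own refresh. Sharing a single in-flight fetch per key would be a natural refinement.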
Comments?