Wednesday, October 1, 2014

Georeferencing evaluation with FME.

FME and data evaluation.

FME is a great tool to validate and evaluate data (next to the many things you can do with FME)
There are plenty of resources available on the subject demonstrating FME's data validation and Q&A capabilities.

Data evaluation can involve different aspects and have many forms.
For this post I choose to evaluate how well a publicly available data set can be georeferenced (if you can add value to it and put it on a map, why shouldn't you...)
For any serious conclusions, you'll have to work it out yourself, since my main intention is to demonstrate FME capabilities (and not bad mouth anybody particularly...)

Data source.


The Dutch government publishes many data sets openly and the numbers are increasing all the time.
I choose to use the data set of the national education registry since it is highly dynamic and it contains addresses, which makes it possible to potentially georeference the features.

The data used is available in csv format, which can easily be accessed online via the CSV reader (just point it to the url). For limiting sorting and filtering the incoming data, see my previous post: Where clause on text.
This results in a continuously updated data source, which is great to have but poses a challenge when displaying the results.

 Georeferencing the data.


    For georeferencing the source data I am using the BAG Geocoding service, available via the National SDI.
    An easy way in FME to access the service is by a HTTPFetcher transformer.
    By constructing the URL in the transformer's text editor and making use of attributes values, a very flexible solution is created.

    BAG Geocoding service results.

    The BAG Geocoding service returns the location(s) in an XML snippet that translates into geometry and attributes. In case of ambiguity or lack of sufficient input, the service returns an aggregate geometry.
    Somewhere in the aggregate geometry the corresponding location and attributes can be found (well most of the times...)
    Using the total count of both georeferenced and failed features, simple statistics (percentage of correct georeferencing) can be gathered and used for display.

    HTTPFetcher

    Interpreting the results.

    Some of the 'failed' to georeference features do actually exist (BAG Web) and can be correctly geocoded by slightly changing the address used, see for example georeferenced (note the street tag) and not georeferenced (note URL used =  input address)

    Displaying the results.

    I am using Google Fusion Tables to display  the results since it is an easy way to share geographical information (article is in dutch)
    Also non-spatial data can be shared this way and the failed features are saved into a non-spatial Google Fusion Table. Needless to say FME supports both spatial and non spatial reading and writing of this format.
    Some limitations of this format are the number of features supported and that it is still considered an experimental format, something that unfortunately makes it less reliable.
    As mentioned before the input source data is updated frequently and in contrast the displayed results are static and present a moment in time.


    Map of results, created 1-10-2014.

    Findings and (possible) future developments.

    A way to keep the displayed results up to date would be to use FME Cloud, something I still have not got around to try. I imagine that using FME Cloud to run this workspace would not require almost any resources or adaptations, since most of the data is on-line.
    Some of the finding are:

    • Saving the source csv data is necessary due to memory issues (something that is easily done in the HTTPFetcher)
    • Another curious issue found is that using the postcode in the request string actually results in less features georeferenced.
    Don't forget that by the time you read this post the output might look very different.