Saturday, August 16, 2014

Where clause on text.

Where clause.

A where clause in its basic form is used to filter features and is used with database formats.
The use of a where clause can deliver workspace related efficiency, by resulting in only the features necessary for the translation.
You can do much more in a database where clause (joins, sub-queries), but for the purpose of this post I will stick to its basic use (e.i filtering)





Example.

Inspired by Safe's blog post, I have downloaded some bird tracking data* from the Movebank Data Repository .

The data comes in a csv format, which is considered a database format in FME, but is actually plain text. The goal is to present the features on a map.

Cory's shearwater - going the distance.

Data content.

The csv file contains location information as lat/long coordinates among other types of sensor related information.
For more information about the data see the readme file provided.





Data transformation.

read data that cannot be used ?
To transform the location information into point features the VertexCreator transformer can be used.
However when doing so, disregarding the first law of FME (which is?), the transformation will halt because some features do not contain values in the location columns.
That can be easily solved by testing the data before creating the geometry. 
But by reading the entire dataset and then filtering unnecessary (or unusable) features, you are reading more than is necessary and it is not efficient.
 

Filtering while reading.

To my surprise, I have stumbled across a new functionality in the csv reader, that enables such filtering.
I say to my surprise, since I totally missed out on the announcement related to this addition.
This functionality is found at the csv reader parameters. First you have to enable it and then set it.

According to the documentation: "The filtering is done by a regular expression string that will be compared against the values of attribute fields specified."

This means that if you know your regular expressions, serious complex filtering can take place.

For this case it is a simple string that filters the lat/long attribute fields, returning only columns in which values are found.
Simple regular expression.

Workspace.

With and without filtering.
This new functionality offer new possibilities that did not exist before FME 2014. And even if it's not a where clause as in a database, the abilities to filter and sort are welcome useful additions.




* Gagliardo A, Bried J, Lambardi P, Luschi P, Wikelski M, Bonadonna F (2013) Oceanic navigation in Cory's shearwaters—evidence for a crucial role of olfactory cues for homing after displacement. Journal of Experimental Biology, v. 216, p. 2798-2805.


Saturday, August 2, 2014

Built in geometry validation.

Post origin.

The idea for this post comes from a tweet by a FME user (thanks Richard!) and since it is cucumber time I thought what the heck why not try it out myself.

Geometry validation in FME.

The GeometryValidator is the transformer for geometry validations in FME. To be able to validate the data you need to read it into the workspace.

valid?
With database formats, with their built in ability to validate geometry, validation and tagging can take place when reading.
In Oracle that can be done with the geometry-related PL/SQL subprograms in the SDO_GEOM package.
In this case I am using the sdo_geom.validate_geometry_with_context  subprogram to locate and tag geometry errors.

Built in geometry validation in the Oracle reader.

To validate the data while reading, the SQL statement should be used in the feature types parameters select statement (not to be confused with the reader select statement)
When configured correctly both valid and invalid features are returned.

As you all know, databases return error's as a number and error location and that there are internal ways to use existing database capabilities to make the error's more human readable.
But since FME is great at reaching out to the web, grabbing data and making use of it, what can be better (and more fun) than parsing the web pages with the error's descriptions and adding it to the features?

Adding error descriptions.

Since I am using Oracle, the logical place to search for error's descriptions is the Oracle on-line documentation. Grabbing the web page is done with the HTTPFetcher, parsing it with the XMLFragmenter and then it's a matter of testing for the correct string. To finish it up I have created a custom transformer that does exactly that.
Workspace.


 

Useful?

As in most things FME, there are many ways to achieve the the same result, and it's a matter of personal preference and experience on how one approaches a problem. That said, a problem should not be the sole reason for using FME. It can also be just for fun.
So is it useful to have built in geometry validation and tagging? well I guess it is since otherwise why would anybody try?
Some other advantages might be:
  • Awareness of  geometry error's (do you assume the data is always geometrically correct?)
  • The option to act upon that awareness.
  • Reducing the number of features that need repair. In translations that involve a high volume of data and long computations, reducing the number can result in gaining workspace related efficiency.

Results.