FME - Really needs a "pivot" transformer
I really like FME as a multi-faceted data manipulation tool. It’s got a lot of advantages in that;
it allows seamless integration of spatial and non-spatial data manipulation
it is a graphical tool which works well when you aren’t building too big a workspace (more on this later)
running the workspace with cached features means you can see exactly what is happening with your data flow
it can pretty much read and write every possible data format (this depends on which license you have, but even the basic option has tons of formats and most of the ones you need outside of an entreprise setting)
Safe Software seem to be making somewhat of a pivot to capture a more general purpose data audience, with more emphasis on its big-data, streaming and app functionalities. This is the segment that I seem to fall into - I’m not doing any complex spatial data management or munging, or conversions from BIM files to Pointclouds - more development of reproducible, easily interpretable and scalable pipelines, with some interaction with Census or other simple spatial formats.
Spatial Category Counts
One action that is very common in data analysis is counting of a categories within an area, for example - how many valuation properties are there of a particular category in a given small area? Doing this within FME, the workflow I use most frequently is a point on area overlayer, then an aggregator which counts the area/category combinations.
This results in a “long” table - in other words, there is a table with three columns, containing the area id, the category and the count value. “Long” data is generally the ideal format, but is not the best when considering things from a spatial or presentation perspective (wide data works best for spatial features as the geometry only has to be loaded once - otherwise, there's a one-to many join which involves two tables and.... headaches for a lot of visualisation software). Hence the next step is to perform a “pivot” to obtain a row for every area, with columns for each category containing the count of that category within the area.
This is a well recognised data processing step in languages like R, where the tidyverse “pivot_wider” (see image) function implements this step, but is not supported in FME and has to be achieved in various convoluted ways - this stackoverflow answer outlines one way, and there is a tutorial here that doesn’t quite hit the mark - the attribute pivoter outputs a table where the relevant attributes have to be explicitly exposed, or the output sent to a dynamic excel writer.
Python Caller
Another approach is to use a Python Caller. In this case, I've used pandas as it is the most versatile data manipulation tool available for python.
Here's the code - it makes use of the pandas library which you will have to explicitly load it into the directory of packages that FME can access; the way to do this is here . Note that you still have to manually expose the attributes you want via the Attribute Exposer transformer after you run the Python Caller with the code below.
import fme
import fmeobjects
import pandas as pd
class FeatureProcessor(object):
def __init__(self):
self.df = pd.DataFrame
self.row_list = []
def input(self, feature):
self.row_list.append({
'SA':feature.getAttribute('SMALL_AREA'),
'category':feature.getAttribute('category'),
'property_ct':feature.getAttribute('vo_property_ct')
})
def close(self):
self.df = pd.DataFrame(self.row_list)
op = self.df.groupby(['SA', 'category'])['property_ct'].agg('sum').unstack().fillna(0)
op_dict = op.to_dict('index')
# set a loop through the op df to create a feature for every row, with an attribute for each value in the row dict
for sa, atts in op_dict.items():
feature = fmeobjects.FMEFeature()
feature.setAttribute("SMALL_AREA", sa)
for k, v in atts.items():
feature.setAttribute(k, v)
self.pyoutput(feature)
R Caller
The code for this is even simpler as R is ideally suited for this.
The implementation is however somewhat more tricky as R is not installed by default with FME. I also use some additional libraries such as dplyr and magrittr. You can find out more about using R with FME here.
In this transformer, I simply group by the Small_Area and use the pivot_wider function to get the desired result.
The code:
library(tidyverse)
library(magrittr)
SA_Category_Counts %>%
pivot_wider(
names_from = "category",
values_from = "vo_property_ct") %>% mutate(across(where(is.numeric), ~ replace_na(.x, 0))) -> op_df
fmeOutput <- op_df
Conclusion
You can see the final workspace above, and you can download it here.
FME is a powerful tool, but it lacks some very basic and simple data processing steps and the "pivot" is a glaring example. It more than likely is something to do with the fundamental engineering of FME, in that it is built with generalism in mind, and the whole concept of attributes being genererated ad hoc by a transformer doesn't mesh well with that. If the workspace terminated with a feature writer, with no set schema defined, it probably would output the result, but this is rarely the end-game when performing a pivot operation.
Σχόλια