top of page

FME - Really needs a "pivot" transformer

I really like FME as a multi-faceted data manipulation tool. It’s got a lot of advantages in that;

  • it allows seamless integration of spatial and non-spatial data manipulation

  • it is a graphical tool which works well when you aren’t building too big a workspace (more on this later)

  • running the workspace with cached features means you can see exactly what is happening with your data flow


  • it can pretty much read and write every possible data format (this depends on which license you have, but even the basic option has tons of formats and most of the ones you need outside of an entreprise setting)

Safe Software seem to be making somewhat of a pivot to capture a more general purpose data audience, with more emphasis on its big-data, streaming and app functionalities. This is the segment that I seem to fall into - I’m not doing any complex spatial data management or munging, or conversions from BIM files to Pointclouds - more development of reproducible, easily interpretable and scalable pipelines, with some interaction with Census or other simple spatial formats.

 

Spatial Category Counts

One action that is very common in data analysis is counting of a categories within an area, for example - how many valuation properties are there of a particular category in a given small area? Doing this within FME, the workflow I use most frequently is a point on area overlayer, then an aggregator which counts the area/category combinations.

FME Point on Area Count Process
FME Point on Area Count Process
Long Table Output
Long Table Output

This results in a “long” table - in other words, there is a table with three columns, containing the area id, the category and the count value. “Long” data is generally the ideal format, but is not the best when considering things from a spatial or presentation perspective (wide data works best for spatial features as the geometry only has to be loaded once - otherwise, there's a one-to many join which involves two tables and.... headaches for a lot of visualisation software). Hence the next step is to perform a “pivot” to obtain a row for every area, with columns for each category containing the count of that category within the area.

R Pivot Wider in Action
R Pivot Wider in Action

This is a well recognised data processing step in languages like R, where the tidyverse “pivot_wider” (see image) function implements this step, but is not supported in FME and has to be achieved in various convoluted ways - this stackoverflow answer outlines one way, and there is a tutorial here that doesn’t quite hit the mark - the attribute pivoter outputs a table where the relevant attributes have to be explicitly exposed, or the output sent to a dynamic excel writer.

 

Python Caller

Another approach is to use a Python Caller. In this case, I've used pandas as it is the most versatile data manipulation tool available for python.


Here's the code - it makes use of the pandas library which you will have to explicitly load it into the directory of packages that FME can access; the way to do this is here . Note that you still have to manually expose the attributes you want via the Attribute Exposer transformer after you run the Python Caller with the code below.

PythonCaller parameters for pivot process

import fme
import fmeobjects
import pandas as pd

class FeatureProcessor(object):

    def __init__(self):
        self.df = pd.DataFrame
        self.row_list = []

    def input(self, feature):
        self.row_list.append({
        'SA':feature.getAttribute('SMALL_AREA'), 
        'category':feature.getAttribute('category'), 
        'property_ct':feature.getAttribute('vo_property_ct')
        })   
 
    def close(self):
        self.df = pd.DataFrame(self.row_list)
        op = self.df.groupby(['SA', 'category'])['property_ct'].agg('sum').unstack().fillna(0)
        op_dict = op.to_dict('index')
        
        
        # set a loop through the op df to create a feature for every row, with an attribute for each value in the row dict
        for sa, atts in op_dict.items():
            feature = fmeobjects.FMEFeature()
            feature.setAttribute("SMALL_AREA", sa)
            for k, v in atts.items():
                feature.setAttribute(k, v)
            self.pyoutput(feature)
 

R Caller


The code for this is even simpler as R is ideally suited for this.

The implementation is however somewhat more tricky as R is not installed by default with FME. I also use some additional libraries such as dplyr and magrittr. You can find out more about using R with FME here.

In this transformer, I simply group by the Small_Area and use the pivot_wider function to get the desired result.


The code:


library(tidyverse)
library(magrittr)

SA_Category_Counts %>%
  pivot_wider(
  names_from = "category", 
  values_from = "vo_property_ct") %>% mutate(across(where(is.numeric), ~ replace_na(.x, 0))) -> op_df

fmeOutput <- op_df


Conclusion


This image illustrates an FME workspace which can be used to perform a pivot from long to wide data, using Python and R code embedded in the workspace.
Final workspace layout

You can see the final workspace above, and you can download it here.


FME is a powerful tool, but it lacks some very basic and simple data processing steps and the "pivot" is a glaring example. It more than likely is something to do with the fundamental engineering of FME, in that it is built with generalism in mind, and the whole concept of attributes being genererated ad hoc by a transformer doesn't mesh well with that. If the workspace terminated with a feature writer, with no set schema defined, it probably would output the result, but this is rarely the end-game when performing a pivot operation.



댓글


bottom of page