CSV to Socrata Transform

This workflow demonstrates the basic functionality of Pentaho Kettle Data Integration in order to clean a file, pivot columns to rows, and publish to Socrata.

To publish to Socrata using Pentaho is a two-step process. 1 workflow runs the transformations and the 2nd workflow runs datasync on the updated and transformed csv.

The Transformations

tranformation.ktr workflow:

  • CSV Input
  • Row Normaliser in order to Pivot Columns into flat rows
  • Filter Rows to remove unwated “NA” characters
  • Select Values (renames to schema, selects fields to write to output file)in order to Rename Columns
  • Formula in order to change decimal rate to whole number and comply with Socrata Percent Data Type
  • Replace String in order to remove unwanted text from cells
  • Text Output (force quote enclosure, comma separator, ISO8601 date format and trims all fields)

datasyncrunner.kjb:

Quickstart

Follow these steps to get the example workflow working.

  1. Download Pentaho Kettle Community Edition and Datasync
  2. Download the workflow and associated files using the Download link above and open it from Spoon. NOTE: You may have to right-click > Save Link As… to get it to download.
  3. Update the Paths to the Raw Inspection and Violation files so they point to the correct directory.[^2]
  4. Update the path for the Text file output.
  5. Run the workflow and ensure that the file transformed_data.csv was created in the directory with current timestamp.