Port jobs copy datasets that are already on the Socrata platform. Users with publishing rights can use them to copy a dataset's schema (metadata and columns), its data (rows), or both. Port jobs can also port derived views as stand-alone datasets. This guide shows how to set up and run a Port Job using the graphical user interface.
Navigate to the DataSync download page, and download the latest version.
Launch DataSync by navigating to the folder containing the DataSync JAR file that you downloaded and either double-clicking the JAR or running the following command:
java -jar <DATASYNC_JAR>
In the DataSync UI, go to File -> New... -> Port Job. This will open a new Port Job.
Enter your authentication details at the bottom left of DataSync (domain, username, password, and app token). The domain is the root domain of the data site hosting the dataset you wish to port. It must begin with https:// (e.g. https://data.cityofchicago.org). The username and password are those of a Socrata account that has a Publisher role. If you do not yet have an app token, please see how to obtain an app token. The username, password, and app token will be saved as part of the job configuration. We recommend creating a dedicated Socrata account (with a Publisher role or Owner permissions to specific datasets) to use with DataSync rather than tying DataSync to a particular person’s account.
NOTICE: DataSync stores the authentication details unencrypted in the Registry on Windows platforms (in the following location: HKEY_CURRENT_USER\Software\JavaSoft\Prefs) and in analogous locations on Mac and Linux. If this is a security concern for you, consider alternative publishing methods. Please contact support if you have questions.
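The domain format described above (a root domain beginning with https://) can be checked before it is typed into DataSync. The following is a minimal sketch, not DataSync's own validation logic; the function name `is_valid_domain` is hypothetical, and the rule that a root domain carries no path is an assumption based on the example given in this guide.

```python
from urllib.parse import urlparse


def is_valid_domain(domain: str) -> bool:
    """Return True if `domain` looks like an https root domain.

    Hypothetical helper: it only mirrors the rule stated in this guide
    and does not contact the site or call any DataSync code.
    """
    parsed = urlparse(domain.strip())
    # Require the https scheme and a host name.
    if parsed.scheme != "https" or not parsed.netloc:
        return False
    # Assumption: a root domain has no path beyond an optional trailing slash.
    return parsed.path in ("", "/")


if __name__ == "__main__":
    for candidate in ("https://data.cityofchicago.org", "data.cityofchicago.org"):
        print(candidate, is_valid_domain(candidate))
```

A value without the https:// prefix is rejected here because `urlparse` treats it as a bare path with no scheme or host.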
The configurable options to run a Port Job are described below. The first three entries are the available choices for the PortMethod:
Copy schema only: This will copy the metadata and columns of the source dataset into a new dataset. No row data is copied over.
Copy schema and data: This copies both the metadata/column info and all row data, effectively making a duplicate of the source dataset.
Copy data only: This copies the row data from the source dataset into the destination dataset. The effect on the destination dataset is determined by the Publish Method option below. Please note that this option will only succeed if the schemas of the source and destination datasets agree.
Source Domain: The domain to which the source dataset belongs.
Source Dataset ID: The dataset identifier of the source dataset.
Destination Domain: The domain to which the source dataset will be copied.
Destination Dataset ID: The dataset identifier of the destination dataset. This is only needed if selecting Copy data only as the PortMethod.
Publish Method: Applies only when Copy data only is selected as the PortMethod. Choose one of the following:
upsert: This will upsert the data from the source dataset into the destination dataset, updating rows that already exist and inserting those that do not.
replace: This will replace the data in the destination dataset with that in the source dataset.
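The difference between upsert and replace can be illustrated with a small in-memory model. This is only a sketch of the semantics described above, not DataSync's implementation: rows are plain dictionaries, the function name `apply_publish_method` is hypothetical, and the `id` column stands in for whatever row identifier the destination dataset uses (an assumption).

```python
def apply_publish_method(destination, source, method, key="id"):
    """Return the destination rows after applying the source rows.

    Hypothetical model: `key` names the column treated as the row
    identifier. Input lists are not modified.
    """
    if method == "replace":
        # The destination ends up holding exactly the source rows.
        return [dict(row) for row in source]
    if method == "upsert":
        merged = {row[key]: dict(row) for row in destination}
        for row in source:
            # Matching identifiers are updated; new identifiers are inserted.
            merged[row[key]] = dict(row)
        return list(merged.values())
    raise ValueError(f"unknown publish method: {method!r}")


if __name__ == "__main__":
    destination = [{"id": 1, "name": "old"}, {"id": 2, "name": "keep"}]
    source = [{"id": 1, "name": "new"}, {"id": 3, "name": "added"}]
    print(apply_publish_method(destination, source, "upsert"))
    print(apply_publish_method(destination, source, "replace"))
```

In this model, upsert keeps the destination row with `id` 2 because the source does not mention it, while replace drops it.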
Whether to publish the destination dataset: Applies only when Copy schema only or Copy schema and data is selected as the PortMethod. Choose one of the following:
Yes: This will publish the destination dataset to complete the Port Job.
No, create a working copy: This will leave the destination dataset as a working copy.
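The dependencies between these options (which settings matter for which PortMethod) can be summarized as a small checklist in code. This is a sketch under stated assumptions: the `PortJobOptions` class, its field names, and the `check_options` function are all hypothetical and do not reflect DataSync's job file format or internal classes.

```python
from dataclasses import dataclass
from typing import List, Optional

SCHEMA_METHODS = ("Copy schema only", "Copy schema and data")
DATA_ONLY = "Copy data only"


@dataclass
class PortJobOptions:
    """Hypothetical container mirroring the options listed in this guide."""
    port_method: str
    source_domain: str
    source_dataset_id: str
    destination_domain: str
    destination_dataset_id: Optional[str] = None
    publish_method: Optional[str] = None          # "upsert" or "replace"
    publish_destination: Optional[bool] = None    # True = Yes, False = working copy


def check_options(opts: PortJobOptions) -> List[str]:
    """Return a list of problems; an empty list means the options are consistent."""
    problems = []
    if opts.port_method == DATA_ONLY:
        # Copying data only writes into an existing destination dataset.
        if not opts.destination_dataset_id:
            problems.append("Destination Dataset ID is required for Copy data only")
        if opts.publish_method not in ("upsert", "replace"):
            problems.append("Publish Method must be upsert or replace")
    elif opts.port_method in SCHEMA_METHODS:
        # Schema copies create a new dataset, so only the publish choice applies.
        if opts.publish_destination is None:
            problems.append("Choose whether to publish the destination dataset")
    else:
        problems.append(f"Unknown port method: {opts.port_method}")
    return problems
```

For example, a `PortJobOptions` with `port_method="Copy data only"` and no destination dataset ID or publish method yields two problems, while a schema copy only needs the publish choice to be set.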
Once your job is set up, you can run it like any other job. For more details, please see steps 4 and 5 of the Setting up a Standard Job (GUI) Guide.