Port jobs are used for copying datasets that are already on the Socrata platform. Port jobs allow users with publishing rights to copy both dataset schemas (metadata and columns) and data (rows). Port jobs also allow users to port derived views as stand-alone datasets. This guide shows how to set up and run a Port Job using the command line interface.
Information about your domain, username, password and app token is required for all DataSync jobs. Note that the user running the job must have publisher rights on the dataset, and that the domain used here must be the site hosting the dataset to be ported. A number of other global settings, such as logging and emailing preferences can also be configured. Please refer to the configuration guide to establish your credentials and preferences.
For general help using DataSync in headless/command-line mode run:
java -jar <DATASYNC_JAR> --help
To run a job, execute the following command, replacing <..> with the appropriate values (flags explained below):
java -jar <DATASYNC_JAR> -c <CONFIG FILE> -t PortJob -pm copy_all -pd1 <SOURCE DOMAIN> -pi1 <SOURCE DATASET ID> -pd2 <DESTINATION DOMAIN> -pdt <TITLE OF NEW DATASET> -pp true
Explanation of flags (* = required flag):

| Flag - Short Name | Flag - Long Name | Example Values | Description |
|---|---|---|---|
| -t * | --jobType | PortJob | Specifies the type of job to run. |
| -c | --config | /Users/home/config.json | Points to the config.json file you created in Step 1, if you chose to do so. |
| -pm * | --portMethod | copy_all | One of copy_all, copy_schema, or copy_data. |
| -pd1 * | --sourceDomain | https://opendata.socrata.com | The scheme and domain to which the source dataset belongs. |
| -pi1 * | --sourceDatasetId | m985-ywaw | The dataset identifier of the source dataset. |
| -pd2 * | --destinationDomain | https://opendata.socrata.com | The scheme and domain where the destination dataset should be copied. |
| -pi2 | --destinationDatasetId | ax36-bgg2 | The dataset identifier of the destination dataset; only relevant when copy_data is chosen for the --portMethod. |
| -pdt | --destinationDatasetTitle | "Crimes 2014" | The title to give the destination dataset; only relevant when the destination dataset is being created, i.e. when copy_all or copy_schema is chosen for the --portMethod. |
| -pp | --publishDestinationDataset | true | Set this to true to have the destination dataset published before the Port Job completes; only relevant when the destination dataset is being created (copy_all or copy_schema). If false, the destination dataset is left as a working copy (false is the default value). |
| -ppm | --portPublishMethod | replace | Specifies the publish method to use (replace or upsert). For details on the publishing methods, refer to Step 5 of the Setup a Port Job (GUI) guide. |
Information about the status of the job will be output to STDOUT. If the job runs successfully a ‘Success’ message will be output to STDOUT, the destination dataset id will be printed out and the program will exit with a normal status code (0). If there was a problem running the job a detailed error message will be output to STDERR and the program will exit with an error status code (1). You can capture the exit code to configure error handling logic within your ETL process.
For example:

java -jar <DATASYNC_JAR> -c config.json -t PortJob -pm copy_schema -pd1 https://opendata.socrata.com -pi1 97wa-y6ff -pd2 https://opendata.socrata.com -pdt 'Port Job Test Title' -pp true
config.json contents:
{
"domain": "https://opendata.socrata.com",
"username": "publisher@socrata.com",
"password": "secret_password",
"appToken": "fPsJQRDYN9KqZOgEZWyjoa1SG",
}
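Because the program exits with status 0 on success and 1 on failure, a wrapper script can branch on the exit code. Below is a minimal sketch: `false` stands in for the real `java -jar <DATASYNC_JAR> ...` invocation (which exits 1 on error), so the script is illustrative rather than a ready-to-run job.

```shell
#!/bin/sh
# Sketch of exit-code handling around a DataSync Port Job.
# 'false' stands in for the real invocation, e.g.:
#   java -jar datasync.jar -c config.json -t PortJob -pm copy_all ...
run_port_job() {
  false
}

if run_port_job; then
  echo "Port Job succeeded"
else
  # $? here holds the exit status of run_port_job (1 on failure)
  echo "Port Job failed with exit code $?"
fi
```

In a real ETL pipeline the failure branch would typically log STDERR output and alert an operator rather than just echo a message.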
Running a previously saved job file (.spj file)
Simply run:
java -jar <DATASYNC_JAR> <.spj FILE TO RUN>
For example:
java -jar <DATASYNC_JAR> /Users/john/Desktop/business_licenses.spj
NOTE: rather than saving a job from the DataSync UI, you can also create an .spj file directly; it stores the job details in JSON format. Here is an example:
{
"portMethod": "copy_all",
"sourceSiteDomain": "https://louis.demo.socrata.com",
"sourceSetID": "w8e5-buaa",
"sinkSiteDomain": "https://louis.demo.socrata.com",
"sinkSetID": "",
"publishMethod": "upsert",
"publishDataset": "publish",
"portResult": "",
"jobFilename": "job_saved_v0.3.spj",
"fileVersionUID": 1,
"pathToSavedJobFile": "/home/louis/Socrata/Github/datasync/src/test/resources/job_saved_v0.3.spj"
}
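Since the .spj format is plain JSON, a job file can also be generated by a script. The sketch below uses placeholder domains and dataset ids, and keeps only the job fields from the example above, omitting the file-path bookkeeping fields (jobFilename, pathToSavedJobFile) the UI writes; whether those are required in headless mode is an assumption worth verifying against your DataSync version.

```shell
#!/bin/sh
# Write a minimal Port Job .spj file. The domains and dataset ids below
# are placeholders, not real datasets.
cat > port_job.spj <<'EOF'
{
  "portMethod": "copy_all",
  "sourceSiteDomain": "https://source.example.com",
  "sourceSetID": "abcd-1234",
  "sinkSiteDomain": "https://dest.example.com",
  "sinkSetID": "",
  "publishMethod": "upsert",
  "publishDataset": "publish",
  "portResult": "",
  "fileVersionUID": 1
}
EOF
echo "wrote port_job.spj"
```

The generated file can then be run as shown above: java -jar <DATASYNC_JAR> port_job.spj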