To obtain the dataset ID, navigate to the dataset in your web browser and inspect the address bar. The dataset ID is found at the end of the URL, in the form xxxx-xxxx. For example, for the following dataset URL:
https://data.seattle.gov/Public-Safety/Fire-911/m985-ywaw
The dataset ID is: m985-ywaw
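If you need to pull the ID out of many URLs, the lookup can be scripted. The following is a minimal Python sketch, assuming the ID is always the last path segment and follows the xxxx-xxxx pattern described above. The function name extract_dataset_id is hypothetical and not part of DataSync.

```python
import re
from urllib.parse import urlparse

# Assumption: an ID is two groups of four lowercase letters or digits
# joined by a hyphen, as in the example URL.
DATASET_ID_PATTERN = re.compile(r"^[a-z0-9]{4}-[a-z0-9]{4}$")


def extract_dataset_id(url):
    """Return the dataset ID at the end of a dataset URL, or None if absent."""
    path = urlparse(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    return last_segment if DATASET_ID_PATTERN.match(last_segment) else None


print(extract_dataset_id("https://data.seattle.gov/Public-Safety/Fire-911/m985-ywaw"))
# prints "m985-ywaw"
```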
Columns within a dataset have both a display name and an API field name. DataSync operates only on API field names. If you are using the DataSync GUI, you can get the list of API field names by clicking the ‘Get Column IDs’ button after entering the dataset ID. You can also view the API field names in your browser by hovering over the information icon on any column.
DataSync supports the Text, Formatted Text, Number, Money, Percent, Date & Time (with or without timezone), Location, Website URL, Email, Checkbox, Flag, Star, and Phone datatypes. Refer to the conditions/restrictions resource for the formatting requirements of each.
Typically, this happens for one of three reasons:
To fix the first two bullets, verify that the column names in your control file match the field names in the dataset, and that the list is comprehensive.
To fix the third, either remove the column from your dataset or use the ignoreColumns option described in the control file guide.
When not all fields are explicitly specified for the location column, the system attempts to guess the components by parsing the address. This parsing normally works, but there are notable cases where it fails. To work around this, we recommend explicitly breaking out your address locations into constituent columns (e.g. address, city, state, zip) and passing those directly to the synthetic location.
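As an illustration of the workaround, here is a small Python sketch that builds the relevant control file fragment and prints it as JSON. The CSV column names (street_address, zip_code, and so on) and the Location field name "location" are hypothetical, and the component keys are taken from the example list above. Confirm the exact key names against the control file guide before use.

```python
import json

# Hypothetical mapping: each address component points at the CSV column
# that holds it. Replace the values with the column names in your file.
synthetic_locations = {
    "location": {  # hypothetical API field name of the Location column
        "address": "street_address",
        "city": "city",
        "state": "state",
        "zip": "zip_code",
    }
}

# The mapping is nested inside the "csv" object of the control file.
control = {"csv": {"syntheticLocations": synthetic_locations}}

print(json.dumps(control, indent=2))
```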
Your version of Java is too old; update to at least Java 7. Get the latest version of Java here.
Your header row does not exactly match the column names in the dataset. Note that column names are case-sensitive. It is best to use the column identifiers (a.k.a. API field names) in your header row, which can be obtained for a dataset by clicking the “Get Column ID” button within DataSync.
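A mismatch like this can be spotted before uploading. The sketch below is one possible check, not DataSync's own validation. It compares a header row with a list of API field names case-sensitively and flags names that differ only by case. The function name compare_header and the sample field names are hypothetical.

```python
def compare_header(header, api_field_names):
    """Compare a header row with dataset field names, case-sensitively."""
    header_set, field_set = set(header), set(api_field_names)
    unknown = sorted(header_set - field_set)  # in the file, not in the dataset
    missing = sorted(field_set - header_set)  # in the dataset, not in the file
    # Header names that would match a field name if case were ignored.
    lowered = {name.lower(): name for name in api_field_names}
    case_mismatches = {
        h: lowered[h.lower()] for h in unknown if h.lower() in lowered
    }
    return unknown, missing, case_mismatches


unknown, missing, case_mismatches = compare_header(
    ["Address", "type", "datetime"],  # header row from the file
    ["address", "type", "datetime"],  # API field names from the dataset
)
print(case_mismatches)  # prints {'Address': 'address'}
```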
If you receive a SunCertPathBuilderException, there are two typical causes:
Run the following, removing the proxy options if you are not behind a proxy server. You can remove the ‘-rfc’ option to get additional information about each certificate in the chain.
keytool -J-Dhttps.proxyHost=<PROXY_HOST> -J-Dhttps.proxyPort=<PROXY_PORT> -printcert -rfc -sslserver <DOMAIN>:443 > <FILENAME>.cer
Run the following, using your keystore password if one has been set up, or the default password ‘changeit’ otherwise.
keytool -import -keystore cacerts -file <FILENAME>.cer
This error occurs because the “syntheticLocations” field is at the wrong level of the control file. It needs to be within the “csv” or “tsv” object, since it contains details about how to interpret the CSV or TSV.
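To illustrate the fix, the following Python sketch moves a top-level “syntheticLocations” entry into the “csv” or “tsv” object of a control file that has already been loaded as a dictionary. The function name and the sample mapping are hypothetical. It assumes the control file has at most one of “csv” or “tsv”, and falls back to “csv” when neither is present.

```python
import copy


def nest_synthetic_locations(control):
    """Return a copy with a top-level "syntheticLocations" moved under "csv" or "tsv"."""
    fixed = copy.deepcopy(control)  # leave the caller's dictionary untouched
    if "syntheticLocations" in fixed:
        mapping = fixed.pop("syntheticLocations")
        file_key = "tsv" if "tsv" in fixed else "csv"
        fixed.setdefault(file_key, {})["syntheticLocations"] = mapping
    return fixed


# A control file with the field at the wrong (top) level.
broken = {
    "syntheticLocations": {"location": {"address": "street_address"}},
    "csv": {},
}
fixed = nest_synthetic_locations(broken)
print(fixed)
```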
This error is most likely caused by insufficient heap space. Try starting up DataSync with additional heap space using one of the options below:
java -Xmx500m -jar <DATASYNC_JAR>
java -Xmx1g -jar <DATASYNC_JAR>
The former allows Java to use 500 MB of heap space and the latter 1 GB. If the problem persists, please contact your Socrata representative for support.
Redownload the DataSync JAR from: https://github.com/socrata/datasync/releases
This is only possible in DataSync version 1.0 and higher. Refer to this documentation.
Please reference our Network Considerations resource.
Verify that your CSV meets all of the restrictions detailed in the conditions and restrictions guide. If you are still having trouble, please contact your Socrata representative for support.