Customers who are using DataSync via FTP or HTTP can get detailed debugging information by retrieving the logs for each job. Details of how to access those logs can be found below.
Customers can use the UI to reach a job status page that shows the status of the 250 most recent jobs, with detailed information on their success or failure. They can get there by navigating from the admin panel on their open data portal instance.
The fields that are exposed through this page are: success/failure, publish method, dataset owner, and a timestamp.
When a job fails, you can click on it to get more details and see error messaging about what may have gone wrong during upload. Below is a sample error message for a job in which a record in one of the columns could not be identified as a date type.
You can use FileZilla or any other FTP client that supports FTPS to connect to the FTP server.
In FileZilla, go to File -> Site Manager.
Set up a new connection with the following details:
Username: <Your Socrata username>
Password: <Your Socrata password>
Ensure the transfer mode is ‘Passive’ by going to:
Transfer Settings -> Transfer mode: Passive
Save the connection and press “Connect”.
If you only have permission to one domain, you will be dropped into the directory for that domain. You should see directories named with the dataset ID (e.g. b2fd-cjk2) of each dataset you have updated using DataSync replace via FTP. If you have permission to multiple domains, you will see them as subdirectories.
Inside each dataset identifier directory there should be the following files/directories:
You can download log.txt to see the logging information for the given dataset. Within the ‘completed’ directory you can find the CSVs/TSVs and control.json files archived by date (there are nested folders for year, month, and day). After each successful update operation using DataSync replace via FTP, the CSV/TSV and control.json files that were used to perform the update are archived. Archived files are stored for 30 days. Contact Socrata support if you would like additional information about archiving.
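If you would rather script this than use FileZilla, a minimal Python sketch like the one below can retrieve log.txt over FTPS. The host name, credentials, and dataset identifier are placeholders; substitute the FTP server address and credentials provided for your Socrata account.

```python
from ftplib import FTP_TLS

# Placeholder values; substitute the FTP server address and the
# credentials provided for your Socrata account.
HOST = "ftp.example.socrata.com"
USERNAME = "your-socrata-username"
PASSWORD = "your-socrata-password"
DATASET_ID = "b2fd-cjk2"   # example dataset identifier

ftps = FTP_TLS(HOST)
ftps.login(USERNAME, PASSWORD)
ftps.prot_p()          # encrypt the data channel (FTPS)
ftps.set_pasv(True)    # passive transfer mode, as required above

# Download the per-dataset log file from the dataset's directory.
with open("log.txt", "wb") as f:
    ftps.retrbinary(f"RETR {DATASET_ID}/log.txt", f.write)
ftps.quit()
```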
You must be signed in with an account that has publisher rights before you can view the logs. Logs are available in plain text or in JSON format, via the first or second URL in each pair below, respectively.
Logs for all DataSync over HTTP jobs run on your domain can be found by visiting:
https://<Your domain>/datasync/log
https://<Your domain>/datasync/log/index.json
DataSync logs for a specific dataset can be found by visiting:
https://<Your domain>/datasync/id/<Your dataset ID>/log/
https://<Your domain>/datasync/id/<Your dataset ID>/log/index.json
The status of an in-progress job on a specific dataset can be found by visiting:
https://<Your domain>/datasync/id/<Your dataset ID>/status/
https://<Your domain>/datasync/id/<Your dataset ID>/status/index.json
DataSync logs for a specific job can be found by visiting:
https://<Your domain>/datasync/id/<Your dataset ID>/log/<Your job ID>
https://<Your domain>/datasync/id/<Your dataset ID>/log/<Your job ID>.json
Where <Your domain> is your domain, <Your dataset ID> is the identifier of the dataset, and <Your job ID> is the identifier of the job. The job identifier is a 32-character string and is included in all of the logs listed above.
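As a quick sketch, the JSON endpoints above can also be fetched programmatically. The example below assumes HTTP Basic authentication with an account that has publisher rights; the domain, dataset ID, and credentials are placeholders.

```python
import requests

DOMAIN = "data.example.gov"          # placeholder domain
DATASET_ID = "b2fd-cjk2"             # example dataset identifier
AUTH = ("your-socrata-username", "your-socrata-password")  # assumes HTTP Basic auth

# Fetch the JSON form of the dataset's DataSync log.
log = requests.get(
    f"https://{DOMAIN}/datasync/id/{DATASET_ID}/log/index.json", auth=AUTH)
log.raise_for_status()
for entry in log.json():
    print(entry)

# The status of an in-progress job follows the same URL pattern.
status = requests.get(
    f"https://{DOMAIN}/datasync/id/{DATASET_ID}/status/index.json", auth=AUTH)
print(status.json())
```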
Each CSV uploaded through DataSync over HTTP is available for at least 30 days after upload. The CSV can be found by generating a URL with the following pattern:
https://<Your domain>/datasync/id/<Your Dataset ID>/completed/yyyy/mm/dd/hh:mm:ss.xxx-<Your file name>.csv
Where <Your domain> is your domain, <Your dataset ID> is the identifier of the dataset, and <Your file name> is the name of the CSV you uploaded.
The date information within the path can be found by navigating the /completed/ directory. For example, to see which months still contain snapshots, you can use:
https://<Your domain>/datasync/id/<Your Dataset ID>/completed/yyyy/
And to determine the complete file name, including the hour, minute, seconds, and milliseconds, you can use:
https://<Your domain>/datasync/id/<Your Dataset ID>/completed/yyyy/mm/dd/
noting that the resulting file name will be URL-encoded.
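Putting the pieces together, a sketch like the following browses the /completed/ hierarchy and downloads an archived CSV. The timestamped file name here is hypothetical (discover the real one from the day listing), and the colons in it must be URL-encoded when building the download URL, as noted above.

```python
import requests
from urllib.parse import quote

DOMAIN = "data.example.gov"          # placeholder domain
DATASET_ID = "b2fd-cjk2"             # example dataset identifier
AUTH = ("your-socrata-username", "your-socrata-password")  # assumes HTTP Basic auth
BASE = f"https://{DOMAIN}/datasync/id/{DATASET_ID}/completed"

# Browse a year (then month, then day) to discover which snapshots remain.
print(requests.get(f"{BASE}/2024/", auth=AUTH).text)

# Hypothetical timestamped name taken from the day listing; quote() escapes
# the colons in hh:mm:ss so the URL is valid.
name = "12:30:05.123-mydata.csv"
resp = requests.get(f"{BASE}/2024/01/15/{quote(name)}", auth=AUTH)
resp.raise_for_status()
with open("mydata.csv", "wb") as f:
    f.write(resp.content)
```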