I am using a public project that has a vast dataset (c. 12k entries). I would like to use this data dynamically, so I thought that making API calls from within some R code would be the better approach.
I have managed to download some data and worked out how to deal with pagination, but I am still only allowed c.3k entries with one set of requests.
So, my next plan was to download the data as CSV occasionally and then append new data which I pull from the project through an API query. In order to do this, however, I need to filter the responses I get back from the API call so that they contain only those entries created since the last entry in my occasionally updated CSV record.
I can query API calls based upon ‘title’, which gives me only those entries posted by a specific user (as users appear to duplicate their names here), by adding query=list(title=" John Smith") to the end of the URL I call. But, when I come to do the same thing for ‘created_at’ I come across two issues:
I cannot filter on ‘created_at’. Even given a SPECIFIC datetime, from an already existing entry, I cannot isolate that record.
I cannot for the life of me find the syntax, either in R or HTML, to create a URL that limits calls between two dates.
You can currently filter by the columns created_at and uploaded_at. You may choose a date value (in ISO 8601 format, like 2022-01-26T00:00:00.000) on which to filter: "filter_from" (all the entries from a value), "filter_to" (all the entries up to a value), or both "filter_from" and "filter_to" (all the entries between two values).
For example, this endpoint will give you only entries created today (8th of April 2022): https://five.epicollect.net/api/export/entries/ec5-demo-project?filter_by=created_at&filter_from=2022-04-08T00:00:00&filter_to=2022-04-08T23:59:59
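From R, a URL like the one above could be assembled roughly as follows. This is only a sketch in base R: the helper name build_entries_url and the sample data frame are made up for illustration, and it assumes the created_at values in your CSV are ISO 8601 strings in one consistent format (so the latest one sorts last as plain text).

```r
# Hypothetical helper (not part of any package): builds an export URL
# with optional date filters, using the parameter names from the answer.
build_entries_url <- function(project_slug,
                              filter_by = "created_at",
                              filter_from = NULL,
                              filter_to = NULL) {
  base <- paste0("https://five.epicollect.net/api/export/entries/", project_slug)
  params <- list(filter_by = filter_by,
                 filter_from = filter_from,
                 filter_to = filter_to)
  # Drop any filter that was not supplied (left as NULL)
  params <- Filter(Negate(is.null), params)
  query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
  paste0(base, "?", query)
}

# Made-up stand-in for the occasionally updated CSV record
existing <- data.frame(
  created_at = c("2022-04-01T09:30:00.000Z", "2022-04-07T16:45:12.000Z"),
  stringsAsFactors = FALSE
)

# Assumption: same-format ISO 8601 strings sort chronologically, so max()
# picks the latest; keep only the first 19 characters (YYYY-MM-DDTHH:MM:SS)
last_seen <- substr(max(existing$created_at), 1, 19)

url <- build_entries_url("ec5-demo-project", filter_from = last_seen)
print(url)

# The request itself could then be made with, for example, httr::GET(url)
```

It is not stated above whether the filter_from boundary is inclusive, so it may be worth removing duplicates after appending the new entries to the CSV.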
For extra help, you could ask on this specific R thread →