Filtering API calls between dates in R

Hi everyone.

Very new to this and teaching myself.

I am using a public project that has a vast dataset (c. 12k entries). I would like to use this data dynamically, so I thought that making API calls from within some R code would be better.

I have managed to download some data and worked out how to deal with pagination, but I am still only allowed c. 3k entries with one set of requests.

So my next plan was to occasionally download the data as a CSV and then append new data pulled from the project through an API query. To do this, however, I need to filter the API responses so that I only get the entries created since the last entry in my occasionally updated CSV record.
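A minimal sketch of that plan in R. It assumes the saved CSV has `created_at` (ISO 8601, UTC, e.g. 2022-04-08T10:53:28.994Z) and `ec5_uuid` columns, as an Epicollect5 CSV export should, but check your own file; it also assumes newly fetched entries come back with the same columns. Both function names are made up for illustration.

```r
# Hypothetical helpers (not from any package) for the "CSV + top-up" plan.
# Assumes the CSV has `created_at` (ISO 8601, UTC) and `ec5_uuid` columns.

# Newest created_at in the saved CSV, formatted for use as filter_from.
last_created_at <- function(csv_path) {
  saved <- read.csv(csv_path, stringsAsFactors = FALSE)
  # %OS reads fractional seconds; strptime ignores the trailing "Z"
  stamps <- as.POSIXct(as.character(saved$created_at),
                       format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
  latest <- max(stamps, na.rm = TRUE)
  # Whole seconds only, so the newest saved entry will usually be returned again
  format(latest, "%Y-%m-%dT%H:%M:%S", tz = "UTC")
}

# Append fetched rows whose key is not already in the saved data.
# Assumes `old` and `new` have the same column names.
append_new_entries <- function(old, new, key = "ec5_uuid") {
  fresh <- new[!(new[[key]] %in% old[[key]]), , drop = FALSE]
  rbind(old, fresh)
}
```

Because of the truncation to whole seconds, the de-duplication on `ec5_uuid` matters: without it the last saved entry would likely be appended twice. Treating the timestamps as UTC is also an assumption; if the API reads filter values in another time zone, the cut-off would shift.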

I can filter API calls on ‘title’, which gives me only the entries posted by a specific user (users appear to repeat their names in this field), by passing query=list(title="John Smith") along with the URL I call. But when I try to do the same thing for ‘created_at’, I run into two issues:

  1. I cannot filter on ‘created_at’. Even given the SPECIFIC datetime of an already existing entry, I cannot isolate that record.
  2. I cannot for the life of me find the syntax, either in R or in the URL itself, to limit a call to entries between two dates.

Does anyone have any pointers they can give me?

Cheers

Paul

Have you tried filtering on dates?

You can currently filter by the columns created_at and uploaded_at. Set filter_by to the column, then pass a date value (in ISO 8601 format, like 2022-01-26T00:00:00.000) as "filter_from" (all the entries from a value), "filter_to" (all the entries up to a value), or both "filter_from" and "filter_to" (all the entries between two values).

For example, this endpoint will give you only the entries created today (8th of April 2022):
https://five.epicollect.net/api/export/entries/ec5-demo-project?filter_by=created_at&filter_from=2022-04-08T00:00:00&filter_to=2022-04-08T23:59:59
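If you would rather not hand-type these URLs, they can be put together in base R. This is only a sketch: `build_entries_url()` is a made-up helper, and it pastes the timestamps in without URL encoding so the result can be compared directly with the URL above.

```r
# Hypothetical helper: assemble an entries-export URL with the date filter
# parameters described above. Timestamps are pasted in as-is (no URL
# encoding), so the output can be compared directly with the example URL.
build_entries_url <- function(project_slug,
                              filter_by = c("created_at", "uploaded_at"),
                              filter_from = NULL, filter_to = NULL) {
  filter_by <- match.arg(filter_by)  # the two filterable columns
  params <- list(filter_by = filter_by,
                 filter_from = filter_from,
                 filter_to = filter_to)
  params <- Filter(Negate(is.null), params)  # drop bounds that were not given
  query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
  paste0("https://five.epicollect.net/api/export/entries/", project_slug, "?", query)
}
```

Leaving `filter_to` as NULL gives the "from a value" case, and leaving `filter_from` as NULL the "to a value" case.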

For extra help, you could ask on this specific R thread →

Yes, I saw this… so I wonder whether I am not constructing the query correctly?

So, I know that this call gives me data:
https://five.epicollect.net/api/export/entries/litter-champions?form_ref=14daa35e97cc4173aab567da7caf1cd4_5d11dc5ead846

So, what is the syntax of the query? Does it go:

?filter_from(created_at=2022-04-08T10:53:28.994Z)

Building that in R doesn’t seem to work, and running it in a browser says that the form does not exist.

To filter by the 8th of April 2022 on your project:

https://five.epicollect.net/api/export/entries/litter-champions?form_ref=14daa35e97cc4173aab567da7caf1cd4_5d11dc5ead846&filter_by=created_at&filter_from=2022-04-08T00:00:00&filter_to=2022-04-08T23:59:59
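If the `query = list(...)` you use for the title filter is httr's `GET()`, the date filter works the same way: each parameter is its own named list element, rather than being nested as `filter_from(created_at=...)`. A sketch assuming the httr and jsonlite packages; both function names are hypothetical, and reading entries from `data$entries` in the parsed JSON is an assumption about the response layout worth checking against a real response.

```r
# Sketch only: httr and jsonlite are needed when fetch_entries() is called.
# date_filter_query() and fetch_entries() are hypothetical helper names.

# One named element per URL parameter; NULL elements are removed so
# optional parameters (such as form_ref) can simply be left out.
date_filter_query <- function(from, to, filter_by = "created_at", form_ref = NULL) {
  query <- list(form_ref = form_ref, filter_by = filter_by,
                filter_from = from, filter_to = to)
  Filter(Negate(is.null), query)
}

fetch_entries <- function(project_slug, from, to, form_ref = NULL) {
  url <- paste0("https://five.epicollect.net/api/export/entries/", project_slug)
  resp <- httr::GET(url, query = date_filter_query(from, to, form_ref = form_ref))
  httr::stop_for_status(resp)  # turn HTTP errors into R errors
  parsed <- jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
  parsed$data$entries  # assumed location of the entries in the JSON response
}
```

For example, `fetch_entries("litter-champions", "2022-04-08T00:00:00", "2022-04-08T23:59:59", form_ref = "14daa35e97cc4173aab567da7caf1cd4_5d11dc5ead846")` should request the same data as the URL above, with httr taking care of encoding the values.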

On top of that, since you have only one form, passing form_ref is redundant, so you could just use:

https://five.epicollect.net/api/export/entries/litter-champions?filter_by=created_at&filter_from=2022-04-08T00:00:00&filter_to=2022-04-08T23:59:59

On how to use URL params, this might be useful →
https://www.semrush.com/blog/url-parameters/
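To see why `?filter_from(created_at=...)` is not read as a date filter, it helps to split a query string roughly the way a server does: take everything after the first `?`, split it on `&` into parameters, then split each one on its first `=` into name and value. A rough base R sketch (`parse_query()` is a made-up helper and does no URL decoding):

```r
# Hypothetical helper: split a URL's query string into a named character
# vector of parameter values. Assumes the URL contains a "?"; no decoding.
parse_query <- function(url) {
  query <- sub("^[^?]*\\?", "", url)           # keep text after the first "?"
  parts <- strsplit(query, "&", fixed = TRUE)[[1]]
  eq <- as.integer(regexpr("=", parts, fixed = TRUE))  # first "=" per part, -1 if none
  keys <- ifelse(eq > 0, substr(parts, 1, eq - 1), parts)
  values <- ifelse(eq > 0, substring(parts, eq + 1), "")
  names(values) <- keys
  values
}
```

With the attempted syntax the only parameter name is `filter_from(created_at`, which the API would not recognise as a filter. And if a filter is appended with a second `?` instead of `&`, it ends up inside the `form_ref` value, which is one way a "form does not exist" error could arise.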

Yes, I suspected as much… I had already taken that out of my new code.

Your query works a treat, thank you very much… my unfamiliarity with this side of coding got the better of me. I will have a more sleep-filled weekend.

From here, time to learn about Shiny and Dashboards!