Photos file names changed during upload

rpoloni · 29 January 2025 14:28

Project Name (if applicable, otherwise just type n/a)

N/A

Question

Hi guys, thank you first of all for this great application for collecting data in the field. I adapted an R script I already had for processing samples collected in the field so that it works with Epicollect.
I have very simple projects: each field record has only three fields: field code (mandatory), place (locality or city) and photo. The photo contains all I need: the date, the GPS coordinates and the image itself (needed to identify the species). This is very convenient because I usually avoid writing down too many things in the field, where you are always in a rush. Then, with the script, I complete the table from Epicollect with the rest (coordinates and date) and process the data. However, when I download my data the photo name has changed, so I cannot automatically pick the photo from a folder using the table exported from the Epicollect project.
So, the question is: why is the name of the photo changed compared to the one given by my phone? I can of course download all the pictures from the Epicollect project, but since there is no bulk download and I already have everything in my Google Drive, it would be much more convenient if the names were not changed.

Epicollect5 · 29 January 2025 15:52

Renaming file uploads is an important practice that aligns with standard web security and data management best practices.

Ensuring Uniqueness
- When multiple users upload files, having unique filenames prevents accidental overwriting. If files retained their original names, a second upload with the same name could replace an existing file, potentially leading to data loss or confusion. By renaming files, we guarantee that each upload remains distinct and accessible.
Enhancing Security
- Allowing users to upload files with arbitrary names poses security risks. Certain filenames or extensions might be used maliciously, such as executable scripts disguised as harmless files. Renaming the files mitigates this risk by ensuring a controlled and safe naming convention.
Maintaining Consistency and Organization
- Standardized filenames make it easier to manage, retrieve, and process files efficiently. A structured naming approach ensures compatibility across different systems and prevents potential conflicts with special characters or spaces that could cause issues.
Providing a Reference to the Original Name
- If retaining the original filename is important for your workflow, a simple solution is to add an extra question to your form, such as “Photo Filename,” where users can manually enter the desired filename. This allows you to map the original name to the filename assigned by Epicollect5 while still benefiting from the security and organizational advantages of renaming.
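
In R, the mapping described in the last point could look like the following. This is only a minimal sketch: X1_FIELD_CODE and X3_TUBE_PHOTO are the export columns that appear in the script later in this thread, while X4_PHOTO_FILENAME, the sample values and the local folder are assumptions made for illustration.

```r
# Illustrative export table; X4_PHOTO_FILENAME is a hypothetical column
# produced by an extra "Photo Filename" question, and all values are made up.
epicoll <- data.frame(
  X1_FIELD_CODE = c("A001", "A002"),
  X3_TUBE_PHOTO = c("uuid-1_1738160880.jpg", "uuid-2_1738160990.jpg"),
  X4_PHOTO_FILENAME = c("IMG_0001.jpg", "IMG_0002.jpg"),
  stringsAsFactors = FALSE
)

# Assumed local folder holding the original phone photos
local_dir <- file.path(tempdir(), "phone_photos")
dir.create(local_dir, showWarnings = FALSE)
# Pretend that only the first photo is present locally
invisible(file.create(file.path(local_dir, "IMG_0001.jpg")))

# Resolve each record to the original file and flag the missing ones
epicoll$local_path <- file.path(local_dir, epicoll$X4_PHOTO_FILENAME)
epicoll$found <- file.exists(epicoll$local_path)
print(epicoll[, c("X1_FIELD_CODE", "X4_PHOTO_FILENAME", "found")])
```

Records with found equal to FALSE are the ones whose typed name does not match any local file, which is where typing errors would show up.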

Also, be aware that file renaming happens at the device level, before the upload.
Files created (or imported) in Epicollect5 use the following naming convention:

{entry uuid}_{Unix timestamp in seconds}.{ext}
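
For anyone processing the exported table in R, the convention above can be split back into its parts. A minimal sketch, assuming the file names follow exactly that pattern; parse_ec5_filename is a hypothetical helper (not part of Epicollect5) and the example name is made up.

```r
# Hypothetical helper: split a media file name of the form
# {entry uuid}_{Unix timestamp in seconds}.{ext} into its components.
parse_ec5_filename <- function(filename) {
  pattern <- "^(.+)_([0-9]+)\\.([A-Za-z0-9]+)$"
  if (!grepl(pattern, filename)) {
    stop("File name does not match {uuid}_{timestamp}.{ext}: ", filename)
  }
  list(
    entry_uuid = sub(pattern, "\\1", filename),
    # Interpret the timestamp as seconds since 1970-01-01 UTC
    saved_at = as.POSIXct(as.numeric(sub(pattern, "\\2", filename)),
                          origin = "1970-01-01", tz = "UTC"),
    ext = sub(pattern, "\\3", filename)
  )
}

# Made-up example name, for illustration only
parts <- parse_ec5_filename("0a1b2c3d-1111-2222-3333-444455556666_1738160880.jpg")
print(parts$entry_uuid)
print(format(parts$saved_at, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```

The entry UUID is the part that links a media file back to its row in the exported table.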

rpoloni · 29 January 2025 17:03

Hi, thank you for the answer. I see the point of changing the file name, but it is still really inconvenient for what I have to do. Entering the photo name by hand is not feasible because it is usually very long. I will think about it. A bulk download of pictures would actually solve the issue quite simply!

rpoloni · 30 January 2025 10:55

Hi, I have tested a few things on my side. Downloading the images one by one is a bit of a problem, because all the images are downloaded with the same name, and the EXIF data are changed, so the date of the photo, the GPS coordinates and everything else no longer correspond to the original. Is there a way of solving this issue? I feel it is a bit of a pity that the media associated with the data are changed; it can result in a loss of data.
Maybe using the API is another option, but I don’t think it will solve the EXIF problem.

Epicollect5 · 30 January 2025 11:47

The EXIF data are not changed by Epicollect5, just copied over.

To download media files using our API have a read at →

rpoloni · 30 January 2025 12:43

I tried with the API; the code I used is below. However, the EXIF data are still changed, and I don’t know why…

library(httr)

epicoll <- read.csv(epi_csv, header = TRUE)  # CSV export of the form

# Download each photo through the API and save it under its field code
for (i in seq_len(nrow(epicoll))) {
  photo_name <- epicoll$X3_TUBE_PHOTO[i]
  field_code <- epicoll$X1_FIELD_CODE[i]
  reqst <- paste0("https://five.epicollect.net/api/export/media/", proj_slug, "?type=photo&format=entry_original&name=", photo_name)
  res <- GET(reqst, add_headers("Authorization" = paste("Bearer", token)), write_disk(path = paste0(picture_directory, field_code, ".jpg"), overwrite = TRUE))
}
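
A side note on the request itself: the URL and the destination path in the loop above are built by plain string concatenation, which depends on the photo name containing no special characters and on picture_directory ending with a slash. The sketch below shows one way to make both steps more robust. build_media_url is a hypothetical helper, and the project slug, file names and folder are made-up values; only the endpoint and query parameters come from the script above.

```r
# Hypothetical helper: build the media export URL, percent-encoding the name
build_media_url <- function(proj_slug, photo_name) {
  paste0(
    "https://five.epicollect.net/api/export/media/", proj_slug,
    "?type=photo&format=entry_original&name=",
    utils::URLencode(photo_name, reserved = TRUE)
  )
}

# Made-up values, for illustration only
url_plain <- build_media_url("my-project", "uuid-1_1738160880.jpg")
url_space <- build_media_url("my-project", "my photo.jpg")
print(url_plain)
print(url_space)

# file.path() inserts the separator, so the folder needs no trailing slash
dest <- file.path("pictures", paste0("A001", ".jpg"))
print(dest)
```

Names that follow the Epicollect5 convention pass through unchanged, so the encoding only matters if a name ever contains spaces or reserved characters.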

Epicollect5 · 30 January 2025 14:44

Below is the API URL to a public photo uploaded to our EC5 Demo Project:

Photo

When we save this photo locally using Chrome and check it against a tool like Jimpl, we receive the metadata correctly.

Or you can analyse the API URL directly using exifinfo
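
A coarse version of this check can also be done from R without extra packages. The sketch below only tests whether a file contains an Exif identifier near its start. It does not read individual tags such as the date or GPS position, so a file can pass this check and still have altered tags. has_exif_header is a hypothetical helper, and the two demo files are synthetic byte streams rather than real photos.

```r
# Hypothetical helper: report whether a file contains an Exif identifier
# ("Exif" followed by two NUL bytes) within its first max_bytes bytes.
has_exif_header <- function(path, max_bytes = 65536L) {
  bytes <- readBin(path, what = "raw", n = max_bytes)
  marker <- as.raw(c(0x45, 0x78, 0x69, 0x66, 0x00, 0x00))
  n <- length(bytes) - length(marker) + 1L
  if (n < 1L) return(FALSE)
  # Keep candidate start positions where every marker byte matches in turn
  hits <- seq_len(n)
  for (k in seq_along(marker)) {
    hits <- hits[bytes[hits + k - 1L] == marker[k]]
    if (length(hits) == 0L) return(FALSE)
  }
  TRUE
}

# Self-contained demo with two synthetic byte streams (not real photos)
with_exif <- tempfile(fileext = ".jpg")
without_exif <- tempfile(fileext = ".jpg")
writeBin(as.raw(c(0xFF, 0xD8, 0xFF, 0xE1, 0x00, 0x08,
                  0x45, 0x78, 0x69, 0x66, 0x00, 0x00)), with_exif)
writeBin(as.raw(c(0xFF, 0xD8, 0xFF, 0xDB, 0x00, 0x02)), without_exif)
print(has_exif_header(with_exif))
print(has_exif_header(without_exif))
```

Running it on the same photo obtained in different ways (browser download, script download, original from the phone) shows quickly whether the metadata block was dropped entirely. Comparing tag values still needs a dedicated EXIF tool like the ones mentioned above.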

Epicollect5 · 30 January 2025 14:50

Some AI suggestions

Since the API provides the file with EXIF data when accessed directly via the browser, but the EXIF is missing when downloaded via your R script, the issue is likely caused by how httr::GET() processes the request.

Possible Causes & Fixes:

1. Ensure `write_disk()` is Writing Binary Data Correctly

When using httr::GET() with write_disk(), the response body is streamed to disk as raw bytes. write_disk() takes only path and overwrite, so there is no mode = "wb" to set and no extra encoding option is needed:

library(httr)

for (i in seq_len(nrow(epicoll))) {
  photo_name <- epicoll$X3_TUBE_PHOTO[i]
  field_code <- epicoll$X1_FIELD_CODE[i]
  reqst <- paste0("https://five.epicollect.net/api/export/media/", proj_slug, "?type=photo&format=entry_original&name=", photo_name)

  # write_disk() saves the body exactly as received
  res <- GET(reqst,
             add_headers("Authorization" = paste("Bearer", token)),
             write_disk(path = paste0(picture_directory, field_code, ".jpg"), overwrite = TRUE))
}

✅ Why? Text-mode writes can alter binary files, but write_disk() does not use text mode, so a file that still differs from the browser download points to a cause elsewhere.

2. Verify File Integrity After Download

Check if the downloaded file is corrupted or altered by comparing checksums:

system(paste("md5sum", paste0(picture_directory, field_code, ".jpg")))

Then compare it to the MD5 of the file downloaded manually via the browser.

✅ Why? If the checksums don’t match, something is modifying the file during the API download process.
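
The same comparison can be done inside R with tools::md5sum(), which avoids depending on an md5sum command being installed on the system. A minimal sketch: same_file is a hypothetical helper, and the temporary files stand in for the script download and the browser download.

```r
# Hypothetical helper: TRUE when two files have identical MD5 checksums
same_file <- function(path_a, path_b) {
  sums <- tools::md5sum(c(path_a, path_b))
  if (anyNA(sums)) stop("One of the files could not be read")
  unname(sums[1] == sums[2])
}

# Stand-ins for the two downloads, filled with arbitrary bytes
via_script <- tempfile(fileext = ".jpg")
via_browser <- tempfile(fileext = ".jpg")
modified <- tempfile(fileext = ".jpg")
writeBin(as.raw(c(0xFF, 0xD8, 0x01, 0x02)), via_script)
writeBin(as.raw(c(0xFF, 0xD8, 0x01, 0x02)), via_browser)
writeBin(as.raw(c(0xFF, 0xD8, 0x01, 0x03)), modified)

print(same_file(via_script, via_browser))
print(same_file(via_script, modified))
```

Identical checksums mean the two files match byte for byte, metadata included, so any difference in EXIF must already exist in the file stored on the server.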

3. Try `download.file()` Instead of `httr::GET()`

If httr::GET() is altering the file, use download.file() instead:

for (i in 1:nrow(epicoll)) {
  photo_name <- epicoll$X3_TUBE_PHOTO[i]
  field_code <- epicoll$X1_FIELD_CODE[i]
  reqst <- paste0("https://five.epicollect.net/api/export/media/", proj_slug, "?type=photo&format=entry_original&name=", photo_name)
  
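  # The headers argument (R >= 3.6.0) passes the same bearer token used with GET()
  download.file(reqst, destfile = paste0(picture_directory, field_code, ".jpg"), mode = "wb", quiet = FALSE, headers = c(Authorization = paste("Bearer", token)))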
}

✅ Why? download.file() directly fetches the file and ensures binary integrity.

4. Debug the API Response

To confirm that the API returns a valid image in your script, inspect the response:

res <- GET(reqst, add_headers("Authorization" = paste("Bearer", token)))

# Check content type
print(headers(res)$`content-type`)

# Check response status
print(status_code(res))


If the `content-type` is not `"image/jpeg"`, something might be altering the response.

Summary

✅ Confirm that write_disk() saved the response unchanged (it has no mode argument and always writes raw bytes)
✅ Compare file checksums before and after downloading
✅ Use download.file() instead of httr::GET()
✅ Inspect API response headers

rpoloni · 30 January 2025 16:51

Hi, thank you very much for your help! I understand where the problem is.
I think that it is not an API problem: if I use my script to download the image you provided, the metadata are preserved. The problem is in how the file is uploaded to the app from the phone. Uploading with the phone changes the metadata, whereas if the same picture is first uploaded to Google Drive, then downloaded and uploaded to the form from the computer, the metadata stay unchanged. I suspect it comes from how the phone handles geoprivacy: the metadata are probably changed when a photo is passed to an external app, to protect the user. I will ask the Google developer community how this can be solved.