AWS Integration with Epicollect5


I am currently working with Epicollect5 for data collection and am interested in integrating it with AWS for enhanced data management and analysis capabilities.
I would like to know about the feasibility and best practices for integrating Epicollect5 data collection with AWS services such as Amazon S3 for storage, AWS Lambda for serverless computing, and Amazon Athena for querying data in S3 using standard SQL. I have already referred to the Epicollect5 Data Collection User Guide and the AWS developer resources.

Has anyone in the community had experience integrating Epicollect5 with AWS services for data collection and analysis?

I would also be interested in any available resources or documentation that would help me, as a developer, integrate Epicollect5 with AWS.

Thank you in advance.

Best regards,

1. Access Epicollect5 API with Bearer Token

To access the API of a private project you need a Bearer token from Epicollect5; public projects can be read without one. The token is obtained via the client credentials you create in the Epicollect5 web interface (under the project's API settings) or requested programmatically through the API.
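Assuming the current Epicollect5 OAuth token endpoint, a minimal standard-library sketch for requesting a token could look like this (the `client_id`/`client_secret` values are placeholders for the credentials of an app you create under the project's API settings):

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://five.epicollect.net/api/oauth/token"

def build_token_request(client_id, client_secret):
    """Form fields for the client-credentials grant."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }

def fetch_token(client_id, client_secret):
    """POST the credentials and return the (short-lived) access token."""
    body = urllib.parse.urlencode(build_token_request(client_id, client_secret)).encode()
    with urllib.request.urlopen(urllib.request.Request(TOKEN_URL, data=body)) as resp:
        return json.load(resp)["access_token"]
```

Tokens expire, so a scheduled Lambda may need to refresh the token rather than rely on a long-lived environment variable.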


2. Setting Up AWS Lambda


  1. Create a Lambda Function (if not already created):

    • Go to the AWS Lambda console.
    • Click on “Create function”.
    • Choose “Author from scratch”.
    • Name your function (e.g., EpicollectDataFetcher).
    • Choose a runtime (e.g., Python 3.x or Node.js).
  2. Write the Lambda Code with Bearer Token:
    The function will make API requests to Epicollect5 and store the data in an S3 bucket.

    Example Python code for the Lambda function with a Bearer token:

    import json
    import os

    import boto3
    import requests  # not in the Lambda Python runtime by default; bundle it or use a layer

    def lambda_handler(event, context):
        project_slug = os.environ['EPICOLLECT_PROJECT_SLUG']
        bearer_token = os.environ['EPICOLLECT_BEARER_TOKEN']
        # Epicollect5 entries export endpoint
        api_url = f"https://five.epicollect.net/api/export/entries/{project_slug}"
        headers = {
            'Authorization': f'Bearer {bearer_token}',
            'Accept': 'application/json'
        }
        response = requests.get(api_url, headers=headers)
        response.raise_for_status()
        data = response.json()
        s3 = boto3.client('s3')
        bucket_name = os.environ['S3_BUCKET_NAME']
        file_name = 'epicollect_data.json'
        s3.put_object(Bucket=bucket_name, Key=file_name, Body=json.dumps(data))
        return {
            'statusCode': 200,
            'body': json.dumps('Data fetched and stored successfully')
        }

Or, equivalently, using Node.js:

const https = require('https');
const AWS = require('aws-sdk');

exports.handler = async (event) => {
    const projectSlug = process.env.EPICOLLECT_PROJECT_SLUG;
    const bearerToken = process.env.EPICOLLECT_BEARER_TOKEN;
    // Epicollect5 entries export endpoint
    const apiUrl = `https://five.epicollect.net/api/export/entries/${projectSlug}`;
    const s3 = new AWS.S3();
    const bucketName = process.env.S3_BUCKET_NAME;
    const fileName = 'epicollect_data.json';

    const getApiData = () => {
        return new Promise((resolve, reject) => {
            const options = {
                headers: {
                    'Authorization': `Bearer ${bearerToken}`,
                    'Accept': 'application/json'
                }
            };
            https.get(apiUrl, options, (res) => {
                let data = '';
                res.on('data', (chunk) => {
                    data += chunk;
                });
                res.on('end', () => {
                    resolve(JSON.parse(data));
                });
            }).on('error', (err) => {
                reject(err);
            });
        });
    };

    try {
        const data = await getApiData();
        const params = {
            Bucket: bucketName,
            Key: fileName,
            Body: JSON.stringify(data)
        };
        await s3.putObject(params).promise();
        return {
            statusCode: 200,
            body: JSON.stringify('Data fetched and stored successfully')
        };
    } catch (error) {
        console.error('Error fetching data or storing in S3:', error);
        return {
            statusCode: 500,
            body: JSON.stringify('Failed to fetch and store data')
        };
    }
};
  3. Set Environment Variables:

    • In the Lambda console, set the following environment variables:
      • EPICOLLECT_PROJECT_SLUG (e.g., your_project_slug)
      • EPICOLLECT_BEARER_TOKEN (your Bearer token from Epicollect5)
      • S3_BUCKET_NAME (your S3 bucket name)
  4. Configure Permissions:

    • Ensure your Lambda function has permission to write to the S3 bucket. Attach an IAM role to the function; a policy scoped to s3:PutObject on your bucket is preferable to the broad AmazonS3FullAccess managed policy.
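A least-privilege policy for the role could look like the sketch below (the bucket name is the example used later in this answer; substitute your own):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::epicollect-data-bucket/*"
    }
  ]
}
```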

3. Setting Up AWS S3

Create an S3 bucket to store the data fetched from Epicollect5.


  1. Create a Bucket:

    • Go to the S3 console.
    • Click “Create bucket”.
    • Provide a unique name for the bucket (e.g., epicollect-data-bucket).
  2. Configure Bucket Permissions:

    • Ensure your bucket is configured to allow write access from the Lambda function’s IAM role.

4. Setting Up AWS API Gateway (Optional)

You can set up an API Gateway to trigger your Lambda function via an HTTP request.


  1. Create an API:

    • Go to the API Gateway console.
    • Click “Create API” and choose “HTTP API”.
  2. Configure Routes:

    • Create a new route (e.g., GET /fetch-data) that triggers the Lambda function.
  3. Deploy the API:

    • Deploy the API to make it available for external calls.
  4. Invoke the API:

    • Use the API endpoint to trigger the Lambda function, which will fetch and store data from Epicollect5.
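As a quick sketch, the deployed route can then be called from any HTTP client; the invoke URL below is a placeholder for the one shown in your API Gateway console:

```python
import urllib.request

def endpoint(invoke_url, route="/fetch-data"):
    """Join the stage invoke URL with the route, tolerating a trailing slash."""
    return invoke_url.rstrip("/") + route

# Placeholder invoke URL; uncomment once the API is deployed.
# with urllib.request.urlopen(endpoint("https://abc123.execute-api.eu-west-2.amazonaws.com")) as resp:
#     print(resp.status, resp.read().decode())
```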

5. Automate Data Fetching (Optional)

You can automate data fetching with Amazon EventBridge (the successor to CloudWatch Events) by triggering the Lambda function on a schedule.


  1. Create a Rule:

    • Go to the Amazon EventBridge console.
    • Create a new rule for a scheduled event (e.g., every day at midnight).
  2. Configure the Rule:

    • Set the rule to trigger your Lambda function.
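The two steps above can also be done programmatically; here is a sketch using boto3 (the rule name and midnight schedule are assumptions for illustration):

```python
def nightly_cron(hour=0, minute=0):
    """EventBridge cron expression for every day at the given UTC time."""
    return f"cron({minute} {hour} * * ? *)"

def schedule_daily_fetch(lambda_arn, rule_name="epicollect-nightly-fetch"):
    # boto3 is imported here so the helper above works without the AWS SDK installed.
    import boto3
    events = boto3.client("events")
    rule = events.put_rule(Name=rule_name, ScheduleExpression=nightly_cron())
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": lambda_arn}])
    return rule["RuleArn"]
```

Note that EventBridge also needs permission to invoke the function, e.g. via `aws lambda add-permission` with the principal `events.amazonaws.com`.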

This integration involves setting up an AWS Lambda function to fetch data from Epicollect5 using its API with a Bearer token and storing the data in an AWS S3 bucket. Optionally, you can set up an API Gateway to trigger this function on-demand and use CloudWatch for scheduled automation.
Ensure you have the necessary permissions and configurations in place for seamless operation.