Export data from Red Canary
    • 22 Aug 2024

    Article summary

    This guide explains how to use Canary Exporter, a tool that allows you to transfer security data collected by Red Canary to other systems for further analysis.

    What is Canary Exporter good for?

    • Sending raw data to long-term storage (compliance, archival)

    • Sending network connection data to a SIEM for threat analysis

    • Sending process start records for custom analytics

    Customizable Output

    Canary Exporter lets you choose where exported data is sent, based on your requirements:

    • Local File Output: Enables storage of exported data on the local system for further processing or analysis.

    • AWS S3 Output: Facilitates integration with cloud-based storage solutions for long-term retention and archival purposes.

    • Other Output Options: The exporter supports additional output methods, such as Kinesis, Azure Blob Storage, and Google Cloud Storage, providing flexibility in data management.

    How does it work?

    1. Red Canary collects telemetry data from your endpoints.

    2. You choose the data format (native, standardized, cf-flattened) and configure Canary Exporter.

      1. Native: Preserves the original format of the telemetry data as collected by the EDR/EPP platform. Ideal for third-party applications expecting data in a specific product format (e.g., VMware Carbon Black EDR, CrowdStrike Falcon).

      2. Standardized: Adheres to Red Canary's uniform data structure, making it easier to parse and analyze, regardless of the underlying product.

      3. CF-Flattened (Linux EDR only): Builds on the standardized format by including additional context fields at the top level of various event types. Provides enhanced data richness for Linux EDR environments.

    3. Canary Exporter retrieves data from Red Canary and outputs it to your chosen destination (local file, AWS S3, etc.).

    Canary Exporter is packaged as a Docker container that connects to Red Canary resources in AWS, retrieves your data, and writes it to the outputs you configure (for example, files on the system running the exporter).

    Each Exporter deployment can read from only one queue, so if you want both native and standardized files, you need at least two exporters.

    • As telemetry volume increases, increase the number of exporters to keep up with the queue.

    • Running in separate containers

      • Do not run Exporter with different config files within the same container. Doing so can cause collisions and can result in files being deleted before they are processed. If you’re using two different config files, run two separate containers, each pointing to only one config file.

    • Sending to different resources (Splunk, local file, etc.).
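
    The one-config-per-container rule above can be sketched as a small Python helper that builds one docker run command per config file. This is only an illustration: the host directory layout and container names are hypothetical, while the image name and the /config.yaml and /output mount targets are taken from the run command in Step 3.

```python
from pathlib import PurePosixPath

IMAGE = "redcanary/canary-exporter-ruby"  # image name used in Step 3 of this article


def docker_commands(config_paths):
    """Return one `docker run` argument list per config file.

    Each config file gets its own container and its own host output
    directory, so two configurations never share a container.
    """
    commands = []
    for config in map(PurePosixPath, config_paths):
        name = f"canary-exporter-{config.stem}"               # hypothetical naming scheme
        output_dir = config.parent / f"output-{config.stem}"  # hypothetical host layout
        commands.append([
            "docker", "run", "-dit",
            "--name", name,
            "--volume", f"{config}:/config.yaml",
            "--volume", f"{output_dir}:/output",
            IMAGE,
        ])
    return commands


if __name__ == "__main__":
    for command in docker_commands([
        "/opt/canary-exporter/native.yaml",
        "/opt/canary-exporter/standardized.yaml",
    ]):
        print(" ".join(command))
```

    Printing the commands instead of executing them makes it easy to review the plan before starting any container.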

    Note: The exported telemetry is customizable. For example, you can configure the exporter to export only your network connections.

    You can optionally supply your Red Canary API token. As data comes in, Exporter queries the portal for metadata about the endpoint, merges it into the record, and then lets you filter on those fields. The main reason for doing this is maintainability and better unit testing.
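
    The enrichment flow described above, combined with the in-memory cache size from the configuration template, might look roughly like the following Python sketch. It is not the exporter's implementation: the class, the "endpoint" field name, and the fetch function are hypothetical stand-ins, and the real portal response shape is not shown in this article.

```python
from collections import OrderedDict


class EndpointMetadataCache:
    """Tiny least-recently-used cache keyed by sensor ID (illustrative only)."""

    def __init__(self, fetch, max_size=1000):
        self._fetch = fetch            # callable: sensor_id -> dict of metadata
        self._max_size = max_size
        self._entries = OrderedDict()

    def get(self, sensor_id):
        if sensor_id in self._entries:
            self._entries.move_to_end(sensor_id)   # mark as recently used
            return self._entries[sensor_id]
        metadata = self._fetch(sensor_id)           # cache miss: look it up once
        self._entries[sensor_id] = metadata
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)       # evict the least recently used entry
        return metadata


def enrich(record, cache):
    """Return a copy of the record with endpoint metadata merged in."""
    merged = dict(record)
    merged["endpoint"] = cache.get(record["sensor_id"])  # hypothetical field name
    return merged


if __name__ == "__main__":
    lookups = []

    def fake_fetch(sensor_id):
        # Stand-in for a portal API lookup.
        lookups.append(sensor_id)
        return {"hostname": f"host-{sensor_id}"}

    cache = EndpointMetadataCache(fake_fetch, max_size=2)
    print(enrich({"sensor_id": 529, "event_type_cd": "process_start"}, cache))
    print(enrich({"sensor_id": 529, "event_type_cd": "network_connection"}, cache))
    print("portal lookups:", lookups)
```

    The cache keeps repeated records from the same endpoint from triggering a portal lookup each time, which is presumably why the template exposes endpoint_cache_size.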

    Prerequisites

    • Docker installed on your system (Community Edition works)

    • Hardware requirement of 4GB Memory

    • Red Canary credentials (obtained from Red Canary)

    • Network connectivity to AWS SQS and S3 services

    • Contact your CMS or account team to enable Canary Exporter

    Step 1: Generate Credentials

    1. In Red Canary, click your profile icon, and then click Canary Exporter.

    2. Select which type of data you’d like to export: native, standardized, or cf-flattened (Linux EDR only).

    3. Click Generate Credentials.

      1. Credentials are organization-wide for the demo subdomain, not bound to your specific user account. You will receive one AWS key pair for your organization, which should be documented and kept as safe as you would any other password. If you lose your key material, generate a new set of credentials immediately.

    Note: Generating new credentials will destroy any previously created credentials.

    Step 2: Create a Configuration File

    1. Create a file named config.yaml, starting with the template below, and customize it to your needs.

    ---
    customer_identifier: demo
    
    input:
      # You can export either 'native', 'standardized', or 'cf-flattened' (Linux EDR only) data from Red Canary.
      #
      # Native is data formatted as it was received by the EDR/EPP platform. This format is ideal when using
      # third-party applications that expect data from a specific product, such as VMware Carbon Black EDR or
      # CrowdStrike Falcon.
      #
      # Standardized is data formatted according to Red Canary's standardized format. This format tends to
      # be easier to read and parse, and is product-agnostic.
      #
      # CF-flattened is data formatted similarly to Red Canary's standardized format but with
      # additional context fields available in the top-level record of different event types.
      # This format is only available for Linux EDR customers.
      # Learn more about the available fields at https://rc-customer-tools.s3.us-east-2.amazonaws.com/forwarder/canary_forwarder_flat.schema.json.
      #
      stream_name: <<<<< Enter one of 'native', 'standardized', or 'cf-flattened' (Linux EDR only) >>>>>
    
      aws_region: us-east-2
      aws_access_key_id: <<<<< Enter the access key ID from the previous step >>>>>
      aws_secret_access_key: <<<<< Enter the secret access key from the previous step >>>>>
    
      #
      # This is optional and filters data in S3 before it is returned to the exporter.
      # Learn more about S3 SELECT syntax at https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference.html.
      #
      # s3_select_where: "S3Object.data.data.entries[0].event_type_full IN ('process_event', 'image_load_event')"
    
    #
    # Optional injection of endpoint metadata into exported records
    #
    # The max number of endpoint metadata records to store in in-memory cache (default: 1000)
    # endpoint_cache_size: 1000
    #
    # Your Red Canary API key used to retrieve endpoint metadata for each exported record
    # red_canary_authentication_token: <<<<< Enter your Red Canary API key >>>>>
    
    outputs:
    #  - file:
    #      directory: output/procstarts
    #      max_size_bytes: 10000000
    #      max_files: 2
    #      export_only_where:
    #        event_type_cd:
    #          - process_start
    #        sensor_id:
    #          - 529
    
    #  - kinesis:
    #      stream_name: <<<<>>>>
    #      max_records_to_buffer_before_sending: 500
    #      aws_access_key_id: <<<<>>>>
    #      aws_secret_access_key: <<<<>>>>
    #      aws_region: <<<<>>>>
    #      export_only_where:
    #        event_type_cd:
    #          - network_connection
    #        sensor_id:
    #          - 529
    
    #  - s3:
    #      bucket_name: <<<<>>>>
    #      object_key_prefix: <<<<>>>>
    #      aws_access_key_id: <<<<>>>>
    #      aws_secret_access_key: <<<<>>>>
    #      aws_region: <<<<>>>>
    #      max_size_bytes: 10000000
    #      compression_format: either 'none' or 'gzip'
    #      export_only_where:
    #        event_type_cd:
    #          - network_connection
    #        sensor_id:
    #          - 529
    
    #  - azure_blob:
    #      storage_account_name: <<<<>>>>
    #      container_name: <<<<>>>>
    #      blob_name_prefix: <<<<>>>>
    #      storage_access_key: <<<<>>>>
    #      max_size_bytes: 10000000
    #      compression_format: either 'none' or 'gzip'
    #      export_only_where:
    #        event_type_cd:
    #          - network_connection
    #        sensor_id:
    #          - 529
    
    #  THIS EXPORTER IS ONLY SUPPORTED WHEN USING THE MRI DOCKERFILE (docker pull redcanary/canary-exporter-ruby-mri)
    #  - google_cloud_storage:
    #      bucket_name: <<<<>>>>
    #      object_key_prefix: <<<<>>>>
    #      credentials_json: |
    #        {
    #          "type": "service_account",
    #          "project_id": "project-id",
    #          "private_key_id": "key-id",
    #          "private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
    #          "client_email": "service-account-email",
    #          "client_id": "client-id",
    #          "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    #          "token_uri": "https://accounts.google.com/o/oauth2/token",
    #          "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    #          "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
    #        }
    #      max_size_bytes: 10000000
    #      compression_format: either 'none' or 'gzip'
    #      export_only_where:
    #        event_type_cd:
    #          - network_connection
    #        sensor_id:
    #          - 529
    
    #  THIS EXPORTER IS ONLY SUPPORTED WHEN USING THE MRI DOCKERFILE (docker pull redcanary/canary-exporter-ruby-mri)
    #  - google_cloud_pubsub:
    #      topic_name: <<<<>>>>
    #      credentials_json: |
    #        {
    #          "type": "service_account",
    #          "project_id": "project-id",
    #          "private_key_id": "key-id",
    #          "private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
    #          "client_email": "service-account-email",
    #          "client_id": "client-id",
    #          "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    #          "token_uri": "https://accounts.google.com/o/oauth2/token",
    #          "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    #          "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
    #        }
    #      compression_format: either 'none' or 'gzip'
    #      export_only_where:
    #        event_type_cd:
    #          - network_connection
    #        sensor_id:
    #          - 529
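
    Before starting the container, it can help to check the edited template for leftover placeholders. The Python sketch below is one way to do that, assuming the YAML has already been parsed into a dictionary (for example with a YAML library's safe loader). Which keys are strictly required, and whether an empty outputs list is an error, are assumptions made for illustration rather than documented exporter behavior.

```python
VALID_STREAMS = {"native", "standardized", "cf-flattened"}
REQUIRED_INPUT_KEYS = (
    "stream_name",
    "aws_region",
    "aws_access_key_id",
    "aws_secret_access_key",
)
PLACEHOLDER = "<<<<"  # marker used by the template for values you must fill in


def config_problems(config):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    if not config.get("customer_identifier"):
        problems.append("customer_identifier is missing")

    input_section = config.get("input") or {}
    for key in REQUIRED_INPUT_KEYS:
        value = input_section.get(key)
        if not value or PLACEHOLDER in str(value):
            problems.append(f"input.{key} is missing or still a placeholder")

    stream = input_section.get("stream_name")
    if stream and PLACEHOLDER not in str(stream) and stream not in VALID_STREAMS:
        problems.append(f"input.stream_name must be one of {sorted(VALID_STREAMS)}")

    # Assumption: at least one output must be uncommented for the exporter to be useful.
    if not config.get("outputs"):
        problems.append("outputs is empty; uncomment at least one output")
    return problems


if __name__ == "__main__":
    template_as_shipped = {
        "customer_identifier": "demo",
        "input": {
            "stream_name": "<<<<< Enter one of 'native', 'standardized', or 'cf-flattened' >>>>>",
            "aws_region": "us-east-2",
        },
        "outputs": None,
    }
    for problem in config_problems(template_as_shipped):
        print("-", problem)
```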

    Step 3: Run the Exporter inside your network

    Run the Exporter on a system that has high reliability and availability. Use the command below as your starting point, together with an init script that restarts the Docker container if it fails.

    Ensure the system running the exporter has network connectivity to Amazon’s SQS and S3 services. S3 can be used for long-term storage for compliance, and it is a common ingest source for getting data into a SIEM or any other logging platform.

    docker run -t \
    --volume "$HOME/canary-exporter/config.yaml:/config.yaml" \
    --volume "$HOME/canary-exporter/output:/output" \
    redcanary/canary-exporter-ruby

    Replace redcanary/canary-exporter-ruby with redcanary/canary-exporter-ruby-mri if you are using the google_cloud_pubsub or google_cloud_storage output configurations.

    The following variables can be modified based on your requirements:

    • $HOME/canary-exporter/output defines the local path where data will be downloaded, parsed, and output if a file output is configured. Note that everything after the : on this line is required by the exporter and should not be changed.

    • $HOME/canary-exporter/config.yaml is the path to the configuration file you customized earlier. The data after the : on this line is required by the exporter and should not be changed.

    You can modify the way the container is executed in several ways:

    • Use docker run -it to send standard output to the screen and tie the process to the logged-in session.

    • Use docker run -dit to run the container in the background, not tied to a session.

    • Use Docker’s --volume /dir:/output:Z option if you find that files are not being written to the host’s output directory as expected. The Docker documentation describes these options and how to use them with SELinux-enabled host systems.

    Canary Exporter Output

    If you configured a file output, exported files will start to populate the directory you specified.

    On the Red Canary side, we can see activity via the SQS queues. To investigate an SQS queue:

    1. Log in via Okta using the AWS SSO tile and select Simple Queue Service.

    2. Filter down to the appropriate queue, which starts with canaryexporter-prod-rc followed by either -native or -standard, file-added, and finally the customer's subdomain. For instance, canaryexporter-prod-rc-native-file-added-rcsupport.

    3. In the list of queues, you can see the number of messages available and in flight for each queue. If both are 0, all files have been processed (exported).

    When the Canary Exporter is enabled, several resources are provisioned:

    • IAM user: rc-customer-applications-<customer-shortname>

      • Customer shortname is stored on the subdomain model (Subdomain#customer_shortname) and can be shared across subdomains. This means all external services across multiple subdomains for a customer will be funneled together. For example, a customer has subdomain1 and subdomain2 each with 2 external services and the same shortname. There will only be one native and one standardized queue for exporter(s) to poll.

      • A customer can (re)generate credentials by visiting the Canary Exporter page in the portal.

    • SNS topic: rc-[native|standardizer|cf-flattened]-file-added-<external-service-namespace>

      • For each external service, at least 2 telemetry types will be available: native and standardized. If the external services are for Linux EDR, a third cf-flattened will be available.

      • See Canary Exporter data flow for more details on data flow

    • SQS queue: canaryexporter-[native|standardized|cf-flattened][-test]?-file-added-<customer-shortname>

      • This queue subscribes to each of the SNS topics per the telemetry type, i.e. the native queue subscribes to the native topics, the standardized queue subscribes to the standardizer topics.

      • If the customer has opted into a test queue via Subdomain#canary_exporter_enable_test_queue, [native|standardized|cf-flattened]-test options will also be available so a customer can test changes to the configuration without impacting existing exporters. This option creates additional queues, but does not create additional topics.
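
    The naming patterns in this list can be written out as a short Python sketch. It only restates the patterns as they appear above; the helper names are hypothetical, and the console example earlier in this article shows a longer queue prefix, so treat the exact strings as unverified.

```python
# Telemetry type -> the word used in the SNS topic name, per the list above.
# Note that the standardized queue subscribes to "standardizer" topics.
TOPIC_WORD = {
    "native": "native",
    "standardized": "standardizer",
    "cf-flattened": "cf-flattened",
}


def queue_name(telemetry_type, customer_shortname, test=False):
    """Build an SQS queue name following the pattern in the list above."""
    if telemetry_type not in TOPIC_WORD:
        raise ValueError(f"unknown telemetry type: {telemetry_type}")
    suffix = "-test" if test else ""
    return f"canaryexporter-{telemetry_type}{suffix}-file-added-{customer_shortname}"


def subscribed_topics(telemetry_type, external_service_namespaces):
    """List the SNS topics a queue of this telemetry type subscribes to."""
    word = TOPIC_WORD[telemetry_type]
    return [f"rc-{word}-file-added-{ns}" for ns in external_service_namespaces]


if __name__ == "__main__":
    # Hypothetical customer with one shortname and two external services.
    namespaces = ["service-a", "service-b"]
    for telemetry_type in ("native", "standardized"):
        print(queue_name(telemetry_type, "acme"), "<-",
              subscribed_topics(telemetry_type, namespaces))
```

    The sketch makes the fan-in visible: several topics (one per external service) feed a single queue per telemetry type for the customer shortname.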

    Additional Notes

    • You cannot run multiple exporters with different configurations on the same queue. This will lead to inconsistent results.

    • For high data volume environments, run multiple exporters with the same configuration to handle the load.

    • The exporter expects to be restarted by a supervising system (supervisor, upstart, init) in case of errors.

    • Tools like jq can be used to explore exported JSON data.
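
    Because the exporter expects to be restarted by a supervising system, the idea can be illustrated with a minimal Python supervisor loop. This is a sketch under assumptions, not a recommended replacement for supervisor, upstart, or init: the backoff values are arbitrary, and the run and sleep parameters exist only so the loop can be exercised without Docker.

```python
import subprocess
import time


def supervise(command, run=subprocess.call, sleep=time.sleep,
              max_restarts=None, base_delay=1.0, max_delay=60.0):
    """Run `command` repeatedly, restarting it whenever it exits.

    Returns the number of times the command was started. With
    max_restarts=None the loop never returns, like a real supervisor.
    """
    starts = 0
    delay = base_delay
    while True:
        started_at = time.monotonic()
        exit_code = run(command)          # blocks until the process exits
        starts += 1
        if max_restarts is not None and starts > max_restarts:
            return starts
        # A long, healthy run resets the backoff; quick failures keep doubling it.
        if time.monotonic() - started_at > max_delay:
            delay = base_delay
        print(f"exporter exited with code {exit_code}; restarting in {delay:.0f}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    # Demo with a stand-in runner so nothing is actually launched.
    attempts = supervise(
        ["docker", "run", "-t", "redcanary/canary-exporter-ruby"],
        run=lambda cmd: 1,
        sleep=lambda seconds: None,
        max_restarts=3,
    )
    print("started", attempts, "times")
```

    Docker's own restart policies (for example, docker run --restart unless-stopped) are another way to get similar behavior without a separate script.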

    FAQ

    Can we use the Canary Exporter credentials to create our own script to collect the data from our subdomain?

    Yes. You can use your own code to pull the data from your subdomain instead of using Canary Exporter. The only caveat is that you are responsible for that code, and Red Canary’s support will be limited because you will be using a customized solution for the data transfer.

    Can Canary Exporter be hosted by EKS, Fargate, ECS or others?

    Canary Exporter can write to S3 and other output destinations in addition to local files. It does not yet support loading credentials from a secret store using IAM roles or keys.

    Can the Canary Exporter input location be changed to another AWS region?

    No, we will not change the input location to another region in AWS. However, you can set up a bucket in us-east-2 and mirror the data to us-east-1 in your account, which may alleviate the need to use a NAT gateway.

    Note: Do not confuse this with the output section of the config.yaml file, where the customer can use AWS and configure a region of their choosing.

    Canary Exporter deployment concerns

    • Each exporter deployment can read from only one queue, so if a customer wants native and standardized files, they need at least two exporters.

    • As telemetry volume increases, customers should increase the number of exporters to keep up with the queue.

    • A customer can use S3 Select to filter out unwanted files.
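
    For the S3 Select filtering mentioned above, the value of the optional s3_select_where setting can be assembled programmatically. The Python sketch below reproduces the expression shown in the configuration template; the function name is hypothetical, and the default field path is simply the one used in that template, so it may not apply to every data format.

```python
def s3_select_where(event_types,
                    field="S3Object.data.data.entries[0].event_type_full"):
    """Build a value for the optional `s3_select_where` setting."""
    if not event_types:
        raise ValueError("at least one event type is required")
    quoted = []
    for event_type in event_types:
        # Refuse values that would break out of the quoted SQL literal.
        if "'" in event_type:
            raise ValueError(f"unexpected quote in event type: {event_type!r}")
        quoted.append(f"'{event_type}'")
    return f"{field} IN ({', '.join(quoted)})"


if __name__ == "__main__":
    print(s3_select_where(["process_event", "image_load_event"]))
```

    The printed string can be pasted into the s3_select_where line of config.yaml.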

    Why is an external service missing from the exporter for a subdomain with multiple external services?

    There are two possible scenarios:

    • The external service was likely created before May 31, 2022 and will need to be manually provisioned. This can be done through the production console by executing Subdomains::TerraformCanaryExporter.call(subdomain: Subdomain.current.id)

    Exploring exported data

    A number of tools exist that are designed for exploring large amounts of JSON formatted data. jq is especially helpful:

    Counting the types of events in exported standardized data

    jq .event_type_cd *.json | sort | uniq -c

    Listing the sensor ids with data present in standardized data

    jq .sensor_id *.json | sort -u
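
    If jq is not available, the same two summaries can be approximated in Python. The sketch below assumes each exported .json file contains one or more JSON objects written back to back (which is what the jq commands above also rely on) and that the standardized fields are named event_type_cd and sensor_id as in the configuration examples. It reads each file fully into memory, which should be acceptable for files capped by max_size_bytes.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def iter_records(text):
    """Yield each JSON value from text holding one or more concatenated values."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while position < len(text) and text[position].isspace():
            position += 1
        if position >= len(text):
            break
        record, position = decoder.raw_decode(text, position)
        yield record


def summarize(directory):
    """Count event types and collect sensor IDs across all *.json files."""
    event_counts = Counter()
    sensor_ids = set()
    for path in sorted(Path(directory).glob("*.json")):
        for record in iter_records(path.read_text(encoding="utf-8")):
            event_counts[record.get("event_type_cd")] += 1
            sensor_ids.add(record.get("sensor_id"))
    return event_counts, sensor_ids


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up records.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sample.json").write_text(
            '{"event_type_cd": "process_start", "sensor_id": 529}\n'
            '{"event_type_cd": "network_connection", "sensor_id": 529}\n',
            encoding="utf-8",
        )
        counts, sensors = summarize(tmp)
        print(counts.most_common())
        print(sorted(sensors))
```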

    How to Filter for Specific Event Types

    How you configure this section depends on whether you are using the “standardized” or “native” format.

    • If you choose the native format, you can filter for specific event types using the event type filter schema of your EDR product (e.g., Carbon Black, CrowdStrike).

    • The “native” event stream can be classified as raw endpoint events (i.e., sensor-generated event telemetry).

    • The Carbon Black EDR event type filters can be found here. See the article section “Raw endpoint events” for all event type filters. Filtering is limited to the broader categories listed above.

    • If you choose the standardized format, use the Canary Exporter event type filters. These are:

      • binary

      • child_process

      • endpoint

      • file_creation

      • file_deletion

      • file_modification

      • model_attributes

      • network_connection

      • process_handle_open

      • process_thread_open

      • registry_key_deletion

      • registry_value_deletion

      • registry_value_write

      • registry_key_creation

      • remote_thread_creation

      • module_load

      • process_end

      • process_start

    • If you use the native format, the outputs and event type section should look similar to the block below.

    Note: If you use Linux EDR, the available event type filters can be found here: Filtering Telemetry for Linux EDR.

    Notice the leading “/” in the “output” directory name below. It ensures that log data is saved to your local output directory (the one mounted at /output in the docker run command). If the “/” is instead placed after the directory name (output/), Canary Exporter saves logs to the /apps/output directory inside the Docker container.

    outputs:
       - file: 
           directory: /output
           max_size_bytes: 10000000
           max_files: 4
           export_only_where:
             type:
               - ingress.event.regmod
               - ingress.event.filemod
               - ingress.event.netconn
               - ingress.event.procstart
               - ingress.event.module
               - ingress.event.childproc
               - ingress.event.crossprocopen

    Note: If you comment out “export_only_where” and the lines below it, or you do not specify a “type,” Canary Exporter will send all event types.

    If you use the standardized format, the outputs and event type section should look similar to this:  

    outputs:
       - file:
           directory: /output
           max_size_bytes: 10000000
           max_files: 2
           export_only_where:
             event_type_cd:
               - process_start
               - child_process
               - binary

    Note: If you comment out “export_only_where” and the lines below it, or you do not specify an “event_type_cd,” Canary Exporter will send all event types.
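
    To reason about what an export_only_where block will let through, the matching can be modeled in a few lines of Python. This is a sketch of one plausible reading, not the exporter's code: it assumes a record must satisfy every listed field, that any one of the listed values is enough for a given field, and that a missing filter exports everything (the last point is stated in the notes above).

```python
def matches(record, export_only_where):
    """True when the record satisfies every condition in export_only_where.

    Assumption: fields are combined with AND, and the values listed
    under a single field are combined with OR.
    """
    if not export_only_where:
        return True  # no filter configured: export everything
    return all(
        record.get(field) in allowed_values
        for field, allowed_values in export_only_where.items()
    )


if __name__ == "__main__":
    # Mirrors the standardized example: three event types, no sensor restriction.
    standardized_filter = {
        "event_type_cd": ["process_start", "child_process", "binary"],
    }
    records = [
        {"event_type_cd": "process_start", "sensor_id": 529},
        {"event_type_cd": "network_connection", "sensor_id": 529},
    ]
    for record in records:
        print(record["event_type_cd"], matches(record, standardized_filter))
```

    The same function covers the native example if the filter key is type and the values are the ingress.event.* names.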

