30 July 2023

Aquia Open Source Contributions - Adding a CISA KEV Enrichment Table to Matano

Principal Security Engineer Dakota Riley writes about contributing CISA KEV Enrichment Tables to Matano

Dakota Riley, Principal Security Engineer

Open-source tooling is a very important part of the security ecosystem. It gives organizations that can’t afford $100k+ enterprise tool licenses access to security capabilities, and you don’t have to sit through a time-consuming sales cycle to try it. Public docs, public code, a public roadmap, and the ability to chat directly with other contributors are strong pros as well. As an engineer, I love contributing back to open-source projects: it keeps my development skills sharp while giving back to the community and solving real-world problems, all at the same time. In this blog, I talk about my recent contribution to Matano: an enrichment table for the CISA Known Exploited Vulnerabilities list!

What is Matano?

From matano.dev:

Matano is an open-source security data lake platform (SIEM alternative) for Amazon Web Services (AWS). It lets you ingest petabytes of security and log data from various sources, store and query them in a data lake, and create Python detections as code for real-time alerting. Matano is fully serverless and focuses on enabling high-scale, low-cost, and zero-ops, and deploys fully into your AWS account.

Matano is a place to aggregate various security tooling outputs, enrich/normalize them, and alert on/analyze them. Pretty neat! Some high-level facts:

  • Able to manage your detections, schemas, and ingestion/source configurations as code
  • Able to transform logs using Vector Remap Language (VRL), a domain-specific language built for manipulating logs
  • Tables are formatted using Apache Iceberg
  • The out-of-the-box (aka “Managed”) log sources normalize logs to conform with the Elastic Common Schema (ECS)
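For a feel of what that normalization involves, here is a minimal Python sketch that maps one raw KEV catalog entry onto ECS-style nested fields. It is illustrative only: Matano's real transforms are written in VRL, and this is not the contributed transform. The input keys (`cveID`, `vendorProject`, `product`, `dateAdded`, `dueDate`, ...) follow CISA's published feed; the output layout, including the `cisa_kev` namespace and putting vendor and product into `vulnerability.category`, is an assumption.

```python
from typing import Any


def kev_entry_to_ecs(entry: dict[str, str]) -> dict[str, Any]:
    """Map one raw CISA KEV catalog entry onto ECS-style nested fields.

    Hypothetical layout for illustration; not Matano's actual VRL transform.
    """
    return {
        "vulnerability": {
            "id": entry["cveID"],
            "description": entry.get("shortDescription", ""),
            # ECS vulnerability.category is an array of strings; vendor and
            # product are used here as an assumed categorization
            "category": [entry["vendorProject"], entry["product"]],
        },
        # Assumed namespace for KEV-specific fields with no ECS equivalent
        "cisa_kev": {
            "vulnerabilityName": entry.get("vulnerabilityName", ""),
            "dateAdded": entry["dateAdded"],
            "dueDate": entry["dueDate"],
            "requiredAction": entry.get("requiredAction", ""),
        },
    }
```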

Specific to this contribution, Matano has a feature called enrichment tables. Enrichment tables let you store useful data within Matano that can be used to (literally) enrich other log sources. You can use enrichment tables in three ways:

  • Update a data record in real time upon ingestion
  • Look up values in the enrichment table from your Python detection logic
  • Query the enrichment table directly
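Conceptually, the first option (enriching at ingestion) boils down to a keyed lookup. The Python sketch below is illustrative only and not Matano's implementation: a plain dict keyed by CVE ID stands in for the enrichment table, `get_path` is a hypothetical helper for dotted ECS paths, and attaching the match under `enrichment.cisa_kev` is an assumed output location.

```python
import copy
from typing import Any, Optional


def get_path(event: dict[str, Any], dotted: str) -> Optional[Any]:
    """Walk a dotted ECS path like 'vulnerability.id'; return None if any part is missing."""
    node: Any = event
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def enrich_event(event: dict[str, Any], kev_table: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of the event with KEV context attached when its CVE is listed."""
    enriched = copy.deepcopy(event)
    cve_id = get_path(event, "vulnerability.id")
    record = kev_table.get(cve_id) if cve_id else None
    if record is not None:
        # Hypothetical location for the attached enrichment data
        enriched.setdefault("enrichment", {})["cisa_kev"] = record
    return enriched
```

Events whose CVE is not in the table pass through unchanged, so enrichment never drops data.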

Enrichment tables are incredibly valuable for adding business-specific context to your data. Business units, mappings of AWS account names to account numbers, and user information all make great use cases for an enrichment table. For more standard use cases, Matano comes with built-in enrichment tables that handle the data collection for you. Aside from my recent contribution, integrations for AlienVault OTX and Abuse.ch are available out of the box.

What is the CISA Known Exploited Vulnerabilities (KEV) list?

The Cybersecurity and Infrastructure Security Agency (CISA) maintains a catalog of vulnerabilities that are known to have been exploited in the wild. The KEV list is often pointed to as an easy starting point for organizations trying to prioritize large numbers of vulnerabilities. Federal Civilian Executive Branch (FCEB) agencies are actually required to remediate KEV-listed vulnerabilities within set timeframes. Being able to detect whether a known exploited vulnerability is present in your environment is a powerful capability for both government and commercial organizations.
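To illustrate the prioritization idea, here is a minimal Python sketch that keeps only scanner findings whose CVE appears in the KEV catalog and flags any that are past the catalog's `dueDate` (the remediation deadline field in CISA's feed). The finding shape (a dict with a `cve_id` key) and the function name are hypothetical; dates are assumed to be ISO `YYYY-MM-DD` strings as CISA publishes them.

```python
from datetime import date
from typing import Any


def prioritize(
    findings: list[dict[str, Any]],
    kev_by_cve: dict[str, dict[str, str]],
    today: date,
) -> list[dict[str, Any]]:
    """Keep KEV-listed findings, tag each with its due date and an overdue flag,
    and sort them so the earliest deadline comes first."""
    prioritized = []
    for finding in findings:
        record = kev_by_cve.get(finding["cve_id"])
        if record is None:
            continue  # not on the KEV list; left to normal triage
        due = date.fromisoformat(record["dueDate"])
        prioritized.append({**finding, "due_date": due, "overdue": today > due})
    return sorted(prioritized, key=lambda f: f["due_date"])
```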

My coworker Will Lindsey previously wrote about the KEV list in his article about KEV Bot, Aquia’s KEV notification system hosted on AWS.

Why contribute this?

I’ve written code to pull the KEV list in three different languages at this point. There are several vulnerability scanning tools that don’t natively surface KEV list status, or abstract it away via a proprietary risk score. When I stumbled upon Matano and its enrichment table functionality, I figured having the KEV list as a readily available enrichment would prove handy to other security practitioners. I also think it’s a great example of what’s possible with enrichment tables.

Practicality aside, I have a strong and growing interest in the intersection of security and data engineering, and contributing to Matano has been a great opportunity to exercise that! I also like Rust, even if I don’t get to write it very often.

Utilizing the CISA KEV Matano Managed Enrichment Table

While Matano allows you to “bring your own” enrichment tables (you point it at where your data lives and a table is created), the CISA KEV table is a “managed” enrichment table. This means all you need to do to enable it is add the following file to your Matano directory:

# enrichment/cisa_kev/enrichment.yml

name: "cisa_kev"
managed:
  type: "cisa_kev"

Once the puller runs (the KEV list is pulled hourly), the data is available for us to query!

Athena query result of the CISA KEV table
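The managed puller itself is written in Rust on top of the `csv` crate. As an illustration of what such a puller has to do, here is a Python sketch (not Matano's implementation) that downloads CISA's CSV feed and indexes rows by `cveID`. The feed URL and column names are assumed to match CISA's published CSV; check them against cisa.gov before relying on this. Scheduling (hourly or otherwise) is out of scope here.

```python
import csv
import io
import urllib.request

# Assumed location of CISA's KEV CSV feed; confirm against cisa.gov
KEV_CSV_URL = "https://www.cisa.gov/sites/default/files/csv/known_exploited_vulnerabilities.csv"


def parse_kev_csv(text: str) -> dict[str, dict[str, str]]:
    """Index KEV rows by their cveID column (quoted commas/newlines handled by csv)."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return {row["cveID"]: row for row in reader if row.get("cveID")}


def fetch_kev(url: str = KEV_CSV_URL, timeout: float = 30.0) -> dict[str, dict[str, str]]:
    """Download the CSV feed and parse it; 'utf-8-sig' drops a leading BOM if present."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8-sig")
    return parse_kev_csv(text)
```

Keying by CVE ID makes each lookup a constant-time dict access, which matches how the table is used later in detections.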

Querying the table directly

Making queries directly against the enrichment table can be useful for an analyst. Some examples of possible queries are:

Querying for a specific CVE:

SELECT *
FROM "matano"."enrich_cisa_kev_view"
where "vulnerability_id" = 'CVE-2021-27103'

Showing all CISA KEV entries added after a given date:

SELECT *
FROM "matano"."enrich_cisa_kev_view"
where cast(cisa_kev_dateAdded as date) > cast('2023-05-05' as date)

Showing all entries whose vulnerability category contains 'Linux':

SELECT *
FROM "matano"."enrich_cisa_kev_view"
where contains("vulnerability_category", 'Linux')
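These queries can also be run programmatically. The following is a hedged sketch using boto3's Athena client (`start_query_execution`, `get_query_execution`, `get_query_results`). The `matano` database and view names come from the queries above; the S3 output location is a placeholder for your own bucket, and the helper names are hypothetical. The date is rendered with `date.isoformat()`, which yields a zero-padded literal and keeps arbitrary user text out of the SQL string.

```python
import time
from datetime import date


def kev_added_since_query(since: date) -> str:
    """Build the 'added after a date' query from a date object, never raw text."""
    return (
        'SELECT * FROM "matano"."enrich_cisa_kev_view" '
        f"WHERE cast(cisa_kev_dateAdded AS date) > DATE '{since.isoformat()}'"
    )


def run_athena_query(sql: str, output_location: str, poll_seconds: float = 1.0) -> list:
    """Run a query in Athena and return the first page of raw result rows."""
    import boto3  # imported lazily so the query builder works without boto3

    athena = boto3.client("athena")
    execution_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "matano"},
        ResultConfiguration={"OutputLocation": output_location},  # e.g. s3://your-bucket/prefix/
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state
    while True:
        status = athena.get_query_execution(QueryExecutionId=execution_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {execution_id} ended in state {state}")

    # For SELECT queries the first row holds column names; pagination is not handled
    return athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
```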

Utilizing the table in a detection to alert on the presence of a KEV vulnerability in your environment

Matano allows us to author Python-based detections as code and use enrichment tables from them. Below is a proof-of-concept detection that alerts us when we receive a vulnerability scanner finding for a CVE that is on the KEV list. This logic could easily be reapplied to any event or finding that contains a CVE ID:

# detect.py

from detection.enrichment import cisa_kev


def detect(e):
    # CVE ID from the ECS-normalized scanner finding
    cve_id = e.deepget('vulnerability.id')
    if not cve_id:
        return False
    # Alert only if the CVE has a matching row in the CISA KEV enrichment table
    return bool(cisa_kev.get(cve_id))


def title(e):
    account_id = e.deepget('cloud.account.id')
    cve_name = e.deepget('vulnerability.name')
    return f"CISA Known Exploited Vulnerability present - {cve_name} - AWS Account ID: {account_id}"
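Before deploying, it can help to exercise the detection logic locally. The sketch below is hypothetical scaffolding, not Matano's testing tooling: it mirrors the `detect`/`title` logic above but takes the KEV lookup as a plain dict, and uses a small `Event` class whose `deepget` imitates the dotted-path access the detection relies on.

```python
from typing import Any, Optional


class Event(dict):
    """Stand-in for a Matano event: a dict with dotted-path access via deepget."""

    def deepget(self, path: str) -> Optional[Any]:
        node: Any = self
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        return node


def detect(e: Event, kev_table: dict[str, Any]) -> bool:
    # Same logic as detect.py, with the enrichment table passed in explicitly
    cve_id = e.deepget("vulnerability.id")
    if not cve_id:
        return False
    return bool(kev_table.get(cve_id))


def title(e: Event) -> str:
    account_id = e.deepget("cloud.account.id")
    cve_name = e.deepget("vulnerability.name")
    return f"CISA Known Exploited Vulnerability present - {cve_name} - AWS Account ID: {account_id}"
```

Injecting the table as a parameter keeps the rule testable with placeholder data, without a deployed Matano environment.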

Conclusion

My ultimate goal in working on these contributions is to keep learning! Here are a few things I learned while working on this one, in no particular order:

  • Normalized schemas are awesome! When I was working through writing the transformation for the raw data to conform to ECS, I wondered if the extra effort would be worth it. Once I started playing with the enrichment table in my lab, being able to query similar attributes across different log sources (vulnerability.id and cloud.account.id are two really strong examples) was really intuitive.
  • Vector Remap Language is a super powerful tool for transforming data without slinging Python. It has plenty of built-in functions and feels safer than raw Python for data transformations. I made heavy use of the VRL Playground to test the code during development, and I’m very surprised VRL isn’t more commonly used across the industry for security data use cases.
  • The behavior of the Rust csv crate, and how much I still have to learn about Rust. I do love the efficiency it brings to this use case, however!

Thanks for reading!

If you have any questions, or would like to discuss this topic in more detail, feel free to contact us and we would be happy to schedule some time to chat about how Aquia can help you and your organization.
