Vulnerability Consumer #33

Merged
merged 76 commits into from
Dec 10, 2020
Changes from 31 commits
Commits
cecebce
Dummy Kafka Producer and Thread Updater
elanzini Apr 30, 2020
845c3c9
Utils Parsing Vulnerabilities
elanzini May 2, 2020
cf05ba3
Vulnerability Publisher Implementation
elanzini May 2, 2020
8517c11
Retrieval of vulnerable_files
elanzini May 3, 2020
8c04ba2
Switched to JSONObject instead of raw Strings
elanzini May 4, 2020
d7ff081
Dependency Injection refactoring for Testing
elanzini May 4, 2020
0d3de55
Thread Updater Functionality
elanzini May 5, 2020
25eb697
Commit Parser from Github API
elanzini May 11, 2020
43cdc87
Merge remote-tracking branch 'origin/master' into vulnerability-plugin
elanzini May 18, 2020
c3d0076
Refactoring Plugin to new Server Architecture
elanzini May 18, 2020
9382c51
Mongo Connector + Github API support
elanzini May 24, 2020
6aec304
Parser Test + License information
elanzini May 27, 2020
2080b65
Git PRs and Issues support
elanzini May 28, 2020
661a206
Integration Testing Plugin
elanzini Jun 2, 2020
fb206b0
Support to include https://arxiv.org/abs/1902.02595
elanzini Jun 10, 2020
5131aa1
Parsing for CPE mappers + GH API v4
elanzini Jun 25, 2020
15e2d3c
JSON Mapper GH v4
elanzini Jun 28, 2020
f6619a1
Refactor Parsers + Mappers
elanzini Jun 30, 2020
4c9c4da
ParserManager + Licenses files
elanzini Jun 30, 2020
015219c
Testing Parser Manager Merging
elanzini Jul 1, 2020
a6a25c7
Integration Testing YAML Parser
elanzini Jul 1, 2020
e02fc56
Switch to JUnit 5 + Inject Version Farmer
elanzini Jul 2, 2020
8783e89
Integration with Nitrite for resilience
elanzini Jul 2, 2020
102ff39
Serialization Objects + pom.xml to run jUnit 5
elanzini Jul 2, 2020
c28ecca
Documentation + purl-spec
elanzini Jul 3, 2020
abed1da
Merge remote-tracking branch 'origin/master' into vulnerability-plugin
elanzini Jul 3, 2020
c828ff3
Integration Testing Nitrite + Logger + Thread Safe tests
elanzini Jul 3, 2020
bf3f18f
Merge remote-tracking branch 'origin/master' into vulnerability-plugin
elanzini Aug 10, 2020
6e4e1bf
Update analyzer pom.xml
elanzini Aug 12, 2020
06f1832
Add MSR2020 CPP Dataset to sources of info
elanzini Aug 12, 2020
19287b5
Add readme template
ilyagrishkov Aug 12, 2020
14a05f3
OVALParser logic + VersionRanger for Debian
elanzini Aug 22, 2020
0c3add9
Switch to MR - JSON source for Debian versions
elanzini Aug 22, 2020
57e9f28
Unit testing OVAL Parser
elanzini Aug 23, 2020
8588bb3
Refactor Merger + OVAL Integration
elanzini Aug 23, 2020
428fd3b
PURL to Object conversion
elanzini Aug 27, 2020
c35a8f8
Helper functions to inject in the DB
elanzini Aug 30, 2020
0383af6
Injection callable + appending JSONB to metadata
elanzini Aug 31, 2020
3cb4454
Include option to pass JSON with data
elanzini Sep 1, 2020
2688dd2
Multiple modules fix + data + instructions demo
elanzini Sep 2, 2020
6842552
Merge remote-tracking branch 'origin/develop' into vulnerability-plugin
elanzini Sep 2, 2020
1a21ff3
Fix dependency issue of vulnerability analyzer when running its stand…
mir-am Sep 2, 2020
0e174ce
Update architecture info + demo info callables
elanzini Sep 4, 2020
75f25ca
SQL procedure + update data.json
elanzini Sep 5, 2020
060e0c2
Multiple callables handler + fasten_uri of callables
elanzini Sep 6, 2020
667cd75
Move injection logic to core
elanzini Sep 8, 2020
64c2303
Revert logic to plugin for maintainability
elanzini Sep 9, 2020
abfa757
Refactor Vulnerability + Patch classes
elanzini Sep 9, 2020
aec4794
Procedure checks for internal calls + pkg metadata
elanzini Sep 9, 2020
9430f16
Include injection in the pipeline
elanzini Sep 10, 2020
d4bd010
Merge remote-tracking branch 'origin/develop' into vulnerability-plugin
elanzini Sep 10, 2020
d1c61fc
Integration with server
elanzini Sep 10, 2020
f9b6bca
Separate Producer and Consumer
elanzini Sep 16, 2020
a701101
Merge remote-tracking branch 'origin/develop' into vulnerability-plugin
elanzini Sep 17, 2020
b2bac61
Updating README + additional logger info
elanzini Sep 17, 2020
74b3ddb
Improvement Patch Farmer
elanzini Sep 18, 2020
c7c657f
Moved vulnerability-producer to a separate repo
elanzini Sep 29, 2020
b01d760
Fix dependencies vulnerability-consumer
elanzini Sep 29, 2020
55620a3
Merge remote-tracking branch 'origin/develop' into vulnerability-plugin
elanzini Nov 27, 2020
0734939
update logic to handle multiple DBs
elanzini Nov 27, 2020
27a97fc
refactor to optimize JOOQ + switch to Jackson
elanzini Nov 30, 2020
0f02504
injection with JOOQ + serialization with fasten_uris
elanzini Dec 1, 2020
924a2b9
integration testing + jooq fasten_uri
elanzini Dec 2, 2020
dbd9bbe
Merge branch 'develop' into vulnerability-plugin
elanzini Dec 2, 2020
ed6ab65
filtering modules on pkg_id + add patch_date field
elanzini Dec 4, 2020
28ceec7
check first patched version + handle different base PURLs
elanzini Dec 7, 2020
e9fefca
save vulnerability to system + produce
elanzini Dec 7, 2020
a8ebc18
include purge option + write data to output_path
elanzini Dec 8, 2020
f01268d
change logic to always publish
elanzini Dec 9, 2020
03ef95b
Merge remote-tracking branch 'origin/develop' into vulnerability-plugin
elanzini Dec 9, 2020
a9d746c
write full_fasten_uri in the vuln
elanzini Dec 9, 2020
ba95ed0
remove unused class
elanzini Dec 9, 2020
398851d
clean outdated producer dockerfile
elanzini Dec 10, 2020
bd33561
support different coordinateSeparators
elanzini Dec 10, 2020
7189301
Merge remote-tracking branch 'origin/develop' into vulnerability-plugin
elanzini Dec 10, 2020
fc03b48
use PostgresConnector from core
elanzini Dec 10, 2020
16 changes: 16 additions & 0 deletions analyzer/pom.xml
@@ -21,6 +21,7 @@
<module>metadata-plugin</module>
<module>graph-plugin</module>
<module>pom-analyzer</module>
<module>vulnerability-plugin</module>
<module>repo-cloner-plugin</module>
</modules>

@@ -57,5 +58,20 @@
</plugin>
</plugins>
</build>
<!-- Jacoco reporting -->
<reporting>
<plugins>
<plugin>
<groupId>org.jacoco</groupId>
<artifactId>jacoco-maven-plugin</artifactId>
<version>0.8.2</version>
<configuration>
<excludes>
<exclude>**/Main.*</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</reporting>

</project>
26 changes: 26 additions & 0 deletions analyzer/vulnerability-plugin/README.md
@@ -0,0 +1,26 @@
<p align="center">
<img src="https://user-images.githubusercontent.com/45048351/90056609-d2c67900-dce7-11ea-9f66-3717998d861d.jpg">
</p>
<br/>
<p align="center">
<a href="https://github.com/fasten-project/fasten/actions" alt="GitHub Workflow Status">
<img src="https://img.shields.io/github/workflow/status/fasten-project/fasten/Java%20CI?logo=GitHub%20Actions&logoColor=white&style=for-the-badge" /></a>
<!-- Here should be a link to Maven repo and version should be pulled from there. -->
<a href="https://github.com/fasten-project/fasten/" alt="GitHub Workflow Status">
<img src="https://img.shields.io/maven-central/v/fasten/vulnerability?label=version&logo=Apache%20Maven&style=for-the-badge" /></a>
</p>
<br/>

Description goes here

## Join the community

The FASTEN software package management efficiency relies on an open community contributing to open technologies. Related research projects, R&D engineers, early users and open source contributors are welcome to join the [FASTEN community](https://www.fasten-project.eu/view/Main/Community), to try the tools, to participate in physical and remote workshops and to share our efforts using the project [community page](https://www.fasten-project.eu/view/Main/Community) and the social media buttons below.
<p>
<a href="http://www.twitter.com/FastenProject" alt="Fasten Twitter">
<img src="https://img.shields.io/badge/%20-Twitter-%231DA1F2?logo=Twitter&style=for-the-badge&logoColor=white" /></a>
<a href="http://www.slideshare.net/FastenProject" alt="GitHub Workflow Status">
<img src="https://img.shields.io/badge/%20-SlideShare-%230077B5?logo=slideshare&style=for-the-badge&logoColor=white" /></a>
<a href="http://www.linkedin.com/groups?gid=12172959" alt="Gitter">
<img src="https://img.shields.io/badge/%20-LinkedIn-%232867B2?logo=linkedin&style=for-the-badge&logoColor=white" /></a>
</p>
158 changes: 158 additions & 0 deletions analyzer/vulnerability-plugin/docs/documentation.md
@@ -0,0 +1,158 @@
# Security Plugin
The following documentation aims to offer an overview of the implementation of the **Security** Plugin. Looking at the bigger picture, the goal of this plugin is to gather information related to vulnerabilities in order to enrich the Knowledge Base at the callable-level detail. In a perfect scenario, for each vulnerability the plugin would point to the specific callable that was responsible for the security flaw.

## Architecture
![Image of Architecture](./imgs/plugin_architecture.png)

### Parsers
The `ParserManager` class contains and handles data inputs from all the implemented parsers. Every parser object (`NVDParser`, `GHParser` and `ExtraParser`) pulls information from one or more sources and can also retrieve updates from the same sources. Each parser implements the following interface:

```java
public interface VulnerabilityParser {
// Method to retrieve existing vulnerabilities
HashMap<String, Vulnerability> getVulnerabilities();

// Method to retrieve updated and new vulnerabilities
HashMap<String, Vulnerability> getUpdates();
}
```
The `ParserManager` first calls `getVulnerabilities` on each parser and then aggregates all the information into a common format that is passed down the pipeline for further enrichment. The other method implemented by each parser, `getUpdates`, is called daily to collect new information from each source. This shared interface makes it easier to add parsers for new sources of information.
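
The aggregation step described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the plugin's actual code: the `Vulnerability` class is a stripped-down stand-in, `InMemoryParser`, `ParserManagerSketch` and `combine` are hypothetical names, and merging by taking the union of references is only one plausible merge rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical, heavily simplified stand-in for the plugin's Vulnerability class.
class Vulnerability {
    final String id;
    final List<String> references;

    Vulnerability(String id, List<String> references) {
        this.id = id;
        this.references = references;
    }
}

// Same two methods as the interface shown above.
interface VulnerabilityParser {
    HashMap<String, Vulnerability> getVulnerabilities();
    HashMap<String, Vulnerability> getUpdates();
}

// Illustration-only parser backed by a fixed set of records (an assumption).
class InMemoryParser implements VulnerabilityParser {
    private final HashMap<String, Vulnerability> data = new HashMap<>();

    InMemoryParser(Vulnerability... vulnerabilities) {
        for (Vulnerability v : vulnerabilities) {
            data.put(v.id, v);
        }
    }

    @Override
    public HashMap<String, Vulnerability> getVulnerabilities() {
        return new HashMap<>(data);
    }

    @Override
    public HashMap<String, Vulnerability> getUpdates() {
        return new HashMap<>(); // a static source never has updates
    }
}

class ParserManagerSketch {
    private final List<VulnerabilityParser> parsers;

    ParserManagerSketch(List<VulnerabilityParser> parsers) {
        this.parsers = parsers;
    }

    // Collect the records of every parser, merging entries that share an id.
    HashMap<String, Vulnerability> getVulnerabilities() {
        HashMap<String, Vulnerability> merged = new HashMap<>();
        for (VulnerabilityParser parser : parsers) {
            parser.getVulnerabilities().forEach(
                (id, v) -> merged.merge(id, v, ParserManagerSketch::combine));
        }
        return merged;
    }

    // Assumed merge rule: keep the id, take the union of references in order.
    static Vulnerability combine(Vulnerability a, Vulnerability b) {
        LinkedHashSet<String> refs = new LinkedHashSet<>(a.references);
        refs.addAll(b.references);
        return new Vulnerability(a.id, new ArrayList<>(refs));
    }
}
```

Because the manager only depends on the interface, a parser for a new source can be added to the list without touching the merge logic.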

### Patches
To find where exactly a vulnerability lies in a package, `patch` links make it possible to retrieve _what was changed in order to patch the vulnerability_. Combined with some heuristics, this makes it possible to drill down to the specific `callables` that were patched.

The `PatchFarmer` receives a list of references contained in each Vulnerability Object and handles each of them in order to figure out if it's possible to extract some **patch diffs**. The following is a list of the sources of information handled by the class:

- GitHub Commits ([example](https://github.com/python/cpython/commit/fbf648ebba32bbc5aa571a4b09e2062a65fd2492))
- GitHub Pull Requests ([example](https://github.com/omniauth/omniauth-oauth2/pull/25))
- GitHub Issues ([example](https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/931))
- Bugzilla<sup>*</sup> ([example](https://bugzilla.mozilla.org/show_bug.cgi?id=1615315))
- JIRA tickets<sup>*</sup> ([example](https://issues.apache.org/jira/browse/ZOOKEEPER-1045))
- Git Trackers<sup>*</sup> ([example](https://git.kernel.org/pub/scm/git/git.git/diff/?id=3ec804490a265f4c418a321428c12f3f18b7eff5))

<sup>*</sup> _supported in future versions_
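
Routing a reference to the right handler starts with recognising its kind. The sketch below shows one way to do that for the three GitHub reference types listed above; the class name, enum and regular expressions are assumptions made for illustration, not the `PatchFarmer` implementation.

```java
import java.util.regex.Pattern;

// Hypothetical helper that classifies a reference URL by its shape.
class ReferenceClassifier {
    enum Kind { GITHUB_COMMIT, GITHUB_PULL_REQUEST, GITHUB_ISSUE, UNSUPPORTED }

    // owner/repo followed by the resource type and its identifier
    private static final String REPO = "https?://github\\.com/[^/]+/[^/]+";
    private static final Pattern COMMIT = Pattern.compile(REPO + "/commit/[0-9a-fA-F]{7,40}/?");
    private static final Pattern PULL = Pattern.compile(REPO + "/pull/\\d+/?");
    private static final Pattern ISSUE = Pattern.compile(REPO + "/issues/\\d+/?");

    static Kind classify(String url) {
        if (COMMIT.matcher(url).matches()) {
            return Kind.GITHUB_COMMIT;
        }
        if (PULL.matcher(url).matches()) {
            return Kind.GITHUB_PULL_REQUEST;
        }
        if (ISSUE.matcher(url).matches()) {
            return Kind.GITHUB_ISSUE;
        }
        // Bugzilla, JIRA and other trackers are not handled in this sketch
        return Kind.UNSUPPORTED;
    }
}
```

Applied to the example links in the list, the commit, pull request and issue URLs map to their respective kinds, while the Bugzilla URL falls through to `UNSUPPORTED`.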

Patches from GitHub make up 90% of all references. To extract information from them, two different sources are used.
The first one is [GHTorrent](https://ghtorrent.org/), running on TU Delft servers, which offers a queryable offline mirror of the data available through the GitHub REST API. Whenever an instance is not found in GHTorrent, the GitHub REST API is used directly.

### Nitrite

To store results along the way, [Nitrite](https://github.com/nitrite/nitrite-java) is used. It offers a MongoDB-like API and supports both in-memory and single-file persistent stores. All the logic is handled by the `NitriteController` class, and the structure of the small DB instance is simple. Two `Object Repositories` are created, one storing `PatchObject` instances and the other `VulnerabilityObject` instances. Both repositories are indexed on a unique id, which is a reference (for patches) or an id (for vulnerabilities); this makes retrieval faster.

The purpose of using Nitrite is twofold. On the one hand, it stores the enriched output of the pipeline, making it resilient to failure. On the other hand, it makes the pipeline more efficient by checking which references have already been parsed for patches and which vulnerabilities have already been processed and output. This check is essential because not all sources of information offer a way to retrieve updates.
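
The "already parsed" check can be illustrated without Nitrite itself. The sketch below is an assumption-laden stand-in: a `ConcurrentHashMap` plays the role of the patch repository, and `PatchStoreSketch`, `patchFor` and `alreadyParsed` are hypothetical names. The real plugin persists its data through `NitriteController`, which this code does not reproduce.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical in-memory stand-in for the patch repository, keyed by reference.
class PatchStoreSketch {
    private final Map<String, String> patchByReference = new ConcurrentHashMap<>();
    private final AtomicInteger fetchCount = new AtomicInteger();

    // Return the stored patch, fetching and storing it only on first sight.
    String patchFor(String reference, Function<String, String> fetcher) {
        return patchByReference.computeIfAbsent(reference, ref -> {
            fetchCount.incrementAndGet();
            return fetcher.apply(ref);
        });
    }

    boolean alreadyParsed(String reference) {
        return patchByReference.containsKey(reference);
    }

    // Number of times the (expensive) fetcher actually ran.
    int fetchCount() {
        return fetchCount.get();
    }
}
```

Keying the store by the reference means a second request for the same link never triggers another remote lookup, which is the efficiency gain the paragraph above describes.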

### Threads
The `VulnerabilityPlugin` class handles the output of the `ParserManager` and produces to a Kafka topic named `security`. The logic is split across two threads. The first one is the `ProducerThread`, which takes care of parsing the first dump of information from all sources. Once the heavy lifting is done, a second thread is started, the `UpdaterThread`, which runs forever and retrieves updates **once per day**.
The frequency at which each source itself publishes updates differs, as listed under "Sources of information".

```java
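// Body of the UpdaterThread: wait one day, then enqueue new and updated vulnerabilities.
while (true) {
    Thread.sleep(24L * 60 * 60 * 1000); // one day, in milliseconds
    queue.addAll(parserManager.getUpdates());
}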
```
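
An alternative to a hand-written sleep loop is a scheduled executor from the Java standard library. The following is a minimal sketch, assuming the updates arrive as a list and are pushed onto a blocking queue; `UpdaterSketch`, `enqueueUpdates` and `start` are hypothetical names, and the element type is left generic rather than tied to the plugin's classes.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class UpdaterSketch {
    // One update round: pull the latest records and append them to the queue.
    static <T> int enqueueUpdates(Supplier<List<T>> source, BlockingQueue<T> queue) {
        List<T> updates = source.get();
        queue.addAll(updates);
        return updates.size();
    }

    // Run an update round repeatedly, waiting `period` between rounds.
    static <T> ScheduledExecutorService start(Supplier<List<T>> source,
                                              BlockingQueue<T> queue,
                                              long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                enqueueUpdates(source, queue);
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("Update round failed: " + e.getMessage());
            }
        }, period, period, unit);
        return scheduler;
    }
}
```

Calling `UpdaterSketch.start(source, queue, 1, TimeUnit.DAYS)` would give the once-per-day behaviour; the returned scheduler should be shut down with `shutdownNow()` when the plugin stops, since its worker thread is not a daemon thread.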

Another plugin will be implemented that consumes the `security` topic, with the goal of injecting the information coming from the `VulnerabilityPlugin` into the FASTEN Knowledge Base.

## Vulnerability Object Definition
In order to merge all the different sources together, a common definition of a vulnerability has been introduced. Here is a JSON representation of a well-known example, Heartbleed (CVE-2014-0160):

```json
{
"id": "CVE-2014-0160",
"description" : "The (1) TLS and (2) DTLS implementations in OpenSSL 1.0.1 before 1.0.1g do not properly handle Heartbeat Extension packets, which allows remote attackers to obtain sensitive information from process memory via crafted packets that trigger a buffer over-read, as demonstrated by reading private keys, related to d1_both.c and t1_lib.c, aka the Heartbleed bug.",
"severity": "HIGH",
"scoreCVSS2": 5.0,
"scoreCVSS3": 7.5,
"published_date": "07/04/2014",
"last_modified_date": "09/10/2019",
"vulnerable_purls": [
"pkg:generic/[email protected]",
"pkg:generic/[email protected]",
"pkg:generic/[email protected]",
"pkg:generic/[email protected]",
"pkg:generic/[email protected]",
"pkg:generic/[email protected]",
"pkg:generic/[email protected]"
],
"references": [
"http://advisories.mageia.org/MGASA-2014-0165.html",
"http://blog.fox-it.com/2014/04/08/openssl-heartbleed-bug-live-blog/",
"http://cogentdatahub.com/ReleaseNotes.html",
"..."
],
"patches": [
"http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=96db9023b881d7cd9f379b0c154650d6c108e9a3"
],
"exploits": [
"http://www.exploit-db.com/exploits/32764",
"http://www.exploit-db.com/exploits/32745",
"..."
],
"changed_files": [
{
"filename": "ssl/d1_both.c",
"date": "07/04/2014",
"line_numbers" : [
1459,
1489
]
},
{
"filename": "ssl/t1_lib.c",
"date": "07/04/2014",
"line_numbers" : [
2588
]
}
]
}
```
### Description of fields:
**id**: Identifies the vulnerability (e.g. `CVE-2014-0160`, `GHSA-3pc2-fm7p-q2vg`, `pyup.io-34978`)

**description**: Textual description of the vulnerability

**severity**: One of the following: `LOW, MEDIUM, MODERATE, HIGH, CRITICAL`

**scoreCVSS2**: Find more information [here](https://nvd.nist.gov/vuln-metrics/cvss/v2-calculator)

**scoreCVSS3**: Find more information [here](https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator)

**published_date**: Date when the vulnerability was published

**last_modified_date**: Date when the vulnerability was last modified

**vulnerable_purls**: Package coordinates of vulnerable packages. Follows [purl-spec](https://github.com/package-url/purl-spec) guidelines

**references**: List of links to pages and documentation

**patches**: List of links to patches that **fixed** the vulnerability

**exploits**: List of links to exploits. Most of them from [exploit-db](https://www.exploit-db.com/)

**changed_files**: List of changed files gathered from the patches. It helps identify which callable caused the problem.
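
Two of the fields above need a little parsing before they can be used: the dates and the package URLs. The sketch below makes two assumptions that the documentation does not state explicitly: dates are written day/month/year (consistent with CVE-2014-0160 having been published on 7 April 2014), and only the `pkg:type/name@version` core of a purl is needed, with qualifiers and subpaths ignored. The class and method names, and the purl in the usage note, are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class VulnerabilityFieldsSketch {
    // Assumption: published_date and last_modified_date use day/month/year.
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd/MM/uuuu");

    static LocalDate parseDate(String value) {
        return LocalDate.parse(value, DATE);
    }

    // Splits "pkg:type/name@version" into {type, name, version}.
    // The name keeps any namespace prefix, e.g. "group/artifact".
    static String[] splitPurl(String purl) {
        if (!purl.startsWith("pkg:")) {
            throw new IllegalArgumentException("Not a package URL: " + purl);
        }
        String rest = purl.substring("pkg:".length());
        int slash = rest.indexOf('/');
        int at = rest.lastIndexOf('@');
        if (slash < 0 || at < slash) {
            throw new IllegalArgumentException("Missing type or version: " + purl);
        }
        return new String[] {
            rest.substring(0, slash),
            rest.substring(slash + 1, at),
            rest.substring(at + 1)
        };
    }
}
```

For instance, `splitPurl("pkg:maven/org.example/demo@1.0.0")` yields the type `maven`, the name `org.example/demo` and the version `1.0.0`. A production parser should follow the full purl-spec rather than this shortcut.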


## Sources of information

The `ParserManager` aggregates information from the following sources:

Source | License | Frequency of updates
------------ | ------------- | ---------------------
[NVD JSON Feed](https://nvd.nist.gov/vuln/data-feeds#JSON_FEED) | Public Domain | Every 2 hours
[GitHub Advisories](https://github.com/advisories) | Public Domain | Daily
[MSR 2019<sup>1</sup>](https://github.com/SAP/project-kb/tree/master/MSR2019) | Public Domain | n/a
[MSR 2020<sup>2</sup>](https://github.com/ZeoVan/MSR_20_Code_vulnerability_CSV_Dataset) | Public Domain | n/a
[Safety DB](https://github.com/pyupio/safety-db) (by pyup.io) | [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) | Monthly
[cvedb](https://github.com/fabric8-analytics/cvedb) (by [fabric8-analytics](https://github.com/fabric8-analytics)) | n/a | Daily
[victims-cve-db](https://github.com/victims/victims-cve-db) | [CC BY-SA 4.0](https://github.com/victims/victims-cve-db/blob/master/cc-by-sa-4.0.txt) | n/a
[Debian Security Tracker<sup>*</sup> ](https://salsa.debian.org/security-tracker-team/security-tracker/-/blob/master/data/CVE/list) | Public Domain | Daily
[RustSec<sup>*</sup> ](https://github.com/RustSec/advisory-db) | Public Domain | Daily

<sup>*</sup> _supported in future versions_

### References
<sup>1</sup> Ponta, S. E., Plate, H., Sabetta, A., Bezzi, M., &amp; Dangremont, C. (2019). A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). doi:10.1109/msr.2019.00064

<sup>2</sup> Jiahao Fan, Yi Li, Shaohua Wang and Tien N. Nguyen. 2020. A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. In MSR ’20: The 17th International Conference on Mining Software Repositories, May 25–26, 2020, MSR, Seoul, South Korea. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3379597.3387501
131 changes: 131 additions & 0 deletions analyzer/vulnerability-plugin/pom.xml
@@ -0,0 +1,131 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>analyzer</artifactId>
<groupId>eu.fasten</groupId>
<version>0.0.1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

<artifactId>vulnerability-plugin</artifactId>
<dependencies>
<dependency>
<groupId>eu.fasten</groupId>
<artifactId>server</artifactId>
<version>0.0.1-SNAPSHOT</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>eu.fasten</groupId>
<artifactId>core</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>info.picocli</groupId>
<artifactId>picocli</artifactId>
<version>4.0.4</version>
</dependency>
<!-- Parsing of Vulnerabilities -->
<dependency>
<groupId>org.owasp</groupId>
<artifactId>dependency-check-core</artifactId>
<version>5.3.2</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.9.8</version>
</dependency>
<dependency>
<groupId>com.opencsv</groupId>
<artifactId>opencsv</artifactId>
<version>3.8</version>
</dependency>
<dependency>
<groupId>org.eclipse.jgit</groupId>
<artifactId>org.eclipse.jgit</artifactId>
<version>5.8.0.202006091008-r</version>
</dependency>
<dependency>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
<version>1.26</version>
</dependency>
<!-- Testing -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<version>5.4.0</version>
<scope>test</scope>
</dependency>
<!-- Nitrite -->
<dependency>
<groupId>org.dizitart</groupId>
<artifactId>nitrite</artifactId>
<version>3.4.2</version>
</dependency>
<!-- MongoDB driver -->
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongodb-driver</artifactId>
<version>3.4.3</version>
</dependency>
<!-- NB! Uncomment the following 3 dependencies if you want to run Main class-->
<!-- <dependency>-->
<!-- <groupId>org.slf4j</groupId>-->
<!-- <artifactId>slf4j-simple</artifactId>-->
<!-- <version>1.7.30</version>-->
<!-- </dependency>-->
<!-- <dependency>-->
<!-- <groupId>org.slf4j</groupId>-->
<!-- <artifactId>slf4j-api</artifactId>-->
<!-- <version>1.7.30</version>-->
<!-- </dependency>-->
<!-- <dependency>-->
<!-- <groupId>org.pf4j</groupId>-->
<!-- <artifactId>pf4j</artifactId>-->
<!-- <version>3.1.0</version>-->
<!-- </dependency>-->
<!-- JOOQ -->
<dependency>
<groupId>com.github.t9t.jooq</groupId>
<artifactId>jooq-postgresql-json</artifactId>
<version>1.0.0</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.beust</groupId>
<artifactId>jcommander</artifactId>
<version>1.72</version>
</dependency>
<!-- END HERE -->
</dependencies>
<!-- BUILD -->
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.2</version>
</plugin>
</plugins>
</build>
<!-- Reporting -->
<reporting>
<plugins>
<plugin>
<groupId>org.jacoco</groupId>
<artifactId>jacoco-maven-plugin</artifactId>
<version>0.8.2</version>
<configuration>
<excludes>
<exclude>**/Main.*</exclude>
<exclude>**/MongoConnector.*</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</reporting>
</project>