Open Source Artifacts
Version Control Software vs. Artifact Repository
Version Control Software (VCS) hosts the source code of a project (e.g. git), whereas an Artifact Repository hosts the compiled code. The two are separate systems, usually hosted and managed by different companies, and they have nothing in common beyond both hosting "open source projects".
There is no guarantee that what is hosted on one service is related to what is hosted on the other; even the project names may differ.
An artifact located in the Artifact Repository can't be trusted on the basis of the source code hosted in the VCS, as nothing ties the two together. This article excludes the VCS, as it is independent of package managers.
Moreover, in the artifact repository itself, source and binary artifacts can be uploaded separately and their contents
can be unrelated to each other.
Manifests & Lockfiles
Manifest
- A file that specifies the top level packages required for the project to work. The requirements are
specified in a loose format stating the supported package versions rather than an exact version needed.
Lockfile
- A file that specifies the exact package versions for the top level packages and all of their dependencies,
essentially "locking" them in place.
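The difference between the two files can be sketched with a small check that every range in a manifest is met by the exact pin in a lockfile. This is a minimal sketch, not how any particular package manager resolves versions: the package names, the dictionary layout, and the helper names (`parse_version`, `satisfies`, `unsatisfied`) are hypothetical, and only plain dotted numeric versions with simple comparison operators are assumed.

```python
# Hypothetical, simplified data: real manifests and lockfiles use richer formats.
manifest = {"web-framework": ">=1.2.0,<2.0.0"}  # loose range, top-level packages only
lockfile = {
    "web-framework": "1.3.0",  # exact pin for the top-level package
    "http-parser": "0.9.4",    # exact pin for one of its dependencies
}


def parse_version(text):
    """Turn '1.3.0' into (1, 3, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check an exact version against a comma-separated range such as '>=1.2.0,<2.0.0'."""
    checks = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "==": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are listed first, so '>=' is tried before '>'.
        for op, compare in checks.items():
            if clause.startswith(op):
                if not compare(parse_version(version), parse_version(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def unsatisfied(manifest, lockfile):
    """Return top-level packages whose pin is missing or falls outside the manifest range."""
    return [
        name
        for name, spec in manifest.items()
        if name not in lockfile or not satisfies(lockfile[name], spec)
    ]


print(unsatisfied(manifest, lockfile))
```

Note that the lockfile lists `http-parser` even though the manifest never mentions it: the manifest names only top-level packages, while the lockfile pins the whole dependency tree.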
Dependency Types
Some package managers allow you to specify production and development dependencies in their main manifest file, while others require you to manage them manually by renaming or separating the files.
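As an illustration of the first case, npm's package.json keeps the two groups under the `dependencies` and `devDependencies` keys. The sketch below reads both from a manifest string; the manifest content, the package names, and the `split_dependencies` helper are hypothetical placeholders.

```python
import json

# Hypothetical package.json content; package names and versions are placeholders.
PACKAGE_JSON = """
{
  "name": "example-app",
  "version": "1.0.0",
  "dependencies": {"web-framework": "^4.0.0"},
  "devDependencies": {"test-runner": "^29.0.0"}
}
"""


def split_dependencies(manifest_text):
    """Return (production, development) dependency mappings from a package.json string."""
    manifest = json.loads(manifest_text)
    # Either key may be absent, so fall back to an empty mapping.
    return manifest.get("dependencies", {}), manifest.get("devDependencies", {})


production, development = split_dependencies(PACKAGE_JSON)
print("production:", sorted(production))
print("development:", sorted(development))
```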
Compilation (Source / Binary Artifacts)
Before a project is uploaded to an artifact repository, it needs to be packaged into a format that the package manager understands. This format can be a source artifact or a binary artifact.
A source artifact is one that contains the source code for the package. Depending on the language, after being
downloaded it is compiled by the package manager before it can be executed.
A binary artifact on the other hand comes pre-compiled by the publisher and is ready to use after it's downloaded.
Notes:
- Depending on the language and the package manager, an artifact can embed all of its dependencies, or reference them as packages in an external artifact repository or as local files.
- The benefit of a binary package over a source package is a small gain in installation speed, as the package manager doesn't need to compile every package it downloads.
- Although some package managers support publishing source and binary artifacts together, nothing requires both to be present.
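For pip, the two kinds of artifact can usually be told apart by file name: wheels (`.whl`) are pre-built, while source distributions are typically `.tar.gz` or `.zip` archives. The sketch below applies that heuristic; the function name and the sample file names are hypothetical, and a file extension alone says nothing about what the archive actually contains.

```python
def classify_python_artifact(filename):
    """Guess the artifact kind from the file name alone (a heuristic, not a guarantee)."""
    name = filename.lower()
    if name.endswith(".whl"):
        return "binary"  # wheel: pre-built, installed without a build step
    if name.endswith((".tar.gz", ".zip")):
        return "source"  # source distribution: built on the installing machine
    return "unknown"


# Hypothetical file names as they might appear in an artifact repository listing.
for candidate in ("example_pkg-1.0.0-py3-none-any.whl", "example_pkg-1.0.0.tar.gz"):
    print(candidate, "->", classify_python_artifact(candidate))
```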
Package Manager | Manifest | Lockfile | Manages Dev/Prod Deps | Source Artifact | Binary Artifact | Embedded Deps |
---|---|---|---|---|---|---|
npm | package.json | package-lock.json | V | V | X | X |
yarn | package.json | yarn.lock | V | V | X | X |
pip | requirements.txt | X | X | V | V | X |
nuget | X | packages.config[1] | X | X | V | V (Binary)[2] |
maven | X | pom.xml | X | V | V | V |
Identifying Composition
Source Artifact:
- Analyzing the manifest file if it exists.
- Analyzing imports within the code.
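Import analysis can be sketched for Python sources with the standard `ast` module, which parses code without executing it. This is a minimal sketch: the `imported_modules` helper and the sample source are hypothetical, and static parsing will miss modules loaded dynamically (for example through `importlib`).

```python
import ast


def imported_modules(source_code):
    """Return the sorted top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source_code)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level > 0 marks a relative import, which points inside the package itself.
            modules.add(node.module.split(".")[0])
    return sorted(modules)


SAMPLE = """
import os
import json.decoder
from collections import OrderedDict
from . import sibling
"""

print(imported_modules(SAMPLE))  # ['collections', 'json', 'os']
```

Comparing this list with the manifest, when one exists, shows whether the code uses packages that the manifest does not declare.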
Binary Artifact:
- Analyzing the structure of the archive.
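Many binary artifact formats, such as `.whl`, `.jar` and `.nupkg` files, are ZIP archives, so their structure can be listed with the standard `zipfile` module. The sketch below summarizes an archive's entries by extension; the helper names and the sample archive contents are hypothetical and built in memory purely for demonstration.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(data):
    """Return (file names, count of files per extension) for a ZIP archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    extensions = Counter(PurePosixPath(name).suffix or "(none)" for name in names)
    return names, extensions


def build_sample_archive():
    """Create a small in-memory ZIP with hypothetical contents for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("lib/example.dll", b"\x00")
        archive.writestr("lib/vendored/helper.dll", b"\x00")
        archive.writestr("example.nuspec", "<package/>")
    return buffer.getvalue()


names, extensions = summarize_archive(build_sample_archive())
for name in names:
    print(name)
print(dict(extensions))
```

Entries such as the `vendored` directory in this sample are the kind of hint that dependencies may be embedded in the artifact rather than referenced externally.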
Security Pitfalls
- Source files can be replaced and renamed by the publisher.
- External files can be added to the source files by the publisher.
- Binary artifacts can be entirely replaced by the publisher, as their contents can't easily be compared against the source code or analyzed.
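The first two pitfalls can be detected when a trusted reference copy of a source artifact is available, which is an assumption of this sketch rather than something an artifact repository provides. It hashes every file in both copies and reports what was added, removed, or changed; the helper names and the sample file contents are hypothetical.

```python
import hashlib


def digest_files(files):
    """Map each path to the SHA-256 hex digest of its content (files: dict of path -> bytes)."""
    return {path: hashlib.sha256(content).hexdigest() for path, content in files.items()}


def diff_artifacts(reference, candidate):
    """Report paths that were added, removed, or changed in candidate relative to reference."""
    ref, cand = digest_files(reference), digest_files(candidate)
    return {
        "added": sorted(cand.keys() - ref.keys()),
        "removed": sorted(ref.keys() - cand.keys()),
        "changed": sorted(path for path in ref.keys() & cand.keys() if ref[path] != cand[path]),
    }


# Hypothetical file contents: a trusted copy and the copy downloaded from the repository.
reference = {"pkg/main.py": b"print('hello')\n", "pkg/util.py": b"VALUE = 1\n"}
candidate = {
    "pkg/main.py": b"print('hello')\nimport extra\n",  # replaced content
    "pkg/helpers.py": b"VALUE = 1\n",                  # util.py renamed
    "pkg/extra.py": b"# unexpected file\n",            # external file added
}

print(diff_artifacts(reference, candidate))
```

A renamed file shows up as one removal plus one addition, so a rename is only recognizable by matching the digests of the removed and added paths.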