
Open Source Artifacts

Version Control Software vs. Artifact Repository

Version Control Software (VCS), such as git, holds the source code of a project, whereas an Artifact Repository hosts the compiled code. The two are separate systems, usually hosted and managed by different companies, and they have nothing in common beyond the superficial resemblance of both hosting "open source projects".

There is no guarantee that what is hosted on one service is related to what is hosted on the other; not even the project names have to match. Reading the source code in the VCS therefore tells you nothing reliable about an artifact in the Artifact Repository, because nothing links the two. Moreover, within the artifact repository itself, source and binary artifacts can be uploaded separately, and their contents need not be related to each other. Since the VCS is independent of package managers, this article leaves it out of scope.
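
Because the two systems are unrelated, the only way to see whether a downloaded source archive matches a repository is to compare them file by file. The sketch below hashes every file on both sides and reports what differs. The archive name `pkg-1.0.tar.gz`, the file names, and the assumption that the archive wraps everything in one top-level folder are all hypothetical; a report like this only shows that the two differ, not which side is trustworthy.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def archive_hashes(archive_path: str) -> dict:
    """SHA-256 of every regular file in a .tar.gz, keyed by path without the top-level folder."""
    hashes = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Assumption: the archive wraps its files in a "name-version/" folder; drop it.
            rel = member.name.split("/", 1)[-1]
            data = tar.extractfile(member).read()
            hashes[rel] = hashlib.sha256(data).hexdigest()
    return hashes


def checkout_hashes(root: str) -> dict:
    """SHA-256 of every file under a local checkout, keyed by POSIX-style relative path."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                hashes[rel] = hashlib.sha256(fh.read()).hexdigest()
    return hashes


def compare(archive: dict, checkout: dict) -> dict:
    """Report files that exist on only one side or whose contents differ."""
    common = archive.keys() & checkout.keys()
    return {
        "only_in_archive": sorted(archive.keys() - checkout.keys()),
        "only_in_checkout": sorted(checkout.keys() - archive.keys()),
        "changed": sorted(p for p in common if archive[p] != checkout[p]),
    }


def demo() -> dict:
    """Build a tiny hypothetical checkout and a source archive that diverges from it."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = os.path.join(tmp, "checkout")
        os.makedirs(checkout)
        with open(os.path.join(checkout, "main.py"), "w") as fh:
            fh.write("print('hello')\n")

        archive_path = os.path.join(tmp, "pkg-1.0.tar.gz")
        with tarfile.open(archive_path, "w:gz") as tar:
            for name, body in [("pkg-1.0/main.py", b"print('patched')\n"),
                               ("pkg-1.0/extra.py", b"import os\n")]:
                info = tarfile.TarInfo(name)
                info.size = len(body)
                tar.addfile(info, io.BytesIO(body))

        return compare(archive_hashes(archive_path), checkout_hashes(checkout))


if __name__ == "__main__":
    print(demo())
```

In the demo, `main.py` has the same name on both sides but different contents, and `extra.py` exists only in the archive: exactly the kind of divergence the publisher is free to introduce.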

Manifests & Lockfiles

Manifest - A file that specifies the top-level packages a project needs in order to work. The requirements are written loosely, as ranges of supported versions rather than one exact version.

Lockfile - A file that records the exact versions of the top-level packages and of all their transitive dependencies, essentially "locking" them in place.
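
The relationship between the two files can be sketched as a consistency check: every loose constraint in the manifest should be satisfied by the exact version pinned in the lockfile. The constraint syntax here is a deliberately simplified, pip-like subset (`>=`, `<=`, `==`, `>`, `<` joined by commas), not npm's `^`/`~` ranges, and the package names and versions are hypothetical.

```python
import operator

# Hypothetical manifest: loose constraints for top-level packages only.
MANIFEST = {"pkg-a": ">=1.1,<2.0", "pkg-b": ">=4.0"}

# Hypothetical lockfile: exact versions for top-level packages and their dependencies.
LOCKFILE = {"pkg-a": "1.3.0", "pkg-b": "4.1.2", "pkg-c": "0.9.1"}

# Two-character operators come first so ">=" is not mistaken for ">".
OPERATORS = [(">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
             (">", operator.gt), ("<", operator.lt)]


def parse_version(text: str) -> tuple:
    """Turn "1.3" into (1, 3, 0) so versions of different lengths compare correctly."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, constraint: str) -> bool:
    """True if `version` meets every comma-separated clause of `constraint`."""
    v = parse_version(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS:
            if clause.startswith(symbol):
                if not compare(v, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported constraint clause: {clause!r}")
    return True


def check_lockfile(manifest: dict, lockfile: dict) -> list:
    """List every manifest entry that the lockfile does not pin to an allowed version."""
    problems = []
    for name, constraint in manifest.items():
        pinned = lockfile.get(name)
        if pinned is None:
            problems.append(f"{name}: not pinned in lockfile")
        elif not satisfies(pinned, constraint):
            problems.append(f"{name}: {pinned} violates {constraint}")
    return problems


print(check_lockfile(MANIFEST, LOCKFILE))  # []
```

Note that `pkg-c` appears only in the lockfile: it stands in for a transitive dependency that the manifest never names but the lockfile still pins.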

Dependency Types

Some package managers let you declare production and development dependencies separately in their main manifest file, while others leave the split to you, for example by keeping each group in a separately named file.
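
A minimal sketch of both approaches. The npm-style manifest uses the `dependencies` and `devDependencies` sections of `package.json`; the pip-style setup relies on two requirements files, where the dev file pulls in the runtime file with `-r`. The file contents and the name `requirements-dev.txt` are hypothetical, and the comment handling is simplified (it would, for instance, break URLs containing `#`).

```python
import json

# Hypothetical npm-style manifest: one file, two dependency sections.
PACKAGE_JSON = """
{
  "name": "example-app",
  "dependencies": {"pkg-a": "^1.2.0"},
  "devDependencies": {"test-runner": "^9.0.0"}
}
"""

# Hypothetical pip-style setup: the split lives in two files
# (requirements.txt and requirements-dev.txt).
REQUIREMENTS_TXT = "pkg-a>=1.2  # runtime\n"
REQUIREMENTS_DEV_TXT = "-r requirements.txt\ntest-runner>=9.0\n"


def npm_dependency_split(text: str) -> tuple:
    """Return (production, development) dicts from a package.json document."""
    data = json.loads(text)
    return dict(data.get("dependencies", {})), dict(data.get("devDependencies", {}))


def read_requirements(text: str) -> list:
    """Return requirement lines, skipping blanks, comments and options such as "-r"."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # simplified comment stripping
        if line and not line.startswith("-"):
            entries.append(line)
    return entries


print(npm_dependency_split(PACKAGE_JSON))  # ({'pkg-a': '^1.2.0'}, {'test-runner': '^9.0.0'})
print(read_requirements(REQUIREMENTS_DEV_TXT))  # ['test-runner>=9.0']
```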

Compilation (Source / Binary Artifacts)

Before a project is uploaded to an artifact repository, it has to be packaged into a format the package manager understands: either a source artifact or a binary artifact. A source artifact contains the package's source code; depending on the language, the package manager compiles it after download, before it can be executed. A binary artifact, on the other hand, is pre-compiled by the publisher and is ready to use as soon as it is downloaded.

Notes:

  • Depending on the language and the package manager, an artifact can come with all of its dependencies embedded inside it, or it can reference them as external artifact repository packages or as local files.
  • The benefit of a binary package over a source package is a small gain in performance, since the package manager doesn't need to compile every package it downloads.
  • Although some package managers support publishing source and binary artifacts together, nothing requires both of them to be present.
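
As a rough illustration, the artifact kind is often signalled by the file suffix. The mapping below is a simplified, hypothetical heuristic built on a few well-known conventions (Python wheels and sdists, Maven `-sources.jar`, NuGet `.nupkg`, npm `.tgz` tarballs); a suffix is only a label, and a pure-Python wheel, for example, contains no compiled code at all.

```python
# Simplified, hypothetical mapping from file suffix to artifact kind.
# Order matters: "-sources.jar" must be checked before ".jar".
SUFFIX_KINDS = [
    (".whl", "binary"),          # Python wheel: pre-built distribution
    ("-sources.jar", "source"),  # Maven artifact with the "sources" classifier
    (".jar", "binary"),          # Maven compiled classes
    (".nupkg", "binary"),        # NuGet package
    (".tar.gz", "source"),       # Python source distribution (sdist)
    (".tgz", "source"),          # npm package tarball
]


def classify_artifact(filename: str) -> str:
    """Guess "source" or "binary" from the file name alone."""
    name = filename.lower()
    for suffix, kind in SUFFIX_KINDS:
        if name.endswith(suffix):
            return kind
    return "unknown"


def published_kinds(filenames: list) -> set:
    """Which artifact kinds a release actually ships; nothing forces both to be present."""
    return {classify_artifact(f) for f in filenames} - {"unknown"}


print(published_kinds(["pkg-1.0.tar.gz"]))  # {'source'}
```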
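
Package Manager | Manifest         | Lockfile            | Manages Dev/Prod Deps | Source Artifact | Binary Artifact | Embedded Deps
--------------- | ---------------- | ------------------- | --------------------- | --------------- | --------------- | --------------
npm             | package.json     | package-lock.json   | V                     | V               | X               | X
yarn            | package.json     | yarn.lock           | V                     | V               | X               | X
pip             | requirements.txt | X                   | X                     | V               | V               | X
nuget           | X                | packages.config [1] | X                     | X               | V               | V (Binary) [2]
maven           | X                | pom.xml             | X                     | V               | V               | V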

  1. Optional: `packages.config` can be replaced by `PackageReference` entries embedded in the project file.
  2. Dependencies come as an archive of `DLL` files, each placed under a folder structure that mirrors the dependency tree.
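
`packages.config` lists every package with one exact version, which is why it behaves like a lockfile. The sketch below reads it, and also reads the `PackageReference` items from footnote 1, using the standard library XML parser. The package ids are made up, and the project-file reader assumes an SDK-style project (no XML namespace) with `Version` given as an attribute rather than a child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical packages.config; the package ids are made up for illustration.
PACKAGES_CONFIG = """<packages>
  <package id="Example.Logging" version="2.1.0" targetFramework="net472" />
  <package id="Example.Json" version="13.0.1" targetFramework="net472" />
</packages>"""

# Hypothetical SDK-style project file using PackageReference instead.
CSPROJ = """<Project Sdk="Microsoft.NET.Sdk">
  <ItemGroup>
    <PackageReference Include="Example.Json" Version="13.0.1" />
  </ItemGroup>
</Project>"""


def read_packages_config(xml_text: str) -> dict:
    """Map each package id to the exact version pinned in packages.config."""
    root = ET.fromstring(xml_text)
    return {pkg.get("id"): pkg.get("version") for pkg in root.findall("package")}


def read_package_references(csproj_text: str) -> dict:
    """Map each PackageReference Include to its Version attribute (SDK-style projects only)."""
    root = ET.fromstring(csproj_text)
    return {ref.get("Include"): ref.get("Version") for ref in root.iter("PackageReference")}


print(read_packages_config(PACKAGES_CONFIG))
# {'Example.Logging': '2.1.0', 'Example.Json': '13.0.1'}
print(read_package_references(CSPROJ))
# {'Example.Json': '13.0.1'}
```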

Identifying Composition

Source Artifact:

  • Analyzing the manifest file, if one exists.
  • Analyzing imports within the code.

Binary Artifact:

  • Analyzing the structure of the archive.
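
Many binary artifacts (wheels, `.jar` and `.nupkg` files) are zip archives, so their structure can be inspected without running anything. The sketch below summarizes files per folder and lists entries that look like compiled code. The archive it builds is a hypothetical NuGet-like layout with placeholder bytes, and the "looks compiled" test is by suffix only, so it says nothing about what the binaries actually do.

```python
import io
import zipfile
from collections import Counter


def archive_layout(archive_bytes: bytes, depth: int = 2) -> dict:
    """Count files per directory prefix (up to `depth` levels) inside a zip-based archive."""
    counts = Counter()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            folders = info.filename.split("/")[:-1]
            counts["/".join(folders[:depth]) or "."] += 1
    return dict(counts)


def embedded_binaries(archive_bytes: bytes, suffixes=(".dll", ".so", ".jar")) -> list:
    """Paths of entries that look like compiled code, judged by suffix only."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(suffixes))


def sample_archive() -> bytes:
    """Build a small hypothetical NuGet-like archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("Example.nuspec", "<package />")
        zf.writestr("lib/net472/Example.dll", b"\x4d\x5a")  # placeholder bytes, not a real DLL
        zf.writestr("lib/net472/Dependency.dll", b"\x4d\x5a")
    return buffer.getvalue()


data = sample_archive()
print(archive_layout(data))     # {'.': 1, 'lib/net472': 2}
print(embedded_binaries(data))  # ['lib/net472/Dependency.dll', 'lib/net472/Example.dll']
```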

Security Pitfalls

  • Source files can be replaced or renamed by the publisher.
  • External files can be added to a source artifact by the publisher.
  • Binary artifacts can be replaced entirely by the publisher, and because compiled contents can't easily be compared against the source code or analyzed, such a substitution is hard to detect.
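
One partial mitigation is to record a hash of each artifact when it is first locked and refuse any later download that does not match. The sketch below builds an SRI-style integrity string ("sha512-" followed by the base64 SHA-512 digest, the style npm records in `package-lock.json`) and verifies data against it. It is a hedged sketch: it detects a replacement made after the hash was recorded, but it cannot tell whether the artifact was already malicious at lock time.

```python
import base64
import hashlib
import hmac


def sri_sha512(data: bytes) -> str:
    """SRI-style integrity string: "sha512-" plus the base64 of the SHA-512 digest."""
    return "sha512-" + base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")


def verify_integrity(data: bytes, expected: str) -> bool:
    """Check a downloaded artifact against the integrity string recorded earlier."""
    if not expected.startswith("sha512-"):
        raise ValueError("this sketch only handles sha512 integrity strings")
    return hmac.compare_digest(sri_sha512(data), expected)


recorded = sri_sha512(b"artifact contents at lock time")
print(verify_integrity(b"artifact contents at lock time", recorded))    # True
print(verify_integrity(b"artifact silently replaced later", recorded))  # False
```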