FOSSology is an open source license compliance software system and toolkit. Used as a toolkit, it lets you run license, copyright and export control scans from the command line. Used as a system, it provides a database and web UI that give you a compliance workflow. In one click you can generate an SPDX file, or a ReadMe with the copyright notices from your software. FOSSology deduplication means that you can scan an entire distro, submit a new version, and only the changed files will get rescanned. This is a big time saver for large projects.
FOSSology is a framework, toolbox and web server application for examining software packages in a multi-user environment. A user can upload individual files or entire software packages. FOSSology unpacks the upload if necessary and runs a chosen set of agents on every file in it. An agent can implement any analysis operation on a text file. The FOSSology package currently focuses on license-relevant data. However, it could be extended with analyses for other purposes (e.g. static code analysis).
Regular Expression Scanning for Licenses with Nomos
Nomos is one of FOSSology’s license scanners. Nomos does license identification using short phrases (regular expressions) and heuristics, e.g. a phrase must be found in (or out of) proximity to another phrase or phrases. This helps to eliminate false positives.
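The proximity heuristic described above can be illustrated with a minimal Python sketch. This is not Nomos code: the function name phrase_near, the character window and the sample phrases are all assumptions made for illustration. The idea is that a phrase only counts as evidence if a companion phrase is found close to it.

```python
import re

def phrase_near(text, anchor, companion, window=200):
    """Return True if `companion` matches within `window` characters
    before or after any match of `anchor` (hypothetical helper)."""
    for m in re.finditer(anchor, text, flags=re.IGNORECASE):
        start = max(0, m.start() - window)
        end = m.end() + window
        if re.search(companion, text[start:end], flags=re.IGNORECASE):
            return True
    return False

# Both phrases appear close together: counts as a hit.
header = ("This program is free software; you can redistribute it "
          "under the terms of the GNU General Public License.")
hit = phrase_near(header, r"general public license", r"free software")

# The companion phrase is far away from the anchor: rejected.
distant = "free software" + " " * 500 + "General Public License"
miss = phrase_near(distant, r"general public license", r"free software", window=50)
```

A smaller window makes the check stricter, which reduces false positives at the cost of missing phrases that are spread further apart.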
Nomos recognizes licenses in stages: first it uses keywords to identify license-relevant statements. Then it uses a hierarchical structure of regular expressions to identify particular licenses.
If the recognition is not complete, Nomos will return either ‘UnclassifiedLicense’ or a category of licenses, such as ‘BSD’ or ‘GPL’. This is not enough to identify a license unambiguously, but it gives the user as much information as possible for determining the license situation. What happens in this case is that the matching “cannot go deep enough” into the hierarchy of phrases to determine a particular license. Note that a result of ‘BSD’ or ‘GPL’ still requires the user to determine the exact license, such as ‘BSD 3 Clause’ or ‘GPL 2.0’.
However, because Nomos reports a “style” type of license whenever a text resembles a known license type, it can also recognize new or unknown licenses.
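The staged recognition described above can be sketched as follows. This is a simplified illustration in Python, not the actual Nomos implementation; the keyword pattern, the HIERARCHY table and the classify function are hypothetical and cover only two license families.

```python
import re

# Stage 1 (assumed pattern): keywords that mark license-relevant text.
KEYWORDS = re.compile(r"licen[sc]e|copyright|redistribut", re.IGNORECASE)

# Stage 2 (hypothetical table): family pattern, then more specific patterns.
HIERARCHY = {
    "GPL": (
        re.compile(r"GNU General Public License", re.IGNORECASE),
        [
            ("GPL 2.0", re.compile(r"version 2\b", re.IGNORECASE)),
            ("GPL 3.0", re.compile(r"version 3\b", re.IGNORECASE)),
        ],
    ),
    "BSD": (
        re.compile(r"Redistribution and use in source and binary forms",
                   re.IGNORECASE),
        [
            ("BSD 3 Clause", re.compile(r"Neither the name of", re.IGNORECASE)),
        ],
    ),
}

def classify(text):
    """Return a specific license, a license family, 'UnclassifiedLicense',
    or None if the text contains no license-relevant keywords."""
    if not KEYWORDS.search(text):
        return None
    for family, (family_re, specifics) in HIERARCHY.items():
        if family_re.search(text):
            for name, specific_re in specifics:
                if specific_re.search(text):
                    return name
            # The hierarchy could not be descended any further.
            return family
    return "UnclassifiedLicense"

exact = classify("Released under the GNU General Public License, version 2.")
family_only = classify("Licensed under the GNU General Public License.")
unknown = classify("You need a license from the author to use this code.")
no_license = classify("int main(void) { return 0; }")
```

In this sketch, the second input only matches the family pattern, so the result stops at the category level, which mirrors the ‘GPL’ or ‘BSD’ results described above.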
Text-Similarity Matching with Monk
Monk is another of FOSSology’s license scanners. Monk performs text-based searches and therefore requires good reference license texts or patterns to search for. It uses the Jaccard index as a text similarity metric, combined with a weighting that ranks matches by their size. Ranking by size matters when license texts are very similar and yield close but different Jaccard similarity scores (e.g. different versions of the BSD license). In that case, not only the best similarity score but also the size of the match is relevant.
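For two token sets A and B, the Jaccard index is the size of their intersection divided by the size of their union. The following Python sketch shows the idea on sets of lowercase words and uses the overlap size as a tie-breaker. It is a simplification under stated assumptions, not Monk's actual algorithm: the tokenization, the function names and the way size is weighted are all hypothetical.

```python
import re

def tokens(text):
    """Split text into a set of lowercase alphanumeric tokens (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank(candidate, references):
    """Score a candidate text against named reference texts.
    Results are sorted by Jaccard score, then by overlap size (assumed weighting)."""
    cand = tokens(candidate)
    scored = []
    for name, ref_text in references.items():
        ref = tokens(ref_text)
        scored.append((jaccard(cand, ref), len(cand & ref), name))
    scored.sort(reverse=True)
    return scored

# Hypothetical reference texts: "B" extends "A" with one more phrase.
references = {
    "A": "redistribution and use in source and binary forms",
    "B": "redistribution and use in source and binary forms "
         "with or without modification",
}
candidate = ("Redistribution and use in source and binary forms, "
             "with or without modification")
ranking = rank(candidate, references)
best = ranking[0][2]
```

Here both references share many tokens with the candidate, so their scores differ only by the extra phrase; this is the situation in which the size of the match helps to separate similar license texts.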
At upload, Monk uses the license texts stored on the FOSSology server. Using the Monk agent, users can also define their own text phrases to identify a given license.
NOTE: Monk reports a similarity score against the licenses it already knows; it cannot recognize an unknown (new) license. This is why it makes sense to have two license scanners, Nomos and Monk.