The title of this post may be a bit misleading. This article is not about how people find open source software (OSS) to use; it’s about how people find the OSS already used in the applications they develop.
You are probably asking, “Why is that important?”
Today, many organizations have realized that not only do they use a lot of OSS in their development, but they have been using it for years. As they begin to build the list of packages and licenses in use, they soon find out that it’s not simply a matter of asking their developers for a list.
Developers don’t always remember everything they downloaded and used in their development. Examples of “forgettable” software include code snippets from books, blogs, and tip sites, as well as small utilities (usually no more than a few lines of code each). And not all of the developers who downloaded code and placed it into a production source code repository are still with your organization.
This all makes for a complex issue, especially when compounded by requests from customers or partners for information on your use of OSS and proof of your OSS license compliance.
So, how do you find OSS in your code? Here are some useful tips you can employ to find most of the third-party code, OSS and otherwise, used in your product.
Ask your developers
I already gave examples of why asking your developers is not a good way to get a comprehensive list of OSS, but I didn’t mean to insinuate it is a bad way either. Asking your developers to give you a list of the packages they know about is a great place to start and can be used to cross-check with the other techniques listed below.
If your developers aren’t sure how to get this information, they can begin by looking at the readme documents and the websites where they originally obtained the OSS they use. These are good sources for package names, URLs, and copyright and license information.
Get an OSS code scanner
This is probably the best way to do a comprehensive sweep of your code to find anything, down to the snippet level, that wasn’t originally developed by your team. If you are resource-constrained and don’t have time to run a scan and do the analysis yourself, consider contracting with a firm that will scan your code for you. This is a great option because these firms have people who review scans daily and have developed excellent skills for finding and reporting OSS.
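To give a feel for what “down to the snippet level” means, here is a toy sketch in Python. It is only an illustration of the general idea, not how any particular commercial scanner works: it hashes every run of three consecutive lines and counts the runs that also appear in a known OSS file. The function names and the window size of three are my own assumptions.

```python
import hashlib


def normalize(line: str) -> str:
    """Collapse whitespace so re-indented code still matches."""
    return " ".join(line.split())


def fingerprints(source: str, window: int = 3) -> set[str]:
    """Hash every run of `window` consecutive non-blank lines."""
    lines = [normalize(line) for line in source.splitlines() if line.strip()]
    prints = set()
    for i in range(len(lines) - window + 1):
        chunk = "\n".join(lines[i:i + window])
        prints.add(hashlib.sha256(chunk.encode("utf-8")).hexdigest())
    return prints


def shared_snippets(your_code: str, known_oss: str, window: int = 3) -> int:
    """Count the line windows that appear in both sources."""
    return len(fingerprints(your_code, window) & fingerprints(known_oss, window))
```

A real scanner compares against a very large corpus of known OSS and copes with renamed variables and reordered code, which this sketch does not attempt.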
Get a license and copyright scanner
There are different types of scanners. A number of companies, including OpenLogic, have built a good business with code scanners, but other organizations have created excellent products designed to find license text, author, and copyright information. These products may do some code matching, but it is not their main strength. This is important because even with the best OSS scanner there is no guarantee that it will match every OSS package ever written. Using a secondary tool is another great way to cross-check your results. One example of a license scanner is FOSSology (which, by the way, is OSS).
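If you want to see the basic idea before picking a tool, the following Python sketch walks a source tree and reports files that contain a copyright line or a well-known license phrase. It is a minimal illustration under my own assumptions: the phrase table is deliberately tiny and hypothetical, and real license scanners recognize far more licenses and variations.

```python
import re
from pathlib import Path

# Hypothetical, deliberately small phrase table for illustration only.
LICENSE_HINTS = {
    "MIT": "Permission is hereby granted, free of charge",
    "Apache-2.0": "Licensed under the Apache License, Version 2.0",
    "GPL": "GNU General Public License",
}
COPYRIGHT_RE = re.compile(r"copyright\s+(?:\(c\)|©)?\s*\d{4}.*", re.IGNORECASE)


def scan_text(text: str) -> dict:
    """Return the license hints and copyright lines found in one file's text."""
    # Collapse whitespace so plain line wraps do not break phrase matching.
    flat = " ".join(text.split()).lower()
    licenses = sorted(
        name for name, phrase in LICENSE_HINTS.items() if phrase.lower() in flat
    )
    copyrights = []
    for line in text.splitlines():
        match = COPYRIGHT_RE.search(line)
        if match:
            copyrights.append(match.group(0).strip())
    return {"licenses": licenses, "copyrights": copyrights}


def scan_tree(root: str) -> dict:
    """Scan every file under `root`; keep only the files with findings."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            found = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            if found["licenses"] or found["copyrights"]:
                results[str(path)] = found
    return results
```

Calling `scan_tree("src")` on your own checkout (the directory name is just an example) gives you a rough list of files worth a closer look.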
Put it all together
Once you have a good list from your developers, a list from code matches, and a good list of other licenses and copyrights found by the license scanners, you will need to mash up the results. Your goal is to identify, as much as possible, the origin of the code you found and, more importantly, its license.
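The mash-up itself can start out very simply. The Python sketch below merges the three lists into one inventory and flags anything that deserves a human look. The input shape (pairs of package name and license) and the review rule are assumptions I made for illustration; your scanners will export something richer.

```python
from collections import defaultdict


def merge_findings(developer_list, code_matches, license_hits):
    """Merge three lists of (package_name, license_or_None) pairs."""
    sources = {
        "developers": developer_list,
        "code_scan": code_matches,
        "license_scan": license_hits,
    }
    inventory = defaultdict(lambda: {"licenses": set(), "found_by": set()})
    for source_name, findings in sources.items():
        for package, license_name in findings:
            # Case-insensitive package names, so "jQuery" and "jquery" merge.
            entry = inventory[package.strip().lower()]
            entry["found_by"].add(source_name)
            if license_name:
                entry["licenses"].add(license_name)

    report = {}
    for package, entry in sorted(inventory.items()):
        report[package] = {
            "licenses": sorted(entry["licenses"]),
            "found_by": sorted(entry["found_by"]),
            # Review when the license is missing or disputed, or when
            # only a single source reported the package.
            "needs_review": len(entry["licenses"]) != 1
            or len(entry["found_by"]) == 1,
        }
    return report
```

Packages that every source agrees on drop out of the review pile, which leaves you free to spend your time on the conflicts and the one-off finds.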
During your analysis of the results you will begin to gain some insight into the twisted world of OSS. Google, Wikipedia, Ohloh, Sourceforge.net, et al. will become your new best friends. You will find developers who say they use a BSD license, but put a copy of the MIT license on their website because they probably thought the MIT license was actually the BSD license. You will see code that has been used (and released under various licenses) in several different OSS packages. So which license do you choose?
Regardless of your reason for going through this exercise, whether it’s a customer asking for an OSS bill of materials, your requirement to comply with an internal policy, or a need to better understand and track the third-party components in your product, you will discover that it helps you develop a better understanding of your code and provides a great return on investment.