The Quest for an "Open Source Genome"
Whitepaper
A genome is the hereditary information encoded in an organism’s DNA. By comparing patterns in one person’s genome with those of others, scientists can learn a great deal about that person’s ancestry. For example, a recent genetic study suggests that Genghis Khan’s direct patrilineal descendants make up some 8 percent of men in a large swath of Asia, or about 0.5 percent of the world’s male population.
Like a person, each piece of source code carries an analog of DNA: a unique genome containing clues to its origin and history. To identify open source code elements and determine where they originated, we need the open source equivalent of the Human Genome Project, a 13-year research project that identified and documented all of the estimated 20,000 to 25,000 human genes.
This paper describes how the “open source genome” concept can be useful in understanding the origin and history of your code. We explore the nature of open source, why and how software developers use it, how information from an open source genome can help identify open source in your code and establish its origins, and why that is important.