It can be quite interesting to study the source-code metrics behind a software development project. Not a surprising interest: it's accounting, after all. Over the years, GnuCash has grown from a small, handy electronic checkbook into a rather large, multi-featured desktop application.
GnuCash currently consists of over a third of a million lines of code and documentation, spread over more than a thousand files. It has been translated into twenty-three languages and credits 139 authors and contributors.
If you've had trouble swimming through that mass of source code, think of it this way: printed out on paper, and bound into volumes, it would amount to several dozen copies of Tolstoy's "War and Peace", roughly a bookshelf-width's worth of source code.
Mind you, this is source code (and docs) crafted and debugged by actual humans; it is *not* autogenerated code. Tools such as glade or swig can generate gazillions of lines of code automatically; I'm not counting those. Every last line counted here was typed in, edited, indented and tweaked, multiple times, by human hands.
Given that we have about 400 outstanding bugs in Bugzilla, that works out to about one bug per thousand lines of code, or one bug per 50 pages of printout. This bug density is not unusual for software projects; it's near the norm.
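The density figure is a single division. As a quick check, using the gnucash-1.8.4 total from Table 1 below (385.5 KLOC, which includes the docs) and the approximate count of 400 open bugs; the function name is illustrative only:

```python
def lines_per_bug(kloc, open_bugs):
    """Average number of counted lines per outstanding bug."""
    return kloc * 1000 / open_bugs

# 385.5 KLOC: gnucash-1.8.4 total from Table 1; 400: approximate open-bug count.
print(lines_per_bug(385.5, 400))  # → 963.75, i.e. roughly one bug per KLOC
```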
The table below shows some historical lines-of-code and number-of-files metrics for the GnuCash development project. Note that not all of the code is counted: for instance, the Makefiles, configure.in and autogen.sh are not counted. Automatically generated files are excluded, as are files that have been 'borrowed' from other projects. Experimental files, miscellaneous perl scripts, various converters, add-ons and utilities are also left out. Finally, glade files are not counted, although large parts of the overall GUI are described in them.
Note also that KLOCs are not a good metric of programmer productivity, nor is wc a particularly good way of counting KLOCs. Much better measures are complexity metrics, which, for example, count the number and size of if-then-else blocks, the number and size of all blocks, or the number of math operators per statement. Maybe someday we'll run one of those tools on this code; for now, this is what we have. On the other hand, we've attempted to count only files that are directly edited by humans. The point is to avoid artificially inflating the KLOC counts with automatically generated code (which is why the glade files are not counted: they are large and automatically generated).
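To make the idea of a complexity metric concrete, here is a crude sketch in the spirit of McCabe's cyclomatic complexity, which counts decision points (branches) rather than lines. It is not any particular tool, and it is only regex-based: it blanks out comments and string/character literals in C source, then counts the if, for, while and case keywords plus the && and || operators. Real complexity tools parse the code properly and report per-function figures; the function names here are hypothetical.

```python
import re

# Comments and string/character literals are blanked out first, so that
# words such as "if" inside them are not counted.
_NOISE = re.compile(
    r"/\*.*?\*/"              # block comments
    r"|//[^\n]*"              # line comments
    r'|"(?:\\.|[^"\\\n])*"'   # string literals
    r"|'(?:\\.|[^'\\\n])*'",  # character literals
    re.DOTALL,
)

# Decision points: branching keywords plus short-circuit operators.
# (else, default and the ?: operator are not counted by this sketch.)
_DECISION = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\|")

def decision_points(c_source):
    """Count branch points in C source text (crude, regex-based)."""
    return len(_DECISION.findall(_NOISE.sub(" ", c_source)))

def approx_cyclomatic(c_function):
    """McCabe-style complexity of a single function: decision points + 1."""
    return decision_points(c_function) + 1
```

Counting branches instead of lines means a long, straight-line function scores low while a short function full of nested conditionals scores high, which is closer to what makes code hard to read and test.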
Table 1. Historical Development Stats
|Version||engine||backend||register||ledger||motif||gnome||misc app||import-export||reports||scheme||business||test||docs||internal txt||Total||Languages||Author Credits|
|xacc-0.9 Sept 97||-||-||-||-||34 files (7.5+0.9)||-||-||-||-||-||-||-||5 files (0.4)||1 file (0.1)||40 files (8.8)||1||1|
|xacc-0.9w Dec 97||-||-||-||-||51 files (13.8+1.5)||-||-||-||-||-||-||-||9 files (0.8)||1 file (0.1)||61 files (16.2)||1||2|
|xacc-1.0.17 Feb 98||-||-||-||-||52 files (14.8+1.8)||-||-||-||-||-||-||-||12 files (1.4)||4 files (0.3)||68 files (18.3)||1||7|
|gnucash-1.1.15 Aug 98||24 files (6.2+1.5)||-||31 files (6.1+1.7)||5 files (1.4+0.4)||30 files (7.4+0.7)||17 files (3.4+0.5)||-||-||-||3 files (0.3)||-||-||16 files (1.9)||17 files (1.8)||159 files (34.7)||1 (0.17)||25|
|gnucash-1.2.2 Aug 99||41 files (10.2+3.6)||-||28 files (5.5+1.7)||14 files (2.4+0.6)||26 files (8.7+0.5)||-||-||-||-||14 files (1.4)||-||-||30 files (2.6)||15 files (1.8)||168 files (39.0)||3 (0.54)||41|
|gnucash-1.3.6 April 2000||41 files (12.9+4.0)||-||32 files (6.8+2.1)||19 files (4.0+0.8)||-||78 files (32.2+3.0)||-||-||-||74 files (4.0+0.7+12.3)||-||-||33 files (7.8)||25 files (4.5)||302 files (95.1)||5 (4.3)||61|
|gnucash-1.4.6 Sept 2000||43 files (13.0+3.6)||-||27 files (5.9+2.0)||24 files (5.4+1.8)||-||82 files (33.8+3.0)||-||-||-||68 files (4.0+0.7+15.5)||-||-||36 files (9.3)||36 files (4.8)||316 files (101.9)||7 (6.0)||82|
|gnucash-1.4.12 April 2001||43 files (13.1+3.6)||-||27 files (5.9+2.0)||24 files (5.4+1.8)||-||82 files (33.5+3.0)||-||-||-||73 files (4.0+0.7+17.7)||-||-||43 files (11.5)||39 files (6.0)||331 files (108.2)||12 (17.8)||97|
|gnucash-1.5.2 Sept 2000||46 files (14.9+3.7)||-||29 files (6.3+2.0)||25 files (5.7+1.8)||-||83 files (35.8+2.9)||-||-||-||73 files (4.6+0.8+16.8)||-||-||37 files (10.7)||48 files (8.2)||341 files (114.2)||8 (7.8)||89|
|gnucash-1.6.0 June 2001||139 files (42.8+8.3)||-||28 files (5.7+2.0)||23 files (10.1+1.5)||-||132 files (60.0+4.2)||-||-||-||102 files (6.2+0.8+27.3)||-||-||64 files (12.1)||69 files (12.9)||455 files (193.9)||11 (18.7)||123|
|gnucash-1.7.2 November 2002||104 files (28.7+7.8+3.3)||89 files (30.0+3.3)||34 files (5.2+2.0) / 29 files (10.5+1.2)||17 files (9.4+0.7)||-||143 files (56.0+4.7+0.9)||75 files (17.0+2.5+5.1)||78 files (11.1+1.5+7.3)||38 files (2.4+0.1+14.2)||17 files (3.4)||94 files (19.9+1.9+4.5)||72 files (7.9+0.1+0.7)||83 files (22.2)||62 files (11.6)||935 files (297.1)||21 (56.1)||130|
|gnucash-1.8.4 June 2003||100 files (29.7+8.3+3.4)||89 files (30.1+3.3)||35 files (5.3+2.0) / 31 files (10.7+1.2)||17 files (10.2+0.8)||-||151 files (58.7+5.1+1.1)||71 files (16.9+2.6+5.2)||86 files (13.3+1.8+7.5)||52 files (2.4+0.2+15.1)||17 files (4.1)||98 files (21.5+2.0+5.1)||76 files (8.6+0.2+0.7)||24 files (13.8) / 199 files (80.3)||69 files (14.3)||1115 files (385.5)||23 (62.4)||139|
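For the gnucash-1.8.4 row, the Total column equals the plain sum of the other columns, including the second counts (register-gnome and the translated docs). A minimal sketch that parses the cell format described in the notes below and re-adds that row; the names parse_cell, row_total and ROW_1_8_4 are hypothetical, and only this one row's total is checked here.

```python
import re

# One count inside a cell: "104 files (28.7+7.8+3.3)" or "1 file (0.1)".
COUNT = re.compile(r"(\d+)\s+files?\s+\(([\d.+]+)\)")

def parse_cell(cell):
    """Return (files, KLOC) for one table cell, summing every count in it.

    "-" (no such directory in that release) gives (0, 0.0); cells holding
    two counts separated by "/" contribute both.
    """
    files, kloc = 0, 0.0
    for n_files, kloc_terms in COUNT.findall(cell):
        files += int(n_files)
        kloc += sum(float(term) for term in kloc_terms.split("+"))
    return files, kloc

def row_total(cells):
    """Sum the per-directory cells of one row, rounding KLOC to one decimal."""
    files, kloc = 0, 0.0
    for cell in cells:
        f, k = parse_cell(cell)
        files += f
        kloc += k
    return files, round(kloc, 1)

# The gnucash-1.8.4 row, engine through internal txt (Total column excluded).
ROW_1_8_4 = [
    "100 files (29.7+8.3+3.4)",                  # engine
    "89 files (30.1+3.3)",                       # backend
    "35 files (5.3+2.0) / 31 files (10.7+1.2)",  # register (core / gnome)
    "17 files (10.2+0.8)",                       # ledger
    "-",                                         # motif
    "151 files (58.7+5.1+1.1)",                  # gnome
    "71 files (16.9+2.6+5.2)",                   # misc app
    "86 files (13.3+1.8+7.5)",                   # import-export
    "52 files (2.4+0.2+15.1)",                   # reports
    "17 files (4.1)",                            # scheme
    "98 files (21.5+2.0+5.1)",                   # business
    "76 files (8.6+0.2+0.7)",                    # test
    "24 files (13.8) / 199 files (80.3)",        # docs (English / translated)
    "69 files (14.3)",                           # internal txt
]

print(row_total(ROW_1_8_4))  # → (1115, 385.5)
```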
Each cell displays the following:
The number of *.c, *.h and *.scm files, followed in parentheses by (KLOC in *.c + KLOC in *.h + KLOC in *.scm). If there are no *.scm files in the directory, only (KLOC in *.c + KLOC in *.h) is displayed. If there is only one number in the parentheses, it is the appropriate KLOC count for that statistic.
Here KLOC means kilo-lines-of-code (thousands of lines), as reported by wc. As noted above, wc is not a terribly good code metric, but it's what we have handy.
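The counting convention above can be reproduced roughly with a short script. This is a minimal sketch, not the script actually used to build Table 1: it assumes each column maps to one or more directories, approximates wc -l by counting newline characters, and does not exclude generated files or src/*/test subdirectories. The names directory_cell and count_lines are hypothetical.

```python
import os
from collections import Counter

SUFFIXES = (".c", ".h", ".scm")  # file types reported in each table cell

def count_lines(path):
    """Count newline characters, which is what wc -l reports."""
    with open(path, "rb") as fh:
        return sum(block.count(b"\n") for block in iter(lambda: fh.read(65536), b""))

def directory_cell(*roots):
    """Summarize the *.c, *.h and *.scm files under the given directories.

    Returns a string like '34 files (5.2+2.0)': the file count, then KLOC
    (thousands of lines, one decimal) per suffix. Following the table's
    convention, the .scm term is dropped when there are no Scheme files,
    and a Scheme-only directory shows a single number. Other files
    (glade, Makefiles, ...) are ignored.
    """
    files, lines = Counter(), Counter()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                suffix = os.path.splitext(name)[1]
                if suffix in SUFFIXES:
                    files[suffix] += 1
                    lines[suffix] += count_lines(os.path.join(dirpath, name))
    if files[".c"] == files[".h"] == 0:
        shown = [".scm"]            # Scheme-only directory
    elif files[".scm"] == 0:
        shown = [".c", ".h"]        # no Scheme files: c+h only
    else:
        shown = list(SUFFIXES)
    total = sum(files.values())
    kloc = "+".join(f"{lines[s] / 1000:.1f}" for s in shown)
    return f"{total} {'file' if total == 1 else 'files'} ({kloc})"
```

For example, run from the top of a source tree, directory_cell("src/register/register-core") would produce a cell in the same style as the register column.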
- engine: Contents of the src/engine and include directories. The engine was split out from the motif code in version 1.1. The data storage backend (file-io, sql) was split out in the course of version 1.5.
- backend: Contents of the src/backend directory (version 1.7 and later) or of src/engine/file and src/engine/sql (version 1.6 and earlier).
- register: Contents of the src/register directory (version 1.6 and earlier) or src/register/register-core (version 1.7 and later). The register was split out from the motif code as a separate component in version 1.1. As the stats show, the register code has been fairly stable. For version 1.7 and later, this cell shows a second count: the files and lines of code in src/register/register-gnome (previously counted as part of gnome).
- ledger: *.c and *.h files in the src directory only (version 1.6 and earlier) or src/register/ledger-core (version 1.7 and later).
- motif: Contents of the src/motif directory (version 1.2 and earlier). The motif version of the code was discontinued after version 1.2, once most of the non-GUI code had been moved into the engine, the register or the ledger.
- gnome: Contents of src/gnome plus src/register/gnome (version 1.6 and earlier). For version 1.7 and later, this consists of src/gnome, src/gnome-search and src/gnome-util.
- misc app: Contents of miscellaneous application-related directories (version 1.7 and later): src/app-file, src/app-utils, src/calculation, src/core-utils, src/gnc-module, src/network-utils, src/tax/us.
- import-export: Code to import and export various file formats: contents of the src/import-export directory.
- reports: Code to generate reports and graphs: contents of the src/reports directory.
- scheme: Scheme and guile code in src/scm plus src/guile (version 1.6 and earlier). In version 1.7 and later, much of this code moved into reports, import-export and individual modules, so only miscellaneous code remains.
- business: Code to add small-business features: contents of the src/business directory.
- test: Code to perform automated regression tests: contents of the src/*/test directories.
- docs: English-language user documentation, including on-line help and the manual (html, sgml or xml). For version 1.8.4 and later, the second number in the cell counts the translated, non-English docs (currently de, es, fr, pt_PT). Both numbers are somewhat hard to pin down, because the formats have churned considerably and multiple competing versions exist.
- internal txt: The number of design documents and README files aimed at developers. This includes *.txt files, *.texinfo files and README.* files in all subdirectories. For version 1.7 and later, only those in the src subdirectory are counted (leaving out some half-dozen scattered elsewhere).
- Languages: The number of languages that the application messages have been translated into (the number of po/*.po files). In parentheses, the number of messages in the message files (grep msgstr po/*.po |wc), in thousands.
- Author Credits: The number of people credited in the AUTHORS file (version 1.6 and later) or the README file (earlier versions). These include lead developers, patch submitters and national-language translators, as well as additional credits listed in the gnucash-docs/AUTHORS file that are not listed in the main gnucash/AUTHORS file.
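The message count in the Languages column comes from a one-line shell pipeline. A Python equivalent, as a sketch: it counts every line containing "msgstr", just as grep would, so each catalog's header entry and any msgstr[n] plural-form lines are counted too. It assumes the line count from wc was the figure used; the function names are hypothetical.

```python
import glob
import os

def count_msgstr_lines(po_dir="po"):
    """Equivalent of grep msgstr po/*.po | wc -l.

    Returns (number of .po files, number of lines containing "msgstr").
    Files are read as bytes, so the catalogs' encodings do not matter.
    """
    paths = sorted(glob.glob(os.path.join(po_dir, "*.po")))
    matches = 0
    for path in paths:
        with open(path, "rb") as fh:
            matches += sum(1 for line in fh if b"msgstr" in line)
    return len(paths), matches

def languages_cell(po_dir="po"):
    """Format the result like the Languages column: files (thousands of lines)."""
    n_files, n_lines = count_msgstr_lines(po_dir)
    return f"{n_files} ({n_lines / 1000:.1f})"
```

Because each language has one .po file, the first number doubles as the language count; dividing the second by it gives a rough per-language message count.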