Tom Goddard
written Dec 19, 2001,
revised Feb 25, 2004
This guide is for people who want to compile Sparky or develop new features in C++.
Sparky is written in C++ and Python and uses the Tcl/Tk windowing library. I compile binary distributions for Microsoft Windows, Linux, Mac OS X, SGI (IRIX), Alpha (Tru64), and Sun (Solaris).
The basic compilation procedure is:
% cd [sparky-source-directory]
% bin/make-sparky
  ... compilation output
% bin/make-sparky install
  ... installation output
On Microsoft Windows the make-sparky-win32 script is used instead of make-sparky.
The make-sparky script sets paths to the Python, Tcl/Tk, and Sparky source directories, sets variables specifying the compiler to use and the compilation options, and then invokes make on the top level Makefile. There is a similar script, make-sparky-debug, for building versions with debugging information, and a separate script, make-sparky-win32, for compilation on Microsoft Windows.
To make the above compilation procedure work on a new machine you need to edit the make-sparky script. That csh script has cases for each machine it is used on:
# -----------------------------------------------------------------------------
# Linux, Redhat 8.0 using gcc 3.2
#
if ($HOST == feyerabend.cgl.ucsf.edu) then
  set SPARKY = /usr/local/src/sparky
  set SPARKY_OBJ = $SPARKY/obj
  set SPARKY_INSTALL = /usr/local/sparky
  set PYTHON_PREFIX = /usr/local/python-2.3.3
  set TK_PREFIX = /usr/local/tcltk-8.4.5
  set STANDARDS = "-D_POSIX_SOURCE -D_XOPEN_SOURCE -D_XOPEN_SOURCE_EXTENDED -D_FILE_OFFSET_BITS=64"
  set CXXFLAGS = "-Wall -O -ansi -pedantic -Wno-long-long $STANDARDS"
  set LDFLAGS = -L/usr/X11R6/lib
  set EXTRALIBS = -ldl
  set CXX_RULE_PREFIX =
The machines listed in the script are the ones I use to build Sparky distributions. Each one has a comment listing the operating system and sometimes the compiler being used. Start by copying the closest match for your machine. Then set the PYTHON_PREFIX and TK_PREFIX variables to the directories where Python and Tcl/Tk are installed. The PYTHON_PREFIX path should be to a directory containing just Python (in bin, lib, include subdirectories) since the whole directory is copied when installing Sparky. Set the SPARKY variable to the Sparky source code directory, the one containing the c++, python, lib, manual, ... subdirectories.
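For example, a case for a hypothetical new Linux machine might look like the one below. The hostname and the SPARKY, SPARKY_OBJ, SPARKY_INSTALL, PYTHON_PREFIX, and TK_PREFIX paths are placeholders for your own locations, the compiler flags are copied from the Linux case above, and whether the entry starts with if or else if depends on where it sits in the chain of machine cases.

# -----------------------------------------------------------------------------
# Linux, hypothetical new build machine (placeholder hostname and paths)
#
else if ($HOST == mymachine.example.edu) then
  set SPARKY = /home/me/src/sparky
  set SPARKY_OBJ = $SPARKY/obj
  set SPARKY_INSTALL = /home/me/sparky
  set PYTHON_PREFIX = /usr/local/python-2.3.3
  set TK_PREFIX = /usr/local/tcltk-8.4.5
  set STANDARDS = "-D_POSIX_SOURCE -D_XOPEN_SOURCE -D_XOPEN_SOURCE_EXTENDED -D_FILE_OFFSET_BITS=64"
  set CXXFLAGS = "-Wall -O -ansi -pedantic -Wno-long-long $STANDARDS"
  set LDFLAGS = -L/usr/X11R6/lib
  set EXTRALIBS = -ldl
  set CXX_RULE_PREFIX =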
Compilation on Windows is done with make and the GNU gcc compiler using the script make-sparky-win32. The compiler and other needed tools come from the Minimalist GNU for Windows (MinGW) package and the Cygnus GNU Windows package (Cygwin). The build and install procedure is just like on Unix. More details are given at the top of the make-sparky-win32 script.
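With the MinGW and Cygwin tools installed and on your path, the commands mirror the Unix procedure with the Windows script substituted:

% cd [sparky-source-directory]
% bin/make-sparky-win32
  ... compilation output
% bin/make-sparky-win32 install
  ... installation output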
File | Description |
---|---|
sparky-no-python | Executable for running without Python |
spy.so | Sparky shared library loadable by Python |
_tkinter.so | Python interface to Tkinter |
ucsfdata | Program to display and change UCSF format NMR spectrum files |
pipe2ucsf | Convert an NMRPipe processed spectrum to UCSF format |
vnmr2ucsf | Convert a Varian processed spectrum to UCSF format |
bruk2ucsf | Convert a Bruker processed spectrum to UCSF format |
peaks2ucsf | Make a UCSF format spectrum containing specified Gaussian peaks |
matrix2ucsf | Convert a matrix of floats to a UCSF format spectrum |
The compilation builds an executable called sparky-no-python which is run when Python is not available. The Python extensions to Sparky represent about half of the functionality of the program and cannot be used by that version. Most new Sparky features are written in Python. For use with Python, a shared library called spy.so is made. The Sparky compilation also builds _tkinter.so, the Python interface to the Tk library. This is a module distributed with Python but I distribute my own version because Sparky requires a non-standard version of _tkinter.so without thread support.
Tcl/Tk and Python are free and can be downloaded from:
Tcl/Tk | www.scriptics.com |
Python | www.python.org |
Sparky gets installed in one directory, usually
/Applications/Sparky.app
on Mac OS X,
/usr/local/sparky
on Linux or other Unix systems, and
c:\Program Files\sparky
for Windows.
The following files are installed in the installation directory, or on the Mac in Sparky.app/Contents/Resources.
README - reminder of where web site and manual are
LICENSE - contains copyright, license and disclaimer
manual
  index.html - start of HTML manual
  intro.html - introduction to basic commands
  changelog.html - list of changes version by version
  *.html - other sections of manual
  manual.html - manual in a single file
  manual-postscript - Postscript version of manual
example - sample data
bin
  sparky - a script that starts Sparky
  sparky-no-python - the executable for running without Python
  pipe2ucsf - conversion from NMRPipe format to Sparky format
  ucsfdata - prints UCSF NMR data header, extracts data matrix
  vnmr2ucsf - conversion from Varian VNMR format to Sparky format
  bruk2ucsf - conversion from Bruker format to Sparky format
  peaks2ucsf - creates simulated spectra from list of Gaussians
  matrix2ucsf - make a UCSF format spectrum from a data array
lib
  Sparky - the Tk resource file, font sizes, ...
  print-prolog.ps - Postscript used for printing spectrum windows
  libtcl8.3.so - Tcl shared library
  libtk8.3.so - Tk shared library
  tcl8.3 - Tcl library scripts
  tk8.3 - Tk library scripts
python
  README - description of Python interface to Sparky
  sparky/*.py - code for Sparky extensions
  sparky/spy.so - the C++ Sparky Python module
  lib-tk - Tkinter, the Python interface to Tk
  python2.3 - Python 2.3 distribution
The Sparky installation includes a copy of the Tcl/Tk libraries so they do not need to be installed separately on the machine where Sparky will be used. It also includes the needed version of Python.
The Python interface to Tk, called Tkinter, is a standard component of Python. Sparky compiles its own copy of the _tkinter source code from the Python distribution with the multi-threading code turned off. This is necessary because Sparky makes direct calls to the C Tk routines, and that is not compatible with the thread locking done by _tkinter. The Sparky compiled Tkinter module overrides any existing one in the Python distribution.
The user starts Sparky with the command sparky on Unix or sparky.bat on Windows, which is a script that checks whether Python is available. If it is not available, a standalone program called sparky-no-python is started. If it is available, the script starts the Python interpreter and imports the Sparky shared library. You can tell whether Sparky is using Python by checking whether there are any entries under the Extensions menu.
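The following is a minimal sketch of that start-up decision, not the actual distributed script. The install path and the location of the Python executable inside it are assumptions, and the real script also sets PYTHONPATH, TCL_LIBRARY, TK_LIBRARY, and LD_LIBRARY_PATH as described in the debugging section below.

#!/bin/csh -f
# Sketch only: the paths below are placeholders for the real install locations.
set SPARKY_INSTALL = /usr/local/sparky
set PYTHON = $SPARKY_INSTALL/python/python2.3/bin/python2.3
if (-x $PYTHON) then
  # Python is available: run Sparky through it so the extensions work.
  exec $PYTHON -c "import sparky; sparky.start_session()"
else
  # No Python: run the standalone executable.
  exec $SPARKY_INSTALL/bin/sparky-no-python
endif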
There are some other useful make targets.
% make-sparky clean
% make-sparky Makefile.dep
% make-sparky TAGS
% make-sparky spy.so
% make-sparky pipe2ucsf
These targets are passed to the c++/Makefile. The clean target removes all files created by the compilation. This is useful when you have compiled a debugging version and now want to compile an optimized version; without the make clean the compilation will not rebuild the object files because they appear to be up to date. It is also useful when you want everything recompiled so you can look at the compiler warning messages.
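For example, to go from a debugging build back to an optimized build:

% bin/make-sparky-debug
  ... debug and fix the problem ...
% bin/make-sparky clean
% bin/make-sparky
% bin/make-sparky install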
The Makefile.dep target rebuilds c++/Makefile.dep, which is included by the c++/Makefile. It contains lines showing which header files each object file depends on. It is created using the GNU compiler -MM option, which scans the source code for #include "somefile.h" lines. Other compilers may require -M instead of -MM. You need to rebuild Makefile.dep if you add #include lines to the code. Otherwise a make may not recompile all the necessary files, and a bug caused by compiled code that is not up to date can be very hard to track down.
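The entries in Makefile.dep are ordinary make dependency lines, one per object file. The header names below are only illustrative; the real lists come from the #include lines in each source file.

# illustrative entries - the actual header lists depend on the #include lines
spectrum.o: spectrum.cc spectrum.h nmrdata.h peak.h
command.o: command.cc command.h session.h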
The TAGS target rebuilds the c++/TAGS file using etags. This is a list of all the functions appearing in the source code and is used by the emacs editor. With the cursor over a function name in the source code (usually a function call), the meta-. emacs command takes you to the file and line where that function is defined. This is handy for navigating the more than 100 source code files.
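The TAGS file can also be rebuilt by hand with the etags program, which is presumably what the Makefile target runs:

% cd c++
% etags *.h *.cc

The first time you use meta-. in emacs it asks which tags table to use; point it at c++/TAGS.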
The spy.so, pipe2ucsf, sparky-no-python, ..., targets allow you to recompile just one of the binaries. With no arguments make-sparky recompiles all the binaries. If you are only working on one of the binaries it saves time to just rebuild that one.
There is sample data in the example subdirectory. I usually open this project and change a peak assignment as a basic test of the code. No test suite is available. I use an extensive set of data at UC San Francisco for manual testing but that data is not distributed.
I use the gdb or dbx debugger to investigate program crashes, preferring gdb when it is available because it does a better job on C++ code than dbx.
When Sparky is run with Python, the Sparky spy.so shared library is opened by Python with the dlopen() call. To use the debugger you start debugging the Python interpreter; symbols from the Sparky code will not become available until Python reads the shared library. It is necessary to set the PYTHONPATH environment variable. Also, if the Tcl/Tk version needed by Sparky is not installed in a standard place on your system, you will need to set the LD_LIBRARY_PATH, TCL_LIBRARY, and TK_LIBRARY environment variables to point to Sparky's copies. Setting these environment variables is exactly what the Sparky start-up script does.
Here's an example of debugging with dbx where I caused a core dump with the Sparky accelerator !d.
% setenv PYTHONPATH /usr/local/sparky/python:/usr/local/sparky/python/lib-tk
% setenv LD_LIBRARY_PATH /usr/local/sparky/lib
% setenv TCL_LIBRARY /usr/local/sparky/lib/tcl8.3
% setenv TK_LIBRARY /usr/local/sparky/lib/tk8.3
% dbx /usr/local/bin/python2.1
dbx version 5.1
(dbx) run -c "import sparky; sparky.start_session()"
thread 0x3 signal IOT/Abort trap at >*[__nxm_thread_kill, 0x3ff8057d3e8]   ret   zero, (ra), 1
(dbx) where
>  0 __nxm_thread_kill(0x6, 0x0, 0x3ff805762b8, 0x3ffc0184000, 0x3ffc0184000) [0x3ff8057d3e8]
   1 pthread_kill(0x1402a4100, 0x0, 0x0, 0x0, 0x1) [0x3ff805762d0]
   ...
   9 invoke__11AcceleratorXv() [0x3ffbfef2d50]
  10 parse_key__18Command_DispatcherXc() [0x3ffbfef6fd0]
  11 parse_key_event__18Command_DispatcherXPv() [0x3ffbfef6db0]
  12 parse_key_cb__11main_dialogXPvPvPv() [0x3ffbff7a5b0]
  13 event_cb__14Event_CallbackXPvP7_XEvent() [0x3ffbffcd7c4]
  14 Tk_HandleEvent(0x1, 0x14032f8d8, 0x140172610, 0x1402fb600, 0x1402fd3b0) [0x30000032308]
   ...
  27 Py_Main() [0x120011e94]
  28 main(0x0, 0x3ffc0080050, 0x0, 0x0, 0x12001166c) [0x1200116e0]
(dbx)
Here's an example using gdb for debugging. I typed control-c to gdb after Sparky had started, set a break point in the Spectrum constructor, and then continued running Sparky.
% gdb python2.1
(gdb) set environment PYTHONPATH /usr/local/sparky/python:/usr/local/sparky/python/lib-tk
(gdb) run -c "import sparky; sparky.start_session()"
Program received signal SIGINT, Interrupt.
(gdb) share
Reading symbols from /usr/local/sparky/python/sparky/spy.so...done.
(gdb) b Spectrum::Spectrum
Breakpoint 1 at 0x3ffbfd5a1d0: file c++/spectrum.cc, line 74.
(gdb) continue
Since you are debugging Python, to run Sparky you pass command-line arguments that tell Python to start Sparky. These are just the command-line arguments that the Sparky start-up script uses.
(dbx) run -c "import sparky; sparky.start_session()"
If a signal is caught, for example when I type control-c, gdb will not recognize any of the Sparky symbol names because it does not know about the shared library loaded by dlopen(). To get gdb to read the symbols use the command "share". Then you can set break points or print variables by name.
Sometimes you need to set a break point before you start Sparky in order to inspect variables and trace execution before a crash happens. The trouble is you cannot specify where to break until the spy.so library has been opened and its address in memory is established. If the problem you are tracking requires opening spectra or typing commands, you can start the program, type control-c, use the share command, and set a break point as in the example above. If you need to set a break point before Sparky enters its event loop, you can debug sparky-no-python instead of the Python version if possible; that avoids the shared library difficulties. Or you can start Python without the arguments that start Sparky, type "import sparky" to Python so the shared library gets loaded, then type control-c, set break points, continue, and start Sparky with the Python command "sparky.start_session()".
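Here is a sketch of that last approach as a gdb session; the exact prompts and output will differ.

% gdb python2.1
(gdb) set environment PYTHONPATH /usr/local/sparky/python:/usr/local/sparky/python/lib-tk
(gdb) run
>>> import sparky
  ... type control-c ...
Program received signal SIGINT, Interrupt.
(gdb) share
(gdb) b Spectrum::Spectrum
(gdb) continue
>>> sparky.start_session()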
Example of debugging a core file with dbx (the core file here is /usr/tmp/sparky-core).
% dbx /usr/local/bin/python2.1 /usr/tmp/sparky-core
Core from signal SIGABRT: Abort (see abort(3c))
(dbx) where
...
  5 ::builtin_menu_accelerator(Session&,const Stringy&,void*)(s = 0x103d0458, accel = 0x103d05c4, = (nil)) ["c++/command.cc":313, 0x5fec0564]
  6 Accelerator::invoke(void)(this = 0x103d05c0) ["c++/command.cc":163, 0x5fec691c]
  7 Command_Dispatcher::parse_key(char)(this = 0x10226438, key = 'd') ["c++/command.cc":618, 0x5fec5c88]
...
(dbx)
It is useful to move up and down the function call stack and look at variable values with the dbx or gdb commands up, down, and print.
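For example, with the core file trace above you could move to the Command_Dispatcher::parse_key frame and print its key argument. The frame count given to up and the output format vary with the debugger and platform, so treat this as a sketch.

(dbx) up 7
(dbx) print key
'd'
(dbx) down 7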
Directories and files in the Sparky source code distribution:
Directory | Description |
---|---|
c++ | C++ source code for Sparky |
python | Python source code for Sparky extensions |
lib | Tk resource file and Postscript print header |
bin | Sparky start-up scripts, build scripts, distribution scripts |
manual | HTML source for Sparky user manual |
example | Example data files distributed with Sparky |
mac | Extra files to build Mac application |
ideas | Ideas for improvements or new features |
Currently all Sparky code is in C++ (in directory c++) and Python (in directory python). New features added after the introduction of the Python interface in 1997 have been written primarily in Python; the extensions section of the manual documents these. I package the Tcl/Tk libraries with Sparky since they are necessary to run the program and may not be present on some systems. I also include the Python Tkinter module with Sparky since the thread-free build Sparky requires is not part of a standard Python installation. The Tkinter module consists of a shared library that I build from the source file c++/_tkinter.c together with the Python code in python/lib-tk. These source files were developed by others; I take them from the Python source code distribution. Only the _tkinter.c file has been modified, to compile without thread support.
The C++ code defines 190 classes (Jan 2001), 81 in header files and 109 internally in the *.cc files. A list of the classes can be obtained with:
% cd sparky/c++
% grep "^class " *.h *.cc | grep -v \;\$
which finds all lines beginning with the word class and not ending with a semicolon.
A good starting point is to look at the Session class which contains all objects associated with a Sparky session. It is defined in c++/session.h. Some other important objects are:
Class | File | Description |
---|---|---|
NMR_Data | nmrdata.h | Object representing a spectrum data matrix |
Spectrum | spectrum.h | Manages spectrum data and a set of marked peaks with assignments |
View | uiview.h | Window for displaying contoured spectra and assignments |
Project | project.h | Manages collection of spectra, views and inter-spectrum operations |
Peak | peak.h | Marker for a peak with associated assignment, volume, and linewidth |
WinSys | winsystem.h | Interface that wraps all calls to Tk user interface functions |
Notifier | notifier.h | For objects to send and request notification when other objects change |
Command_Dispatcher | command.h | Processes two letter accelerators for commands |
pick_dialog | uipick.cc | Dialog for setting peak picking parameters |
Python interface | python.cc | Implements Python language interface to C++ shared library |
There are about 80 C++ header files (.h suffix) and 120 source code files (.cc suffix), so the above table is only a sample. The main methods I use to get around in the code are: 1) I remember where things are; 2) I use emacs and TAGS to jump from a function invocation to the file and line where the function is defined; 3) I use the script bin/lookfor, which uses grep to search all code files for a specified string.
% cd c++
% lookfor marked_volume
dataregion.cc:int Marked_Region::marked_volume() const
linefit.cc:  int n = mDataVolume.marked_volume();
dataregion.h:  int marked_volume() const;
%
I also use a script bin/replace which replaces one string with another in all Sparky source files, checking them out under RCS if necessary. This is very useful for changing a name that occurs in 20 different files, but it is also very dangerous. I always use the lookfor command first to see what will be replaced. Exercise extreme caution since this can damage a lot of files fast.
% replace marked_volume covered_volume
dataregion.cc
Checked out dataregion.cc
linefit.cc
Checked out linefit.cc
dataregion.h
Checked out dataregion.h
%
The Python language interface to the Sparky C++ shared library is described in python/README. Many of the C++ data objects have mirror Python objects. The implementation of this interface is in c++/python.cc. Some other Python code of interest:
File | Description |
---|---|
README | Describes interface to Sparky data and C++ features |
__init__.py | Initialization code for the Sparky package. Defines start_session(). |
sparky_site.py | Site initialization file. Adds standard extension menu entries. |
pythonshell.py | Implements the Python shell window (accel py) in Sparky |
pyutil.py | Generic Python utility routines |
tkutil.py | Useful combinations of Tk widgets |
sputil.py | Sparky specific functions used in many extensions |
pdb.py | Code for reading atom coordinates for a PDB model |
readpeaks.py | Read a text peak list and create peak markers on a spectrum |
strips.py | Strip plot display |
The above files are a few of the approximately 60 Python code (.py suffix) files. The manual describes all of the distributed Python extensions and the files that implement them.
All of the C++ and Python source code files are under RCS (Revision Control System), which keeps track of old versions of the source code. It is useful for backtracking when you mess up the code and for finding when a bug crept into the program. See the Unix man pages rcs(1), ci(1), co(1). The ci and co commands check files in and out. I use the emacs vc mode instead because it makes checking out a file a few keystrokes and I can easily see everything that is checked out and check it all back in in one operation. I do not check files in and out for every change I make to the code. Instead I keep checking things out as needed and then check everything back in before I release a new version, all with the comment "version 3.xx". I increment the xx part of the 3.xx version number on each distribution to the James group. I have done 80 releases so far, spaced by weeks or months.
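From the command line, the equivalent of what vc mode does is a locked check-out before editing and a check-in with a log message afterwards, for example:

% co -l command.cc
  ... edit command.cc ...
% ci -u -m"version 3.xx" command.cc

The -u flag keeps an unlocked working copy of the file after the check-in.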
There are two additional source files needed for running Sparky. The Tk resource file lib/Sparky defines font sizes and other widget information such as sizes of entry fields, labels for buttons, manual URLs for help buttons, menu entry text, .... The other file, lib/print-prolog.ps, is a Postscript header used when you print a contour plot.
Below are a few problems with the code. I don't consider any of them to be of major importance. I think the code is in a good enough state that the development of new features or improvement of old features is more worthwhile than fixing the program's internal problems.
Both Don Kneller, the creator of Sparky, and I have used a mix of naming conventions for objects, classes, data members, and functions. I tend to use underscores and capitalized words for class names while Don didn't use underscores. Don and I also use different indentation and placement of curly brackets. None of this stylistic heterogeneity has caused me any problems. In fact, it is useful to be able to see who last worked on a section of code from the naming and formatting style.
Since Sparky was originally written in C, pointers were used extensively. As it migrated to C++, references were also used extensively. Some conventions about when to use pointer function arguments versus reference arguments might be useful: for example, if an argument can be NULL make it a pointer, otherwise make it a reference. Pointer versus reference return values could follow a similar convention. No such convention is used in the code.
The Sparky C++ code has been under development since about 1994. Available compilers could not handle templates well in the early days, and conforming implementations of the STL (Standard Template Library) are not available even now (2000) on all platforms. Neither templates nor the STL container classes have been used, which means I have implemented my own basic classes such as lists, tables, and a string object.
The Sparky manual is written in HTML in the manual directory, the first page being index.html, which is a symbolic link to overview.html. The files that make up the manual (overview.html, intro.html, views.html, ...) are listed in the manual directory Makefile as the value of the MANUAL_SECTIONS variable.
The manual is broken into separate files to keep them small. This makes viewing them on the Sparky web site work well with limited download speeds. The manual also comes as part of every Sparky distribution. With the manual on the local machine it is possible to quickly load the whole manual as a single document, which is advantageous if you want to search the whole manual for some word. I wrote a Python script, bin/htmlcat, to create a single file version of the manual called manual.html by concatenating the small source HTML files and making the necessary changes to reference and anchor names. This single file version of the manual also aids in making a Postscript version for printing.
It is useful to be able to print the Sparky manual. This can be done by printing the single file version using the print command in a web browser. Unfortunately it is then difficult to find desired sections of the manual because the links become useless. This can be remedied by adding page numbers and page references for all links in the printed output. That capability existed in the old NCSA Mosaic browser but is not available in current Netscape or Internet Explorer browsers (circa 1999), so I have written a Python script, html2ps, to add the page numbers and references. Running html2ps is a painful process requiring you to print the manual to a file with the Netscape browser by hand a few times when prompted; this is needed to determine where the Postscript page breaks occur. The page numbered Postscript version of the manual is included in all Sparky distributions, but because of the hand printing steps it is not made by the manual Makefile install target. It must be done separately when an updated manual is to be distributed.

The images placed in the manual are in the manual/images directory. Not all files in that directory are necessarily used in the manual; the ones that are used are listed in the Makefile so the install target can copy the needed images. The Postscript version of the manual is about 20 Mbytes (in 1999) because the images take a lot of space. Compressed versions called manual-postscript.gz (for Unix) and manual-postscript.zip (for Windows) are placed in the distributions.
Sparky was created in 1989 by Don Kneller, a grad student working with professor Tack Kuntz at UCSF. It was written in C with a bit of Fortran; in his last couple of years Don started using C++. Another grad student, Mark Day, wrote a C library defining the UCSF NMR spectrum file format and a processing program called Striker. Striker did Fourier transforms of time domain NMR data, windowing, zero filling, phasing, and other manipulations to produce a frequency domain spectrum that was then analyzed using Sparky. Both programs had extensive graphical user interfaces; their strong point was ease of viewing and interacting with spectrum data. Both were based on NeWS, an X server extension introduced by Sun Microsystems that allowed drawing on the screen using Postscript. Sun dropped support for NeWS around 1994 or 1995. It was foreseen that in a few years no X server would support NeWS, and Striker and Sparky would be dead unless their graphical user interfaces were rewritten. Around 1995, Don Kneller and Mark Day left.
Tom James' NMR group were the principal users of Striker and Sparky from the start. The Computer Graphics Lab headed by Tom Ferrin saw an opportunity here and picked up Sparky. They had developed Midas, a molecular display program, in previous years under Bob Langridge, and saw opportunities to combine NMR structure determination and molecular visualization programs. They hired me, Tom Goddard, in 1996 to port Sparky from NeWS to Motif, a Unix window interface that was still alive, and I did that port. CGL received 5 years of funding from an NIH National Center for Research Resources grant which included money for Sparky development. We wanted to make Sparky and our next generation molecular visualization program Chimera extendable by users, and Python was chosen as the extension language. So I added a Python interface to Sparky to access spectrum and peak data and to control Sparky's graphical capabilities. The graphical user interface library of choice with Python is Tkinter, which is based on Tcl/Tk, so I ported Sparky from Motif to Tk. We decided not to develop Striker, and after 1999 all machines with NeWS capable of running Striker disappeared.
Operating system code for Windows is in c++/system-win32.cc. Graphics system code for Windows is in c++/winsystem-win32.cc. The corresponding files with "win32" replaced by "unix" implement the Unix specific code.
I originally tried to make the Sparky port to Windows use Cygwin and its Unix compatibility library, but I couldn't make a DLL that could be imported by Python and work correctly. The DLL would crash on calls to malloc and fprintf. This appeared to be an incompatibility between the Cygnus libc and the MSVC runtime library (msvcrt.dll) used by the precompiled Python 1.5.1 binary.
To produce a working Sparky DLL I switched to a port of gcc that uses the Windows standard C runtime library (crtdll.dll): the Minimalist GNU for Windows project, MinGW. Giving up the Cygwin POSIX library meant writing some Windows specific code, contained in system-win32.cc and winsystem-win32.cc, to replace the Unix system and X windows calls Sparky uses.
The Sparky Tcl/Tk code ported with no problems. But I had used X windows Xlib calls to draw the spectrum contour windows and edge panels (resonances, slices, scales, ...). Some of the Xlib calls are provided by Tk on Windows; others I replaced with native win32 calls. I wasn't able to easily rotate text 90 degrees as is done in the resonance panels, so I just stack the letters on top of one another.
I do code development with emacs. The most useful features are the TAGS support described above, which jumps from a function call to the function definition with meta-., and vc mode for checking files in and out of RCS. There are about 200 C++ source and header files in Sparky, and jumping around between them is greatly facilitated by these commands.