Niels Ott

Computational Linguist

The USES Issue

Tuesday, December 2, 2008 • Category: Automatic Mind

Intro

It is hard to term the phenomenon without offending someone. Good names would be Scienceware, or Guruware, or even better Scientistware. They are all taken by companies or other institutions that presumably all do a way too good job to provide a name for a negative aspect. So let me call it USES for Unsustainable Software Emerging from Science. This blog post shall shed some light onto the issues of USES and onto possible reasons.

What USES is All About

As a computational linguist, I work with specialized software each and every day, be it part-of-speech taggers, tools to explore corpora or treebanks, or simply software development tools such as compilers or even integrated development environments. But one type of software stands out: software that emerged from science (USES). Common features of this type of program include:

  • Usually developed for a very special and highly sophisticated purpose that is only understood within the field.
  • Developed by a single person who is a major expert within this field.
  • The major expert developing the software is often an expert in neither software design nor software architecture.
  • Examples of file formats are rare and there is little to no documentation about the software.
  • The software reacts unexpectedly to certain types of input, e.g. it ignores syntax mistakes in grammar files and then malfunctions without telling users why.
  • The software is often not completely finished and includes some missing bridges at the end of some roads without any warning signs.
  • The only person who knows how to work with it is the major expert in the field who is not the major expert in writing usable software.

Since I am not intending to offend anybody, let me give an anonymous example. In a paper about parsing noun phrases with a certain parser, it is written that a single day was spent on writing the slightly over 100 grammar rules in use. A number of source code examples garnish the publication and it seems to describe a mighty piece of software that performs well and is easy to operate. But this opinion can change rapidly once one starts using the software. The grammar parser is not safe enough to deal with missing semicolons. Sometimes it notices them, reporting an error 20 or 30 lines before or after, some other times it just ignores the issue and interprets something the grammar writer did not intend at all – without saying so.

I am sorry to say that this behavior – which is only one example of bad application behavior – is shared by a number of applications I have used so far. In all cases, documentation was sparse and I spent a number of days or weeks on trial-and-error procedures.
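To make the complaint about missing semicolons concrete, here is a minimal sketch in Python of a grammar reader that refuses to guess. It is not the parser from the anonymous example; the rule format (`LHS -> RHS ;`, one terminator per line, `#` for comments) and the names `parse_rules` and `GrammarSyntaxError` are assumptions made up for illustration. The point is only that an unterminated rule is reported with the line on which it started, instead of being silently merged with the next one.

```python
class GrammarSyntaxError(Exception):
    """Raised with the line number where the offending rule started."""

    def __init__(self, line_no, message):
        super().__init__(f"line {line_no}: {message}")
        self.line_no = line_no


def parse_rules(text):
    """Parse rules of the hypothetical form 'LHS -> RHS ... ;'.

    A rule may span several lines. Returns a list of (lhs, rhs_symbols).
    """
    rules = []
    pending = []        # tokens of the rule currently being read
    start_line = None   # line on which that rule began

    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if "->" in line and pending:
            # A new rule starts although the previous one was never closed.
            raise GrammarSyntaxError(
                start_line, "rule is missing its terminating ';'")
        if not pending:
            start_line = line_no
        closed = line.endswith(";")
        pending.extend(line.rstrip(";").split())
        if closed:
            rules.append(_build_rule(pending, start_line))
            pending = []

    if pending:
        raise GrammarSyntaxError(
            start_line, "rule is missing its terminating ';'")
    return rules


def _build_rule(tokens, line_no):
    """Check the shape 'LHS -> symbol ...' and split it into its parts."""
    if tokens.count("->") != 1 or tokens.index("->") != 1:
        raise GrammarSyntaxError(line_no, "expected 'LHS -> RHS ;'")
    if len(tokens) < 3:
        raise GrammarSyntaxError(line_no, "rule has an empty right-hand side")
    return tokens[0], tokens[2:]


if __name__ == "__main__":
    broken = "NP -> Det N\nVP -> V NP ;\n"   # first rule lacks its ';'
    try:
        parse_rules(broken)
    except GrammarSyntaxError as err:
        print(f"grammar error, {err}")
        # prints: grammar error, line 1: rule is missing its terminating ';'
```

A few dozen lines like these do not make a tool pleasant to use, but they are the difference between an error message pointing at the right line and days of trial and error.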

The Lack of Documentation

Why is there a lack of documentation of USES? Here are my hypotheses: science as a system rewards publications, be they books or – probably even more important for most authors – papers accepted at conferences or by journal boards. In papers, people report on the great insights they gained. Of course, these insights were gained by employing USES, which is all right. However, there are three things that are not rewarded by this system: 1) the free availability of the described software to other researchers, 2) the free availability of the data required for the described experiments, such as corpora, grammars, or other computational resources, 3) the existence and free availability of reasonably good documentation for the described software.

Authors should be encouraged to consider these three points. If they are not fulfilled, other researchers can neither confirm nor refute the results published in the corresponding papers, which is, I hope not only in my opinion, what a large part of the science business should be about.

The Lack of Quality

By quality I refer to the engineering part of software: it must be stable, usable and not too complicated to install and maintain. I am not referring to the actual purpose of the software. The LaTeX typesetting system is a good example: its output is regarded as some of the most properly typeset books and papers out there – but most people writing books and papers might find its programming-like user interface simply not usable at all. Imagine Microsoft Word being »user-friendly« in the same way: it simply would not sell.

But why is this so hard? One possible answer: it takes time. Plenty of time. Designing a graphical user interface (GUI) is said to take 60% of a project's time and therefore of its costs. Now USES usually does not include GUIs, but every programmer knows how tedious it is to catch all errors and produce intelligent error messages for the user. Again, meaningful error messages and a good user interface are not rewarded in the world of publication-based science. They steal researchers' time, so researchers had better do without them.

The Lack of Design and Architecture

There is an even more important type of quality: the quality of the engine under the hood, the properties of those parts that the car driver does not even know exist.

Knowledge of software design is a skill that many programmers in the scientific business must do without. Even worse, it is regarded as superfluous overhead. Its absence explains the lack of what software designers call the -ilities, denoting properties of code such as reusability, scalability, manageability, reliability, sustainability, and so on: features one typically finds in enterprise software, and features that are not honored by the system of publications. An important side-effect of software design is that it allows the coordination of software development in a team. Without software design, this can be quite hard, depending on the size of the project and the team. Without a team, software becomes as idiosyncratic as USES tends to be.

One step further from software design, one finds software architecture. It is usually pattern-based. A pattern sketches a common problem and its solution. One could see it as a template solution or recipe for a given problem. These patterns are well documented. Using them can ease communication among developers. If a comment in the source code of a program reads »Using the Observer-Pattern here«, nobody with knowledge of software architecture needs any further explanation of what is going on. This can simplify development in a team or the takeover by a new maintainer of the project.
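For readers who have not met the Observer pattern, here is a minimal sketch in Python. The class names `Subject`, `CorpusReader` and `TokenCounter` are hypothetical and chosen only for illustration; they do not come from any existing tool. A subject keeps a list of observers and notifies each of them whenever something happens, without knowing what they do with the information.

```python
class Subject:
    """Using the Observer pattern here: observers are plain callables."""

    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def detach(self, observer):
        self._observers.remove(observer)

    def notify(self, event):
        # Iterate over a copy so observers may detach themselves safely.
        for observer in list(self._observers):
            observer(event)


class CorpusReader(Subject):
    """Hypothetical subject: announces every sentence it reads."""

    def read(self, sentences):
        for sentence in sentences:
            self.notify(sentence)


class TokenCounter:
    """Hypothetical observer: counts whitespace-separated tokens."""

    def __init__(self):
        self.total = 0

    def __call__(self, sentence):
        self.total += len(sentence.split())


if __name__ == "__main__":
    reader = CorpusReader()
    counter = TokenCounter()
    seen = []

    reader.attach(counter)        # one observer keeps statistics
    reader.attach(seen.append)    # another simply records the sentences

    reader.read(["the dog barks", "it rains"])
    print(counter.total)          # 5 tokens in total
    print(seen)
```

The reader never needs to change when a new kind of observer is added, which is exactly the sort of property a new maintainer can rely on once the pattern has been named in a comment.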

The Lack of Completeness and Maintenance

Most USES is either incomplete or many years old. If it has been written in rather machine-oriented programming languages such as C or C++, it is often hard or impossible to get USES running on an up-to-date operating system. Why is this? So far I have bombarded the system of publications with criticism. But there is another issue: science often works project-based. A project proposal is written, hopefully a grant is given, and then the project is worked on. At some point, the project is over. This usually happens way before the USES product has reached a state of completeness. Researchers are then forced to move on to other projects, and the old program lies there somewhere on the Web server, becomes old and grows gray hair, and is rendered unusable as time brings changes to computer platforms and file formats. Bugs are detected by users, but they are not documented in a central bug tracking system, and as the development period is over, nobody will ever fix them.

Outro

In this blog post I have described the issues of software written by scientists. This is not meant to offend programmers out there, but the problem must be addressed. Good quality software is likely to spark interest in your work among other researchers and students. It is likely to improve the gain of knowledge in computational scientific disciplines in general, as it enables real reviews. Furthermore, good quality software has the potential to support good teaching instead of leaving students sitting madly frustrated in computer rooms.

One question remains: how can we reward people in science avoiding USES?


Graphics taken from Open Clip Art Library, modified by Niels Ott.

Addenda

  • 2008-12-05: Jochen Leidner pointed me to a readable article that discusses the same issue with a lot more analytic expertise. Read Empiricism is Not a Matter of Faith by Ted Pedersen.

13 Comments

  1. A radical open-source policy would be one way of avoiding most of these problems.

    "One question remains: how can we reward people in science avoiding USES?"

    By citing them frequently?
  2. No, open-source won't help much. A lot of collegeware is already open source - but since it's so highly specialized, no one dares tampering with it. Or cares, for that matter.

    I think that people should get more recognition for writing useful, interesting, and easy-to-use programs. But then again, that might just happen one day. I think that the best thing we can do to help is write great software...

    BTW. I pronounce USES as /useless/. Most of my experiences with software of this kind have been... aaaaaargh. Let's just say that I'd like those hours of my life *back*.
  3. Agreed. USES is awful and pervasive. I think people who write good, usable code are rewarded with notoriety, which is a large measure of academic success, is it not? Take Thorsten Joachims and SVMLight for example. It is actually usable, efficient, and robust. You can even write your own kernels really easily. And he's famous for it.

    But you're right: for the most part, the motivation is not there. Too bad.
  4. Aleksandar: Agreed on »useless«. I simply found myself unable to spend a day on bending acronyms, so I ended up with USES.

    In my opinion, making code open source can be a first step towards making the world a better place. As Ted Pedersen points out in the article I added, one must decide early to publish code in order to avoid legal issues, e.g. by using code snippets or libraries that conflict with open source licenses.

    What one always can do is to try to be a good example oneself, hoping that others might follow. This is why the »Software Projects« section of my web site here exists, even though it should be equipped with more code that I wrote in the past.
  5. You're right. Good craftsmanship with regard to software engineering is not honored in CL (or in CS, as I am sometimes told).

    I think the only way there's gonna be change is by influencing the coming generations of students, change the curricula etc.

    Just a couple of remarks:
    The fact that code is open source doesn't make it better, it just makes it obvious for other people that the code is crap. However, I wouldn't associate the -ilities with commercial software. Companies want to sell software, not create good software - and not all of them realise that writing good software in the first place might save money further down the road.

    Apart from that, I hate writing GUIs, mitigating all the one million ways users can screw up. Writing "behavior is undefined if a > b or c == 0" in a function docstring is so much more fulfilling.
  6. Ah, regarding GUIs, Torsten has found the words for the feelings I've been struggling to express ever since last Tuesday. :-) By the way, I'm also not a big fan of using GUIs. Linus Torvalds advocates KDE over GNOME on the grounds that in the former there's a GUI element for every thinkable configuration setting. For me, that's a negative... must be a matter of taste.

    By the way, a good way to go through USES hell is to enroll in CL at a French university and do the projects required for some seminars. I feel like I've spent an entire semester writing glue code. ;-)
  7. I'm clearly procrastinating writing more glue code, so here's another comment: Another minor problem is that often you have to use classic, but ancient software in certain places of your processing chain (CASS, Brill Tagger...), whose documentation, quality, architecture, and design may even have been impeccable at the time it was released... but today requires relatively rare skills to get to work. "The lack of recency", if you will.
  8. @ke: At some point, usually when some irreproducible evaluation experiment results in above 95% of F- or similar measure, a problem is regarded as being solved. Science then moves on to new horizons and leaves the robust implementation of real products to the industry. Which is the reason why there are so few POS taggers and morphological analyzers available for free and for up-to-date platforms. So we write yet another wrapper around TreeTagger and keep on dreaming of something smarter to seamlessly include as a library.

    Concerning user interfaces: some tools simply are stupid without GUIs. Displaying linguistic tree structures or AVMs without a GUI would be terrible. But then again, a GUI is not always a must. A good TUI should do the job in many situations. And by good I do not mean one saying things like "oops, signal 11 received, good bye" or "NullPointerException in WeirdClass, corrupting your data" but one with real error messages that make sense. As I wrote in the blog post, many people do not make the effort to even conform to this small set of standards of well-behaved application behavior.
  9. "NullPointerException in WeirdClass, corrupting your data" -- haha, I recognize that.

    I remember really wishing http://casper.sf.net would get off the ground, but it seems to be stuck :-(

    -----

    I once tried implementing two parsers that were described in those 10-page articles for a college project, it took us about 3 months longer than expected due to all the details we had to fill in on our own (or make up, in cases where we couldn't analyze or calculate our way to the correct formulas).

    In the end, 3 months overdue, we got one of the parsers working at about 3/4 the accuracy that they reported. I still don't really know why it didn't perform as well, since so many details might be different.

    (We released our own source though ;-) , along with the implementational details)
  10. This article and the addendum are right on. I have spent forever reimplementing somebody's algorithm just to have it not work on my kind of data. And since they never released either their data or their code, there is no way to tell whether it is because of my coding error, different qualities of the data or something different yet again.

    However, just demanding Agile methodology from scientists is not likely to get anywhere fast. This is especially since even those who want to release code/data are afraid to do it before the publication, then forget to do it and then can't even remember how it works.

    There is a need for a most basic guidance. I think that should be code release, plus runnable examples including source and result data. That way one could download the package, make sure the examples can run correctly and then run the same configuration on his/her own data. Or even re-implement the code with the basic assurance that the packaged examples still produce the same results as with original code.

    Once we have that, we can ask for more. But it should be easy to start being good!
  11. Hi Niels, yours is a worthy rant I can subscribe to. I could add a lot of examples from my own experience; mostly obscure software. However, your rant is not exactly original. You have already added the link to Ted Pedersen's very entertaining and illuminating article which has a wider scope, so it is forgiveable that he does not cite previous work in software engineering that addresses exactly the issue you raise here. This body of research is easier to locate if you use the established terminology: what you call USES is either scientific software or research prototypes -- and I suspect that the problems you describe often arise because research prototypes are used as scientific software although they should not be considered a finished product.

    For literature on these issues, I would take a look at the “Empirical Studies of Software Development” research group at the Open University, UK, headed by Helen Sharp:
    http://www.springerlink.com/content/w214725153770u22/
    http://portal.acm.org/citation.cfm?id=1082983.1083117
  12. Actually, the stuff that I did for my thesis for a long time
    ran on a single computer
    used a bunch of data sources with mutually incompatible license requirements
    was a monolithic blob of modules
    relied on some C/C++ code that I just compiled manually (i.e., no makefile - someone who doesn't know the code would have to guess which programs to run at which point).
    Now - I've gotten much better and my parser (which shares some of the code with the
    anaphora resolution stuff) now uses Python's distutils, but still
    gcc 4.2 miscompiles part of it and I haven't found out why
    other people weren't successful in installing it, even though it's a one-line install in the case where it works
    it still relies on proprietary data sets (notably, SMOR, and a word clustering derived from a corpus that is only for internal use at another Uni)
    Now, people will say, that's because of the horrid mix of Python and C++ that you're using,
    and doing stuff for German with pure open source is doomed anyways.
    So, let's come to the last point, a 40KLoC project for coreference resolution (BART - see www.bart-coref.org). We're working on it with 3-4 people at the same time, sometimes doing different, independent research. But:
    a lot of time goes into refactoring the system so it doesn't degenerate into a messy blob
    (and that's with multiple people in there who actually know software engineering)
    since we're writing and rewriting different parts of the system, it occurs that some modification that's useful for Italian makes the performance for English worse, or breaks things, or makes it not work with JDK 1.5, or activates the hidden function to shoot deadly microwaves at your neighbour.

    So, in sum
    it's a lot of work
    not everyone can do it
    it doesn't get rewarded at all (I really mean that. And if you ever wondered if it helped to explicitly fund that sort of things - guess again, what will happen is that those people who don't really care about re-use will write nice-sounding proposals and then spend their time creating overengineered ISO standards that no one will ever implement or find useful. CES or LAF, anyone?)
    * surprisingly often, the approach is just doomed because you need proprietary component X anyway which you can't redistribute
  13. As Ted Pedersen points out, the publication of source code must be planned beforehand. So using proprietary software in a project means that one did not have the publication of the code in mind right from the start.

    It is of course true that the endeavor of releasing your code will face serious trouble in case you cannot do without components with incompatible licensing. But then again, if everybody would think like that, the world would never ever change. If you don't try, you can only fail.

    (Please use the preview function next time, your formatting turned out to be kind of… confusing.)
