Refactoring, Patterns and Other Important Considerations

One of the biggest advances in software engineering originated not with a programmer but with an architect. Christopher Alexander didn't design code; he designed buildings. Nonetheless, his ideas on effective design underpin modern software development, as popularized by the "Gang of Four" publication Design Patterns: Elements of Reusable Object-Oriented Software. While an exploration of the patterns introduced in that book is beyond the scope of this article, the important thing to know is that patterns enhance communication and learning activities in the software development process.

One major challenge in communication is shared understanding. A great deal of confusion can occur when two individuals have different mental models of what they otherwise believe to be the same thing. The two primary strategies in object-oriented software development for avoiding this issue are:
  1. Interface: a carefully and explicitly negotiated contract for interaction. Interfaces create a mutually understood means of ensuring predictable behavior.
  2. Abstraction: the practice of hiding nonessential details in order to simplify communication. This goes hand in hand with separation of concerns, because abstraction removes the need for everyone to know everything.
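As a rough illustration of both strategies, here is a minimal sketch in Python (chosen only for illustration; the example program discussed in this series is written in PHP). The names NewsSource, FixedSource, and first_headline are hypothetical and do not come from the example code.

```python
from typing import List, Protocol


class NewsSource(Protocol):
    """Interface: the agreed contract that any news source must honor."""

    def titles(self) -> List[str]:
        ...


class FixedSource:
    """Abstraction: callers never see how the titles are stored or obtained."""

    def __init__(self, titles: List[str]) -> None:
        self._titles = list(titles)

    def titles(self) -> List[str]:
        return list(self._titles)


def first_headline(source: NewsSource) -> str:
    """Depends only on the contract, not on any concrete class."""
    titles = source.titles()
    return titles[0] if titles else ""
```

Any object with a matching titles() method can be handed to first_headline, so the two sides only need to agree on the contract, not on the details behind it.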
Patterns codify a relatively small set of the most commonly needed solutions in a discipline. They identify opportunities for application, outline successful strategies, and focus on learning by example and application. Patterns fit well into agile methods because they enable less experienced individuals to benefit from a wide body of professional knowledge. As individuals gain confidence and skill, they can continue to apply what they have learned from patterns creatively. At more advanced levels, individuals can introduce new patterns and update or enhance existing patterns with the knowledge and experience they have gained. Across this spectrum of expertise, patterns help individuals at different levels communicate with each other.

One core pattern many software applications use is the Model-View-Controller (MVC) triplet. The idea of this pattern is to separate three major concerns: how data may be manipulated (the model), how data is presented (the view), and how the user may interact with the data (the controller). Another way to think of the components is to consider the model to be a gatekeeper for business logic, the view to be a designer that presents information, and the controller to be an operator that takes input from the user and drives interaction with the system.
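The three roles can be sketched in a few lines. This is a deliberately lightweight illustration in Python (not the PHP of the example program), and the class names HeadlineModel, PlainTextView, and HeadlineController are hypothetical.

```python
class HeadlineModel:
    """Model: owns the data and the rules for changing it."""

    def __init__(self):
        self._headlines = []

    def add(self, headline):
        headline = headline.strip()
        if not headline:
            raise ValueError("headline must not be empty")
        self._headlines.append(headline)

    def all(self):
        return list(self._headlines)


class PlainTextView:
    """View: decides how the data is presented."""

    def render(self, headlines):
        return "\n".join(f"{i}. {h}" for i, h in enumerate(headlines, start=1))


class HeadlineController:
    """Controller: takes input from the agent and drives the interaction."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def handle(self, command, argument=""):
        if command == "add":
            self.model.add(argument)
        elif command != "list":
            return f"unknown command: {command}"
        return self.view.render(self.model.all())
```

Swapping PlainTextView for a view that emits HTML would change the presentation without touching the model or the controller, which is the point of the separation.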

There are a variety of ways to implement MVC, some more disciplined and others more lightweight. In the example code there is virtually no user interaction at all. The only points of interaction for the "user" are the decision to run the code and the response returned by the code. These are both still considered user interaction points regardless of whether the code is run by a human user or called by an automated process such as a cron job.

The code does interact with other systems, three to be exact. It makes a request via HTTP to a web server, from which it acquires an RDF summary of NC State University's "Sysnews". This RDF document is a summary of news articles published through a web site. After it has processed the RDF, it selects one article and makes an HTTP request to a web service called Snipr to acquire a short URL with which it can publish the selected article. The third request sends a short string of text, 140 characters or fewer, to a web service called Twitter. Twitter then performs a wide variety of tasks, including publishing the short message to a web page and RSS feed, as well as sending out instant messenger and text message notifications to any subscribed users. In all three of these cases the example code is itself a "user" of other programs. It initiates a request and receives some information in return.
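The shape of that three-request flow can be sketched as follows, in Python for illustration only. The function publish_one and its parameters are hypothetical; the three external systems are passed in as plain callables, so nothing here assumes how the real feed, Snipr, or Twitter requests are made, and "take the first item" is a placeholder for the real selection rule.

```python
def publish_one(fetch_items, shorten, post, limit=140):
    """Run the three requests in order and return the posted text, or None.

    fetch_items() -> list of (title, url) pairs from the news summary
    shorten(url)  -> short URL string
    post(text)    -> sends the status message
    """
    items = fetch_items()
    if not items:
        return None
    title, url = items[0]  # placeholder selection rule: take the first item
    short = shorten(url)
    # Leave room for the short URL plus one separating space.
    room = max(0, limit - len(short) - 1)
    text = f"{title[:room].rstrip()} {short}".strip()
    post(text)
    return text
```

Because the program is only a "user" of each system, stand-ins can be supplied when trying it out, for example post=print and shorten=lambda url: url.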

The term "user" can mean a variety of things in different software development circles. Throughout this blog it is used to mean an "agent", human or otherwise, that initiates some interaction. Generally speaking, "system" will also be used to mean an "agent" that responds to some request for interaction. The true value of the MVC pattern is that it leads to systems that can easily be adapted for use by any form of agent. Careful application of MVC can also lead to programs that can be users of "human systems".

Some developers have a very specific view of MVC: they either love it because they have had positive experiences with MVC frameworks such as Ruby on Rails, or they dislike it because they have had negative experiences trying to use MVC frameworks to do things the framework was not designed to do. It is important to distinguish between MVC as a software pattern and MVC frameworks. Both are useful in certain situations, but they have different applications despite being based on the same concept. An MVC framework is one particular way to do MVC; by necessity it has a set of features that can be both empowering and limiting. MVC frameworks are, for the most part, systems: they do actual work. In contrast, the MVC pattern is more generally applicable, but since it is not a system it does not accomplish any real work. The MVC pattern is a concept, not a program.

In learning to use patterns effectively, it is important to set aside technology prejudices and carefully consider the underlying ideas that make up the patterns. Just because one MVC framework does not do what is needed, or does not do it effectively, does not mean that MVC is the wrong pattern to apply. In the example code it might seem that MVC is not very applicable because there is a low level of interaction. The truth of the matter is that most reusable code must consider the principles embodied by the MVC pattern, because at its root MVC is about separating concerns that virtually all programs have: manipulation of data, presentation of the data, and interaction with the user.

Another commonly used pair of patterns is the Singleton and Utility patterns. Both rely on static class members, a feature of object-oriented programming in which a class has properties and methods that exist independently of any instance of that class. Static properties and methods are also shared by all instances of the class. What both patterns have in common is a private constructor, so an object of the class cannot be created in the same way as objects of most other classes.

In most languages objects are normally created with a "new" operator. A singleton class, by contrast, can only be instantiated by calling a static method. That method returns a new object the first time it is called, and on each subsequent call it returns that same object. In some languages other operators must also be overridden or disabled so that the one and only instance can never be copied.
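A minimal sketch of the Singleton idea follows, written in Python for illustration (the example program is PHP). Python has no private constructors, so this version approximates one by raising an error from __init__; the class name PostedHistory and its posted attribute are hypothetical.

```python
class PostedHistory:
    """Singleton sketch: at most one instance, reached via instance()."""

    _instance = None

    def __init__(self):
        # Stand-in for a private constructor: direct construction is refused.
        raise RuntimeError("use PostedHistory.instance() instead")

    @classmethod
    def instance(cls):
        if cls._instance is None:
            obj = object.__new__(cls)  # create the object without calling __init__
            obj.posted = set()         # state shared by every caller
            cls._instance = obj
        return cls._instance

    # Copying hands back the same object, so no second instance can appear.
    def __copy__(self):
        return self

    def __deepcopy__(self, memo):
        return self
```

Every call to PostedHistory.instance() yields the same object, so state recorded through one reference is visible through all of them.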

The Utility pattern, on the other hand, is used to create a class that may never be instantiated. It is most commonly used to provide a stateless library of commonly used functions without resorting to free-standing functions (functions that are not associated with any class).
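A utility class can be sketched like this, again in Python for illustration only. The class TextUtil and its truncate method are hypothetical; the 140-character default simply echoes the message limit mentioned earlier.

```python
class TextUtil:
    """Utility sketch: a stateless bundle of helpers, never instantiated."""

    def __init__(self):
        # Stand-in for a private constructor.
        raise TypeError("TextUtil is a utility class and cannot be instantiated")

    @staticmethod
    def truncate(text, limit=140):
        """Return text unchanged if it fits, else cut it and end with '...'."""
        if len(text) <= limit:
            return text
        return text[: limit - 3] + "..."
```

The helper is called on the class itself, as in TextUtil.truncate(title), and keeps no state between calls.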

The functional difference between the two is that a singleton may hold state, and it therefore ensures there is at most one instance of the class at any given time. A utility class is intended to be stateless, so in effect it never has an instance at all.

There are a variety of other patterns for controlling the creation of objects, including several variations on the Factory and Builder patterns. These are more complex, their implementation is highly dependent on the language being used, and they won't be needed in the example code. The theme to keep in mind is that some patterns focus on control over the creation and lifespan of objects in a program.

Other classifications of patterns include structural, behavioral, and concurrency. The previously mentioned MVC pattern is commonly classified as "architectural".

Structural patterns that will be used in refactoring the example code include the Adapter pattern and the Decorator pattern. Behavioral patterns that will be used in refactoring are the Memento pattern and the Interpreter pattern.

Looking at the example program and the description of what it does, several different concerns come to mind. The program needs to:
  • know how to read an RDF file
  • be able to pick out the information it needs from the RDF file
  • be able to get the RDF file from the web server
  • decide which news item to "Twitter"
  • remember which news item(s) it has already "Twittered" so that it never Twitters the same item twice
  • report back what it did (success, failure, and whether there are more items to Twitter)
  • be able to acquire a short URL for the article from Snipr
  • be able to "Twitter"
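Two of the concerns above, deciding which item to post and remembering what has already been posted, can be sketched as follows. This is Python for illustration only, and the JSON file used for storage is an assumption; the post does not describe how the example program records its history. The function names load_posted, save_posted, and next_unposted are hypothetical.

```python
import json
from pathlib import Path


def load_posted(path):
    """Return the set of item identifiers recorded as already posted."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text(encoding="utf-8")))


def save_posted(path, posted):
    """Persist the identifiers so the next run can see them."""
    Path(path).write_text(json.dumps(sorted(posted)), encoding="utf-8")


def next_unposted(items, posted):
    """Pick the first (item_id, title) pair whose identifier is not in posted."""
    for item_id, title in items:
        if item_id not in posted:
            return item_id, title
    return None
```

A run would load the history, ask next_unposted for a candidate, and add the identifier to the history only once the post is known to have succeeded.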
Some additional concerns include things the program does not do, but could or should:
  • verify previous successes and failures independently of the statuses reported back. This is needed because Twitter suffers occasional problems with false negatives (a post reported as failed that actually went through), and it is important not to post the same thing twice, as this may "annoy" people subscribed to IM and text message notifications. Similarly, if it is discovered that a false positive has occurred (a post reported as successful that never appeared), it would be prudent to retry, since these messages are sometimes of a critical nature.
  • detect a failure in Snipr. This has not occurred, but some simple error checks could prevent future problems.
  • behave differently when called directly from a web browser than when called by running PHP on the command line. Primarily, it should not spew HTML out to the terminal but should instead use a simpler plain text response.
  • allow some basic configuration of the script for greater portability and reuse, including the ability to handle multiple RDF sources with some basic rules for establishing priority and frequency of Twittering
  • put into place abstractions needed to allow later versions to use sources other than RDF (such as RSS/Atom)
  • only Twitter at certain times of the day
  • notify someone via Twitter and/or e-mail when something is wrong
  • some basic management of the program should be achievable, primarily the ability to manually tell it that a news item failed and to retry or manually insert a news item.
Each of these concerns indicates an opportunity to apply a pattern. The practice of doing that will be explored in the next post: Separating Concerns.

Some other considerations that need to be made with respect to this program are:
  • Validation: there are a number of places where input checks will be used to ensure reliability and safety.
  • Declarative Code: by separating concerns, the code will have objects that do more declarative work and less procedural work.
  • Removing duplicated code.
  • Use Cases
Use Cases are one tool often employed by agile programmers of various flavors to capture requirements in the form of how the program should behave in response to different "agents". Many practitioners of agile limit use cases to human-triggered interaction, and while these are of particular interest, I feel that is too limited a view of any system. In my own model of capturing requirements, all points of interaction (users and systems, as mentioned above) involve agents, and so all desirable behaviors can be captured in use cases. I also like to document "Abuse Cases", that is, foreseeable agent actions (intentional, unintentional, human, or otherwise) that could have a disastrous effect on the behavior of the program.
