Refactoring, Use Cases and Abuse Cases
Sometimes code just happens. The example code is one such case. There were no formal requirements; I simply wanted to be able to receive Sysnews posts via text message, and Twitter was a low-barrier means to achieve that goal. Once the proof of concept was realized, however, the next logical step was to explore options for moving the code into production. This is a step where all too often the right precautions are not taken. Working code is not always good code. Moving this particular service into production carries a number of risks that need to be addressed to ensure that other services are not degraded and that adopters of the service do not encounter dissatisfying experiences.
The previous blog post listed a number of needs for the code. These are a form of requirements. Collecting requirements in some format is important for disciplined software development. It helps identify risks, forms the basis for measuring success, and creates a foundation for communication. There are a wide variety of formats for gathering software requirements. As with many other agile methods, the process of codifying requirements is something that can be customized to the developer's, organization's, or project's needs.
One format of requirement that can be useful through the development life cycle is the Use Case. Use cases are descriptions of what the program should do in different situations. Depending on project needs these might be simple sentences outlining the primary functions, or they might be detailed descriptions of every feature. It is important to note that use cases are supposed to capture what the system should do, not how it should do it, unless that "how" is actually a functional requirement.
As one might imagine, the number of requirements (use cases) a system has will affect the complexity of the system and ultimately the development time. Every recorded requirement should be a genuine need, and should be distilled carefully so that the software meets the needs without wasting development time on unimportant features.
Use cases are often presented in a concise list, though sometimes more narrative prose is employed. Some use cases may specify prerequisites and post conditions. Applying Use Cases by Geri Schneider and Jason Winters is an excellent source for ideas in making the most of use cases. Another good guide to using use cases effectively is Patterns for Effective Use Cases.
Development teams may use specifications like UML to create visual diagrams of the processes described by use cases. Some tools can even help generate shell code from the data used to create such diagrams, but such tools can be costly and the usefulness of the results varies greatly depending on the project and team. UML diagrams are often not the way to go for small projects and small teams, since the maintenance of the diagrams and the learning required to create them properly are cost prohibitive. Even in large projects the benefits of UML should be carefully weighed.
More often, simpler visualizations of the processes described in use cases may help illustrate concepts not easily captured in words. The key is to maximize communication effectiveness and reduce the overhead of creating and maintaining the use cases. When visuals can do this, it makes sense to use them. Use cases and other forms of requirements are non-functional byproducts of the software development process. Working code is the focus, and use cases only help make this happen.
Use cases may use nested lists to break down special conditions or to represent branching conditions. It is important to keep use cases simple and easy to read, so nesting should be kept to a minimum. Instead, use cases can reference or link to other use cases to create the required connections.
As the name might imply, use cases are user focused. Each begins with a user making a request to a system (as mentioned before, users and systems may be people or programs; both are agents). The scope of each use case needs to be kept clear. When the user makes a request to the system and there is a clear handoff where the system itself becomes the user of some other system, a second use case is needed. It is also important to keep the concept of users distinct from their role. In some situations the same agent might be performing two different roles. It is almost always important to capture these distinct roles, but it may not be important to note that the same agent performs both.
Verb phrases are the preferred label for use cases, as it is the action or service being performed that is central to the unit. Each use case needs to clearly identify what the success criteria are.
The root use case for the example program would look something like this:
Use Case 0
- Trigger - script call
- Preconditions - none
- System requests RDF news feed of current news items [Use Case 1].
- System requests RDF news feed of pending news items [Use Case 1].
- System merges the two feeds, excludes already tweeted items, and selects one unpublished item based on the item's date [Use Case 2].
- Pull the Twitter status to make sure that the item has not already been tweeted [Use Case 3].
- If pulling the Twitter status fails, generate an error message to that effect [Use Case 4].
- If there is nothing unpublished, return a message saying so.
- Get a short URL for the unpublished item from Snipr [Use Case 5].
- If getting a Snipr URL isn't possible, generate an error message to that effect [Use Case 4].
- Truncate the title as needed such that the Snipr URL and title together total 130 characters or less. [Use Case 6]
- Attempt to tweet the unpublished item. [Use Case 7]
- If tweeting fails, generate an error message to that effect [Use Case 4].
- If tweeting succeeds, remember that item as having been published and return a message listing what was tweeted and any remaining unpublished items.
- Postconditions - some message has been returned.
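The truncation step (Use Case 6) is small enough to sketch on its own. The following is only an illustration under stated assumptions: the function name buildTweet is hypothetical, the single separating space is counted against the 130 character budget, and lengths are measured in bytes with strlen, which is only accurate for single-byte titles.

```php
<?php
/**
 * Hypothetical sketch of Use Case 6: combine a title and a short URL so the
 * result is at most $limit characters. Assumes one separating space counts
 * toward the limit and that the title is single-byte text.
 */
function buildTweet($title, $shortUrl, $limit = 130)
{
    // Room left for the title after the URL and the separating space.
    $room = $limit - strlen($shortUrl) - 1;
    if ($room < 4) {
        // Not enough room for even a minimal title plus an ellipsis.
        return $shortUrl;
    }
    if (strlen($title) > $room) {
        // Reserve three characters for the trailing ellipsis.
        $title = rtrim(substr($title, 0, $room - 3)) . '...';
    }
    return $title . ' ' . $shortUrl;
}
```

A short title passes through unchanged, while a long one is cut so that the title, the space, and the URL together never exceed the limit.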
In my own work I typically use very informal requirements documents with a Purpose section, Vision section, optional Background, optional Risks, and the rest are Use Cases. When there are functional requirements that are inappropriate for a Use Case format, such as specific validation algorithms that need to be codified, I append each special functional requirement. Other possible appendices, as needed, include some suggestions for requirements documents from Software Requirement Patterns (Withall 2007): Context, Scope, Major Assumptions, Major Exclusions, Key Business Entities, Infrastructures, Major Nonfunctional Capabilities.
The use case outlined above, Use Case 0, references seven other use cases which may need to be fleshed out or may even need to reference additional use cases. Use cases 1 and 2 are interrelated: 2 works on the products of 1. Here is where the discussion of patterns and refactoring from previous posts comes into play.
In Use Case 0, some agent initiates a script that, among many other things, needs to request and parse RDF documents to extract a list of news items. This alone is a good indicator that the process of handling RDF would benefit from having its own agent. By setting up Use Case 1 as a request to another agent, the main script no longer has to implement the details of RDF requests and parsing, hence a separation of concerns. This creates one less thing that the main script needs to implement. As an added benefit, there is a need to request and parse two different RDF files. If one agent can be designed to accept a request for any RDF file, parse it, and return the information in a more useful format then the case for making Use Case 1 its own agent is solidified.
RDF is a large and complex file format. Fortunately, we only need to implement a few things to get the script at hand working. Agile methods emphasize developing only what is needed. That is not to say that the scope of implementation and planning should be short sighted. Various agile methods prescribe different levels of planning ahead for code, but most encourage developers to err on the side of avoiding analysis and design paralysis. A good design for an agent (in this case an object) to handle RDF is one that will allow us to handle only what we need without making it hard to add functionality later.
Use Case 1
- Trigger - Use Case 0
- Preconditions - A URL or file location to request the RDF from, or an RDF string.
- Get the RDF file from the specified location if needed.
- If it is a URL [Use Case 1.1]
- If it is a file location [Use Case 1.2]
- Extract a list of news items from the RDF XML.
- For each item, get the URL link, Title, Creator, and Date.
- Index the list by URL link, order by Date with most recent first.
- Postconditions - The list of desired information exists, even if it is of size zero.
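The extraction steps of Use Case 1 could be sketched roughly as follows. This is not the TwitteRdf_Rdf_Extracter class itself, which has not been written yet; the function name extractNewsItems is hypothetical, and the sketch assumes the feed is RSS 1.0 style RDF whose items sit in the http://purl.org/rss/1.0/ namespace and carry Dublin Core dc:creator and dc:date elements. It handles only the "RDF string" precondition; fetching from a URL or file is left to Use Cases 1.1 and 1.2.

```php
<?php
/**
 * Hypothetical sketch of Use Case 1 for an RDF string.
 * Assumes RSS 1.0 items with Dublin Core creator and date elements.
 * Returns an array indexed by item link, most recent date first.
 */
function extractNewsItems($rdfString)
{
    $items = array();

    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($rdfString);
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return $items; // postcondition: a list exists, even if empty
    }

    $rssNs = 'http://purl.org/rss/1.0/';
    $dcNs  = 'http://purl.org/dc/elements/1.1/';
    $xml->registerXPathNamespace('rss', $rssNs);
    $found = $xml->xpath('//rss:item');
    if (!is_array($found)) {
        return $items;
    }

    foreach ($found as $item) {
        $rss  = $item->children($rssNs);
        $dc   = $item->children($dcNs);
        $link = trim((string) $rss->link);
        if ($link === '') {
            continue; // an item without a link cannot be indexed
        }
        $items[$link] = array(
            'link'    => $link,
            'title'   => trim((string) $rss->title),
            'creator' => trim((string) $dc->creator),
            'date'    => trim((string) $dc->date),
        );
    }

    // Order by date, most recent first, keeping the link keys.
    uasort($items, function ($a, $b) {
        $ta = strtotime($a['date']);
        $tb = strtotime($b['date']);
        return (int) $tb - (int) $ta;
    });

    return $items;
}
```

Because unparsable input yields an empty array rather than an error, the postcondition holds for every call, and the caller can decide separately whether an empty list deserves a message.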
Use Case 2
- Trigger - Use Case 0
- Preconditions - One or more lists of news items, indexed by a unique and permanent URL with details on title, creator, and date. Another list of news items by URL which have been published.
- Generate a new list by excluding any already published items and any duplicates.
- Order the new list by date.
- The items with a date in the past go first, ordered from most recent to oldest.
- If any item happens to match the current date and time, it goes next,
- followed by items in the future ordered from soonest to farthest in the future.
- Postconditions - The list of desired information exists, even if it is of size zero.
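A minimal sketch of Use Case 2 might look like the following. The function name selectUnpublished and the optional $now parameter are hypothetical, added so the ordering can be exercised with a fixed clock. It assumes every list is indexed by item URL, that each item has a 'date' string strtotime() can parse, and that past items are meant to come most recent first.

```php
<?php
/**
 * Hypothetical sketch of Use Case 2.
 * $lists     : array of news lists, each indexed by item URL
 * $published : array indexed by the URLs of already published items
 * $now       : timestamp to compare against (defaults to the current time)
 */
function selectUnpublished(array $lists, array $published, $now = null)
{
    $now = ($now === null) ? time() : $now;

    // Union on URL keys, so an item found in several feeds appears once.
    $merged = array();
    foreach ($lists as $list) {
        $merged = $merged + $list;
    }

    // Exclude anything whose URL is already in the published list.
    $candidates = array_diff_key($merged, $published);

    // Split into past and present-or-future, remembering timestamps.
    $past = array();
    $future = array();
    foreach ($candidates as $url => $item) {
        $ts = (int) strtotime($item['date']);
        if ($ts < $now) {
            $past[$url] = $ts;
        } else {
            $future[$url] = $ts;
        }
    }
    arsort($past);  // past items, most recent first
    asort($future); // an item dated exactly now, then soonest future first

    $ordered = array();
    foreach (array_keys($past + $future) as $url) {
        $ordered[$url] = $candidates[$url];
    }
    return $ordered;
}
```

The first element of the returned array is the item to tweet next; an empty array satisfies the postcondition when nothing is left to publish.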
Starting from a fresh file, a wrapper index.php is a good way to create a script that can easily be adapted to a command line script. I also use index.php files as a wrapper to get around a number of quirks with trying to debug code in some environments. PHP is one of the best languages I have ever used for debugging because it is so flexible. In a wrapper index.php I typically start with code like:
$debugMode = true;
if($debugMode){
error_reporting(E_ALL);
ini_set('display_errors', 'On');
}
This creates an easy floodgate for turning full error and warning messages on and off. If your server environment does not allow using ini_set, which many won't, you may be able to find other ways to adjust PHP's error messages.
In the case of more involved scripts, I would typically do some additional bootstrapping with a simple array of paths which can be configured differently for different hosts. I often develop on a localhost LAMP/WAMP/MAMP stack and upload to a sandbox or production server. The array makes it easy to set some basic configurations that work on each possible host, and they can be swapped automatically based on hostname, or manually with a variable much the way debug is turned on and off above. Once the paths for the host have been mapped, I typically create a Zend_Loader to autoload objects in the Zend Framework and in my own framework extensions, and then load all other script configuration data through Zend_Config_Xml.
// Paths needed before install and/or before Zend_Config is available
// if your host is not included below, add a key in the $installHosts
// array that matches the host name of your web server (all lower case),
// and populate it with keys/values for any paths that need to be
// altered. You may delete entries for host names you won't use.
$installHosts = include_once('_config/installHosts.php');
$pathToZend = getHostSetting('pathToZend',$installHosts);
// Determine if Zend needs to be added to the path
$addZendPath = (isset($pathToZend)
&& $pathToZend != './'
&& $pathToZend != '.'
&& trim($pathToZend) != '');
$pathToZendExtentions = getHostSetting('pathToZendExtentions',$installHosts);
// Attempt to add Zend and/or Zend extensions to the path.
// On some server configs this will not be allowed. When that
// is the case, comment out these lines and have your server admin
// add the appropriate path(s).
// PATH_SEPARATOR goes before each added path so it cannot run into
// the last entry of the existing include_path.
ini_set('include_path', ini_get('include_path')
. ($addZendPath ? PATH_SEPARATOR . $pathToZend : '')
. PATH_SEPARATOR . $pathToZendExtentions);
// Now that Zend is available, start the auto-loader.
require_once('Zend/Loader.php');
Zend_Loader::registerAutoload();
A typical installHosts.php might look something like this:
/*
* This file is the ONLY FILE in the config folder that should be a
* .php array declaration. All others should be Zend_Config_Xml files.
*/
return(
array(
'default' => array(
'controller' => '/',
'documentRoot' => $_SERVER['DOCUMENT_ROOT'],
'configFileLocation' => "_config/{$_SERVER['HTTP_HOST']}.xml",
'defaultConfigFileLocation' => "_config/default.xml",
'pathToZend' => './', // assume it is already in path
'pathToZendExtentions' => "{$_SERVER['DOCUMENT_ROOT']}/extension/",
'jqueryPath' => '/include/jquery/jquery.js'
),
'production.lib.ncsu.edu' => array(
'controller' => '/journey/0.5.0/',
'pathToZend' => '/opt/coolstack/php5/lib/zend/library/',
'pathToZendExtentions' => "{$_SERVER['DOCUMENT_ROOT']}/journey/0.5.0/extension/"
),
'localhost'=>array(
'controller' => '/journey/0.5.0/',
'pathToZendExtentions' => "{$_SERVER['DOCUMENT_ROOT']}journey/0.5.0/extension/"
)
)
);
/**
* Looks for a setting specific to the hostname to return,
* if it does not exist return the default.
* This is a quick and dirty option for anything that occurs before
* Zend_Autoloader allows us to use Zend_Config
*/
function getHostSetting($name, &$hostArray){
return(
(array_key_exists($_SERVER['HTTP_HOST'],$hostArray) && array_key_exists($name,$hostArray[$_SERVER['HTTP_HOST']]))
? $hostArray[$_SERVER['HTTP_HOST']][$name]
: $hostArray['default'][$name]
);
}
The configuration of this particular setup is just an example. The important thing is that programs be written so they are portable, and boot strappers like the above example help reduce the need for mixing configuration concerns in the code.
Back in the index.php file, we'll need to call the actual script:
// Now for our regularly scheduled program.
print(include('twitter.php'));
Obviously, twitter.php doesn't exist yet. A shell for it based on Use Case 0 would look something like this:
$currentRdfUrl = 'http://sysnews.ncsu.edu/news/index.rdf';
$pendingRdfUrl = 'http://sysnews.ncsu.edu/news/events.rdf';
$message = '';
// System requests RDF news feed of current news items [Use Case 1].
$currentNews = new TwitteRdf_Rdf_Extracter($currentRdfUrl);
// System requests RDF news feed of pending news items [Use Case 1].
$pendingNews = new TwitteRdf_Rdf_Extracter($pendingRdfUrl);
// System merges the two feeds, excludes already tweeted items, and selects one unpublished item based on the item's date [Use Case 2].
$newsList = array_merge($currentNews->toArray(), $pendingNews->toArray());
// Load the list of published items; start with an empty list if the cache file does not exist yet.
$cacheFile = '_cache/TwitteRCache.pser';
$publishedItems = is_file($cacheFile) ? unserialize(file_get_contents($cacheFile)) : array();
// Both lists are indexed by item URL, so a key-based diff drops the published items.
// (array_diff_ukey would require a comparison callback as its last argument.)
$potentialTweetItems = array_diff_key($newsList, $publishedItems);
// Pull the Twitter status to make sure that the item has not already been tweeted [Use Case 3].
// If pulling the Twitter status fails, generate an error message to that effect [Use Case 4].
// If there is nothing unpublished, return a message saying so.
// Get a short URL for the unpublished item from Snipr [Use Case 5].
// If getting a Snipr URL isn't possible, generate an error message to that effect [Use Case 4].
// Truncate the title as needed such that the Snipr URL and title together total 130 characters or less. [Use Case 6]
// Attempt to tweet the unpublished item. [Use Case 7]
// If tweeting fails, generate an error message to that effect [Use Case 4].
// If tweeting succeeds, remember that item as having been published and return a message listing what was tweeted and any remaining unpublished items.
// Postconditions - some message has been returned.
return $message;
Clearly the above code won't work. We have not declared the class TwitteRdf_Rdf_Extracter yet. Hopefully it is clear now that the refactored code will be much easier to read than the old code.
One final topic for this post: Abuse Cases. Most development teams focus the use case collection process on desired functionality. One thing I like to do is ask people, "What's the worst thing you can imagine going wrong with the system?" The answers are abuse cases: situations where one agent does something undesirable that causes major problems for the other agents. As has already been mentioned, one major reason for refactoring this code was that Twitter recently went through a period of a few hours where it returned a 500 server error but still posted messages. If a program like this encountered such a problem, it might easily continue to post the same item over and over again.
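One possible guard against that particular abuse case is to record an item as attempted before the tweet is sent, so that a misleading error response cannot trigger a repost. The sketch below is only an illustration: the function name tweetOnce, the cache file layout, and the $send callback (standing in for whatever code actually talks to Twitter) are all hypothetical.

```php
<?php
/**
 * Hypothetical sketch: send a tweet at most once per item URL.
 * $send is any callable that takes the tweet text and returns true on
 * apparent success and false on apparent failure.
 */
function tweetOnce($url, $text, $cacheFile, $send)
{
    // Load the record of attempted items, tolerating a missing or empty file.
    $raw = is_file($cacheFile) ? file_get_contents($cacheFile) : '';
    $attempted = ($raw !== '' && $raw !== false) ? unserialize($raw) : array();
    if (!is_array($attempted)) {
        $attempted = array();
    }

    if (isset($attempted[$url])) {
        return 'skipped'; // already attempted, never post it again
    }

    // Record the attempt BEFORE sending, so an error response from a
    // service that actually posted the message cannot cause a repeat.
    $attempted[$url] = time();
    file_put_contents($cacheFile, serialize($attempted), LOCK_EX);

    return call_user_func($send, $text) ? 'sent' : 'failed';
}
```

The trade-off is deliberate: a tweet that genuinely failed is not retried automatically, which seems preferable here to flooding followers with duplicates. A failed item could still be reported in the returned message so a person can decide what to do.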
Catching all abuse cases is hard if not impossible, but it is very important to ask the people you collect use cases from both, "What should the system do?" and "What should the system not do?". One of the most important agile principles is communication, and involvement of all stakeholders. Capturing the right abuse cases can significantly reduce the risks in any project.