Pros and Cons of Development With Zend Framework

While the definition of "Framework" may not be understood or agreed upon in some programming circles, the Inversion of Control distinction can help clarify the difference between a "Library" and a "Framework".

Generally speaking, a Library is a reusable collection of code that can be used by a controlling program. The library implements specialized behavior; image libraries, for example, generate or manipulate images. The use of a library frees programmers from having to deal with some complexities, abstracting complex problems of a specific type. The programmer using a library writes the code that drives interaction, applying business logic to decide what to do. A library is essentially a service for such programs.

In contrast, a Framework is a system that takes programs implementing specialized behaviors, and uses them to drive interaction. The framework acts as a reusable shell which abstracts common needs of a type of program, such as accepting input and generating output. The framework delegates control to the code written by the programmer in predetermined ways, but it is the framework that is ultimately in charge.

Frameworks are useful in making it easier to write similar types of programs that have different business logic:
  • a web form to accept a bank transaction, and
  • a web form to schedule an event might both be implemented easily with the same framework.
A library on the other hand is useful in making it easier to accomplish a collection of related tasks:
  • one library might be used to save and retrieve information from a database, while
  • another library might be used to generate the HTML of a web page.
In both cases the code written by the programmer affects side effects and outputs. With a library, the library may generate side effects and outputs, but only as allowed by the controlling program. With a framework, the framework always has control at some level over what the side effects and outputs are, even when it is the programmer's code that generates them.

To complicate matters, many libraries perform tasks similar to a framework in that the library may be given control of large portions of the program's behavior. Frameworks also often include library-like functions that are driven by the code written by the programmer while that code is in turn being driven by the framework.

The key distinction is that of control, but as one might imagine there are reasonable cases where control goes back and forth, making the distinction difficult. There are proponents who will argue that one or the other of the two models should be used as purely as possible, and others who will insist that it is function and not theory that should drive such decisions. Both opinions have merit.

A good framework can save time in development and maintenance efforts. Unlike a library, where there is nothing dictating the design of the program being written, a framework provides an outer layer of standardization. When a programmer uses a library, the library dictates the interfaces by which it may be used. Frameworks on the other hand dictate the interface of the program being written. The program must conform to an interface that the framework expects.
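
The control distinction can be sketched in a few lines of PHP. This is only an illustration under assumed names: slugify(), FormHandler, TinyFramework, and ScheduleEventForm are hypothetical and are not part of Zend Framework. In the library case the programmer's code makes the call; in the framework case the framework makes the call through an interface it dictates.

```php
<?php
// Library style: the programmer's code is in charge and calls the library.
// slugify() is a hypothetical library function.
function slugify($title)
{
    // Lowercase, replace runs of non-alphanumerics with a dash, trim dashes.
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');
}

// Framework style: the framework is in charge and calls the programmer's
// code through an interface that the framework dictates.
interface FormHandler
{
    public function handle(array $input);
}

class TinyFramework
{
    public function run(FormHandler $handler, array $rawInput)
    {
        // Common needs handled once by the framework:
        // input cleanup before the call, output wrapping after it.
        $clean = array_map('trim', $rawInput);
        return '<p>' . htmlspecialchars($handler->handle($clean)) . '</p>';
    }
}

// Business logic supplied by the programmer, shaped to fit the framework.
class ScheduleEventForm implements FormHandler
{
    public function handle(array $input)
    {
        return 'Scheduled: ' . $input['event'];
    }
}

$slug = slugify('Pros and Cons of Zend Framework');

$framework = new TinyFramework();
$page = $framework->run(new ScheduleEventForm(), array('event' => ' Team lunch '));
```

A bank transaction form would be a second FormHandler implementation handed to the same run() method, which is the kind of reuse described above.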

The most attractive part of the Zend Framework for the development group I am a part of is that it is flexible. It can be used as a library, or a framework (or as one might imagine both at the same time). In my particular case it has become more of a library by which I am writing my own framework.

Zend Framework implements an MVC (Model-View-Controller) pattern as its primary framework components. Unfortunately, most of the Zend Framework documentation and training is written on the assumption that one is using the MVC components. Aside from this issue, I and most (if not all) of the other developers I work with have found it to be very flexible and powerful whether you use MVC or not.

One of the real tests of this flexibility came this week when I realized the HTML Forms being rendered by the Zend Framework Form component were not to my liking. It is programmed to always insert a "value=" attribute on form input tags, even when the value is blank or undefined. This causes most major browsers to have an undesirable behavior.

For an input tag such as:

<input name="myText" id="myText" value="" type="text"/>

The web browser will automatically set the text field generated by the tag to be blank when the form is first loaded into the browser, which is the same behavior as:

<input name="myText" id="myText" type="text"/>

The key distinction occurs if a user enters something in the text field, submits, and then decides to use the back button. In the case where there is a blank value attribute, the browser will again display a blank text box, even if something had been entered previously. In the case where there is no value attribute at all however, many browsers will repopulate the text field with the text the user had entered prior to submitting the form.

Whether repopulating the form input element is standards compliant behavior, or even good from a usability standpoint, is beside the point. What matters to me is that users can and will use the back button, and they will be annoyed if they have to fill in the form from scratch. Some of them will likely have used a form in the past where using the back button did not clear the form, and will expect what they consider to be the most user friendly result. It is certainly possible to solve the issue with PHP Sessions, lessening the dependence on browser behavior. Implementing such a solution properly, however, requires a level of business logic that I am not interested in tackling at this time if browser behavior will acquiesce to meet the need.

In the Zend Framework, there are many layers to the implementation of rendering a form. A non-exhaustive list includes:
  • The Zend_Form object requires a Zend_View object to render, but any instance of Zend_View will do, even one with an empty constructor:
    $view = new Zend_View();
  • The Zend_Form object is given a number of subforms, element groups, and elements (Zend_Form_Element).
  • The Zend_Form_Element(s) each have filters, and validators for the value(s) entered into the form element.
  • The Zend_Form object, its subforms, element groups, and elements each have one or more decorators.
  • The Zend_View uses helpers and decorators to decide how to generate output.
The obvious down side to using a framework like Zend Framework is the level of complexity. Many libraries that are well designed and properly documented have a relatively small interface "surface area", meaning there are a few easy to learn ways to interact with the library. In contrast a powerful, flexible framework might easily have a very large interface "surface area", much like the wrinkles in a highly evolved brain.

After some API and code digging (because the documentation is a bit lacking) I discovered that the Zend_View_Helper_FormText object was responsible for the undesirable behavior. The trick to producing a more desirable outcome was to create an alternative class which I added to my own code base.

The code for Zend_View_Helper_FormText is as follows. The undesirable behavior is the line that always emits the value attribute:

<?php
/**
 * ...
 */


/**
 * Abstract class for extension
 */
require_once 'Zend/View/Helper/FormElement.php';


/**
 * Helper to generate a "text" element
 *
 * ...
 */
class Zend_View_Helper_FormText extends Zend_View_Helper_FormElement
{
    /**
     * ...
     */
    public function formText($name, $value = null, $attribs = null)
    {
        $info = $this->_getInfo($name, $value, $attribs);
        extract($info); // name, value, attribs, options, listsep, disable

        // build the element
        $disabled = '';
        if ($disable) {
            // disabled
            $disabled = ' disabled="disabled"';
        }
       
        // XHTML or HTML end tag?
        $endTag = ' />';
        if (($this->view instanceof Zend_View_Abstract) && !$this->view->doctype()->isXhtml()) {
            $endTag= '>';
        }

        $xhtml = '<input type="text"'
                . ' name="' . $this->view->escape($name) . '"'
                . ' id="' . $this->view->escape($id) . '"'
                . ' value="' . $this->view->escape($value) . '"'
                . $disabled
                . $this->_htmlAttribs($attribs)
                . $endTag;

        return $xhtml;
    }
}

The replacement class is as follows. The changes are the class name and the line that emits the value attribute, which is now conditional:

<?php
/**
 * Quepie Framework
 *
 * ...
 */


/**
 * Abstract class for extension
 */
require_once 'Zend/View/Helper/FormElement.php';


/**
 * ...
 */
class Quepie_View_Helper_FormText extends Zend_View_Helper_FormElement{
    /**
     * ...
     */
    public function formText($name, $value = null, $attribs = null)
    {
        $info = $this->_getInfo($name, $value, $attribs);
        extract($info); // name, value, attribs, options, listsep, disable

        // build the element
        $disabled = '';
        if ($disable) {
            // disabled
            $disabled = ' disabled="disabled"';
        }
       
        // XHTML or HTML end tag?
        $endTag = ' />';
        if (($this->view instanceof Zend_View_Abstract) && !$this->view->doctype()->isXhtml()) {
            $endTag= '>';
        }

        $xhtml = '<input type="text"'
                . ' name="' . $this->view->escape($name) . '"'
                . ' id="' . $this->view->escape($id) . '"'
                . ($this->view->escape($value) != '' ? ' value="' . $this->view->escape($value) . '"' : '')
                . $disabled
                . $this->_htmlAttribs($attribs)
                . $endTag;

        return $xhtml;
    }
}

The new class behaves identically, except that it only outputs the value attribute if the value is not an empty string. Code that executes in the framework prior to the above code filters an undefined value into an empty string, so testing only for the empty string should be a sufficiently safe assumption.
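
The rule itself can be tried outside the framework. The following is a minimal stand-alone sketch, not the helper above: renderTextInput() is a hypothetical function, and it uses PHP's built-in htmlspecialchars() where the real helper calls the view's escape() method.

```php
<?php
// Hypothetical stand-alone function; it mirrors only the value-attribute rule.
function renderTextInput($name, $value = null)
{
    $escapedName  = htmlspecialchars((string) $name, ENT_QUOTES);
    $escapedValue = htmlspecialchars((string) $value, ENT_QUOTES);

    // Emit value="..." only when there is something to emit.
    $valueAttrib = ($escapedValue !== '')
        ? ' value="' . $escapedValue . '"'
        : '';

    return '<input type="text" name="' . $escapedName . '"'
         . ' id="' . $escapedName . '"'
         . $valueAttrib
         . ' />';
}

// No value: the attribute is omitted entirely.
$blank = renderTextInput('myText');

// A real value: the attribute is present and escaped.
$filled = renderTextInput('myText', 'a "quoted" value');
```

Here $blank contains no value attribute at all, which is the case that lets the browser repopulate the field after the back button is used.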

Now the remaining trick is figuring out how to make Zend Framework use this new helper instead of the default helper. Fortunately that is also easy to accomplish in a line of code or two. A working form must already have a view object, and as it turns out the view has a method that will prepend an alternative package to its search list for helpers:

$view = new Zend_View();

$view->addHelperPath('Quepie/View/Helper/', 'Quepie_View_Helper_');

$form->setView($view);

With the above code the forms which used to output "bad" HTML now output as desired. The view still uses the default helpers for all other form elements, but uses the new helper for text input elements instead of the one provided by the Zend Framework.
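
Why only the text helper is replaced can be illustrated with a simplified resolver. This is a sketch under assumptions, not the framework's actual lookup code: HelperResolver is a hypothetical class, and a list of class names stands in for the files that would exist on disk. The idea is that the most recently added prefix is tried first, and the search falls through to the default prefix when no match is found.

```php
<?php
// Hypothetical, simplified resolver; the real lookup lives inside Zend Framework.
class HelperResolver
{
    // The default prefix is always present.
    private $prefixes = array('Zend_View_Helper_');

    public function addPrefix($prefix)
    {
        // The newest prefix goes to the front so it is tried first.
        array_unshift($this->prefixes, $prefix);
    }

    public function resolve($helperName, array $knownClasses)
    {
        foreach ($this->prefixes as $prefix) {
            $class = $prefix . ucfirst($helperName);
            if (in_array($class, $knownClasses, true)) {
                return $class;
            }
        }
        return null; // no prefix provides this helper
    }
}

// Class names stand in for helper files that would exist on disk.
$known = array(
    'Zend_View_Helper_FormText',
    'Zend_View_Helper_FormSelect',
    'Quepie_View_Helper_FormText',
);

$resolver = new HelperResolver();
$resolver->addPrefix('Quepie_View_Helper_');

$textHelper   = $resolver->resolve('formText', $known);
$selectHelper = $resolver->resolve('formSelect', $known);
```

In this sketch the text helper resolves to the Quepie class, while the select helper, which has no Quepie counterpart, still resolves to the Zend class.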

Selecting a good framework, or deciding whether a framework is even right for your development team, requires an understanding of needs and tolerances. In this case the Zend Framework proved to be sufficiently flexible, and with minimal code effort. The time investment for learning the Zend Framework was considerable; figuring out what to do took much longer than actually doing it. In our group, the need for flexibility while still working from a common code base is high enough that the demands for learning are tolerable. We also have just enough developers and strong communication among group members to help offset the demands of learning. If we had smaller numbers, or less communication, the benefits gained might not outweigh the strain of learning to use the framework.


Yahoo Developer Network's "Best Practices for Speeding Up Your Web Site"

We've been batting around discussions between our two development groups for a while about ways to improve processes. It is no secret I am very interested in becoming more Agile, and some of my other co-workers are also looking into Agile methods. There has even been some inquiry into how we might achieve better documentation. In all, we are a pretty motivated group.

The co-worker across the hall who turned me on to PHPdoc mentioned "Yahoo best practices" today in passing, saying that he and someone else in his group had been looking at them, so I asked him to send me the link. It looked good at first, but then when I clicked on the first link that caught my eye I was terribly disappointed.

From a User Experience standpoint it is always interesting to me to catch "initial impressions", so this blog post is my initial impression of Yahoo!'s Exceptional Performance Team's Best Practices:

1. Make Fewer Requests

Yes, but aside from the idea of "CSS Sprites" which is easy for me (though it might be tricky for others), this is perhaps a very unhelpful recommendation. Image Maps ARE NOT a good idea, I don't even know why they would mention such a thing when they admit it is not an accessible (and therefore a bad) solution. Inline image data is a bit too pie-in-the-sky to be helpful...

So as a design philosophy, designing with this principle in mind IS a good idea, but definitely a challenge. Certainly not a good candidate for the #1 spot.

2. Use a Content Delivery Network

Sure, if you're Google or Yahoo... how about an idea that most of us can use? Why is this #2? It really seems like something that might do a lot of good for about 2% of the people actually reading the recommendation.

3. Add an Expires or a Cache-Control Header

Yes, THIS is a good recommendation. I'm glad someone is willing to explain to me how to properly use HTTP headers so I don't have to dig through the overly complicated RFCs to try and understand what they are talking about. Thumbs Up.

4. Gzip Components

Yes. A little technical for some people to implement, especially to implement correctly, but great suggestion none-the-less.

5. Put Style Sheets at the Top

Yes, Thank You! Can we PLEASE try to follow the specification? I realize that some situations make this less than ideal, such as sites where users edit content in WYSIWYG inline editors and the only way to affect style is with inline code... but these are clearly outliers. There are way too many offenders in this category. It's like the people that didn't want to give up table layouts are passive-aggressively trying to sabotage CSS any way they can...

6. Put Scripts at the Bottom

This is at least interesting to learn about. Perhaps problematic to actually follow, but definitely good to keep in mind during the design process.

7. Avoid CSS Expressions

I'm insulted. This isn't even part of the standard. Only a fool would use it (sorry if I offend any fools). It is flat out unnecessary to use them, and I really don't see why this would even need to be considered a "best practice". If you make even a meager attempt at standards compliance then it is a given.

8. Make JavaScript and CSS External

Fair, Balanced, Accurate (unlike the news station that uses that as their "slogan"). I think this recommendation does a good job of pointing out why you might occasionally want to make an exception to what is otherwise a good rule of thumb.

9. Reduce DNS Lookups

Good information to know... but also another one of the recommendations that really doesn't seem like it is applicable in the grand majority of cases.

10. Minify JavaScript and CSS

Yes, PLZTHX, Next. This tip isn't really all that surprising, less is better.

11. Avoid Redirects

I sense a trend of being not too profound...

12. Remove Duplicate Scripts

... SRSLY?

13. Configure ETags

Good to know, useless to most people. This is another highly technical recommendation that requires control at the header level.

14. Make Ajax Cacheable

Fair enough, I suppose some people don't think about things like that...

15. Flush the Buffer Early

Something I don't do enough, but then I never really know what the server admin is going to set the buffer settings to tomorrow or the next day. A good courtesy flush() never hurts! (sorry, had to)

16. Use GET for AJAX Requests

Now HERE, is some useful information for me. Something (at least) I didn't know about the difference between GET and POST.

17. Post-load components

This seems like it would have been better combined with #6. Certainly makes the design considerations easier. Is 34 a magic number? If not, this would have been much better off as part of #6.

18. Preload Components

Did this a LONG time ago with roll-over images (pre-loaded commonly used roll-overs for second level pages on the homepage).... man were we stupid back in the Web 1.0 days!

This approach sounds reasonable, but I fear it has the potential to be heavily abused.

19. Reduce the Number of DOM Elements

I guess there are some people that somehow fooled themselves into thinking this was inobvious. *cough* denial */cough*.

20. Split Components Across Domains

I've seen sites doing this...was confused but now at least I understand why... still something that most of us would never really get much leverage out of. This sounds like more of a reverse bandwidth sucking cheat to me. If you feel the need to do this... maybe there is something ELSE you can do to improve performance instead of hacking the HTTP recommendation?

21. Minimize the Number of Iframes

Uh, how about 0 (zero)?! Honestly though, I do use Iframes...when I REALLY have to (Like getting dynamic content on www4 sans-JavaScript).

22. No 404s

This is what initially threw me aback. Huh? I can't control someone typing in the wrong URL. Sorry, but trying to act like pilot error doesn't happen isn't a "best practice". I do like the way this recommendation touches on the pros and cons of "smart 404" pages that try to help the visitor find something, but aside from reminding developers to make sure their links (including style sheets and scripts) all go to valid URLs.... This recommendation is about as pointless as it gets.

23. Reduce Cookie Size

Is called "session". I use.

24. Use Cookie Free Domains

Yet another neat trick that most people _won't use_.

25. Minimize DOM Access

Three good tips in one! Is it my birthday, or are we having a sale?

26. Develop Smart Event Handlers

Another good tip (or is that two cleverly disguised as one?).

27. Choose <link> over @import

On a roll finally, if I cared about IE users. They should be punished. Regardless, this is REALLY good information to know, only problem is deciding whether to use it for good, or evil...

28. Avoid Filters

Back to Duh. Good information if I go to hell and have to develop sites for IE 5 or IE 6 users.

29. Optimize Images

Thanks Mr. Wizard :D

Sarcasm aside, it is good to have someone actually recommend the proper tools for doing this. There are some people that DON'T know how to optimize images.

30. Optimize CSS Sprites

Before I read this, I was prepared for the worst. This is actually GOOD information to have though. Orientation of components of sprites vertically is more efficient and some other useful tips.

31. Don't Scale Images in HTML

OK Daddy. I'll stop using the <blink> tag too. Pinky swear. There are legitimate cases when you _do_ want to do this. If you link to a picture only a little bigger than the space you have for a thumbnail, and don't want to waste space/resources to make a thumbnail, this isn't a bad practice in cases where there are few pictures and the visitor is almost certain to click on them all. Pretty rare use case, but important to consider.

32. Make favicon.ico Small and Cacheable

Good recommendations and considerations for something I'd really rather not have to deal with, but now I know I probably should anyway. Apparently it is better to have a really small icon than to not have one at all, and forcing a long cache can help performance.

33. Keep Components Under 25K

Wow, I hadn't really thought about that. But probably something really important to consider. Seems like a pretty extreme case, but good to know. Brings up a good point on the difference between compression and minification.

34. Pack Components Into a Multipart Document

Not very well explained...since this is a feature "some user agents don't support", it sounds pretty iffy.

So In Summary

I have rather mixed feelings about this particular document. I was frankly expecting more, the guy that recommended this is pretty spot on with ideas, but I'm rather unimpressed with this tidbit.

I went back and rated every recommendation with a score between 0 and 3 depending on how useful I thought it would be to "most" developers. I ended up not giving a 0 to anything, and I gave 3's to several recommendations that I made rather sarcastic comments about because, let's face it, table based layouts are STILL BEING MADE TODAY AS WE SPEAK!

Out of a total possible 102 points, I awarded 83 points.... that's like a B on the 10 point scale and a C on the 7 point scale we used when I was in grade school...

twenty 3's, nine 2's, five 1's
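
For anyone checking my math, the tally works out as follows. This is just a quick sketch of the arithmetic; the variable names are mine.

```php
<?php
// Tally from above: score => number of recommendations given that score.
$tally = array(3 => 20, 2 => 9, 1 => 5);

// How many recommendations were rated in total.
$recommendations = array_sum($tally);

// Points awarded: each score multiplied by how often it was given.
$awarded = 0;
foreach ($tally as $score => $count) {
    $awarded += $score * $count;
}

// Best possible total if every recommendation had earned a 3.
$possible = 3 * $recommendations;

// Percentage used for the letter grade comparison.
$percent = round(100 * $awarded / $possible, 1);
```

That is 34 recommendations and 83 of 102 points, or a bit over 81 percent.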

Then again. This IS Yahoo we are talking about. I am seriously beginning to have second thoughts about integrating the YUI Calendar component into a recent project... It wasn't a terribly pleasant experience to start with.



Refactoring, Declarative Coding

Tradition holds that a picture is worth a thousand words. A config file may well be worth a thousand lines of code (or easily more). Reconfigurable code can simplify maintenance, portability, reuse, and even development.

When many novice programmers think of objects, they think of objects exhibiting behavior determined at compile time: hard coded, internal logic. To a large extent this is accurate; programs only do what they are written to do. Computer programs operate on functions: they take input and generate output. In a sense objects are little different, since each is primarily a collection of functions, often with some internal state. The behavior of an object is based on the interplay between the functions (called methods), the input to those functions, and the object's state (called properties).

An opportunity that is often underutilized is the ability to determine behavior at runtime. Object methods, including the methods that create and destroy the objects, accept input. Often the input is limited to a small set of variables, making use of the object easier. For some objects this is highly appropriate. In other cases there is a missed opportunity. Take for example an object that has a list of items. In the first version of a program, the object only needs to sort the items in one way, so the sorting function might logically be a method of the object.

In this scenario the sorting behavior is predetermined (at compile time by the programmer) and cannot be changed at runtime (while the program is running). This approach is effective in building the first version of the program because it is simple to implement. Suppose in the second version a couple of other sorting options are needed.

From this point the development can go several different ways. One logical approach might be to create separate sorting methods within the same object. If one instance of the object might have need for any of the sorting methods, they might simply be public methods (methods other objects can use to retrieve or trigger the sorting). If each object only needs one sorting method, it might be chosen by setting some internal state when constructing the object.

Another logical, and perhaps cleaner, approach in the case where each object only needs one of the sorting methods is to create a separate class for each sorting method. They would share a common class ancestry (they would be polymorphic) and each would only differ in the declaration of the sorting method.

This method does prove useful, as long as the object containing the sorting method is the only object that needs to use that sorting method. If in the third version a new type of object with another list is introduced, and that object needs some of the same sorting methods as the original object then the options above yield inefficient implementations.

The sorting method is procedural code: it describes how to do something. Creating two copies of this code is problematic; if the copies ever fall out of sync in the process of maintenance or adding new features then problems will arise. The sorting method can instead be defined as an object in and of itself. It may even be a stateless object, or a class that is not intended to be instantiated into an object at all. Purely stateless objects, typically those that fall into the category of Utility patterns, help create reusable procedural code. When the reusable procedures need some internal state they are often categorized as Strategy patterns.

Designing the sorting as a separate object using the Utility or Strategy patterns allows it to be shared among multiple types of objects with sorting needs. This approach is especially effective when care is taken to make each different sorting object polymorphic (they all share the same interface). Now each object that has different sorting needs can be assigned a sorting object partner during or after creation instead of requiring a separate class for each sorting method. This approach focuses more on runtime control of the sorting behavior for the object needing to sort. The object is more "configurable", and thus its capabilities at runtime are more flexible.
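
A minimal sketch of this arrangement follows. All of the names (SortStrategy, AlphabeticalSort, LengthSort, ItemList) are hypothetical and chosen only for illustration. The sorting objects here are stateless and share one interface, so the list object can be given any of them during or after creation.

```php
<?php
// Shared interface: every sorting object can stand in for any other.
interface SortStrategy
{
    public function sort(array $items);
}

class AlphabeticalSort implements SortStrategy
{
    public function sort(array $items)
    {
        sort($items, SORT_STRING);
        return $items;
    }
}

class LengthSort implements SortStrategy
{
    public function sort(array $items)
    {
        // Shortest first; ties fall back to alphabetical order.
        usort($items, function ($a, $b) {
            $byLength = strlen($a) - strlen($b);
            return $byLength !== 0 ? $byLength : strcmp($a, $b);
        });
        return $items;
    }
}

// Any list-holding class can be handed a sorter instead of hard coding one.
class ItemList
{
    private $items;
    private $sorter;

    public function __construct(array $items, SortStrategy $sorter)
    {
        $this->items  = $items;
        $this->sorter = $sorter;
    }

    // The sorting partner can also be swapped after creation.
    public function setSorter(SortStrategy $sorter)
    {
        $this->sorter = $sorter;
    }

    public function sorted()
    {
        return $this->sorter->sort($this->items);
    }
}

$list = new ItemList(array('pear', 'fig', 'banana'), new AlphabeticalSort());
$alphabetical = $list->sorted();

$list->setSorter(new LengthSort());
$byLength = $list->sorted();
```

A second class with its own list could reuse AlphabeticalSort or LengthSort unchanged, which avoids the duplicated sorting code described earlier.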

The process of assigning behavior at runtime relies on "declarative coding". By creating objects that are configurable, there is now a requirement to specify configuration each time the object is used unless there is some desirable default behavior (sometimes a good idea). Some input now determines which behavior will be exhibited. The behavior options are still predetermined within the bounds of what has been programmed, but how and when they will be used is no longer as fixed.

Going back to the original example, the programmer can define a number of sorting objects, and any object that has a sortable list can be assigned any of these by the programmer, or by the person using the program as business logic allows. The programmer might predetermine one or more of the sorting objects in the code of the program, or use a configuration file to control the options each time the program is executed. The programmer also might allow the user to specify which sorting option to use as an input to running the program (a command line option), or the programmer might let the user change the sorting method dynamically as the program is running through some user interface option. The program itself might select the sorting method based on the data being examined, selecting the appropriate sort automatically.
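
One way to picture this is a small factory that turns a declared option name into a sorting object. The sketch below is an assumption-laden illustration: ListSorter, AscendingSorter, DescendingSorter, makeSorter(), and the 'sorter' key are all hypothetical, and a plain array stands in for whatever supplies the option (a config file, a command line argument, or a user interface control).

```php
<?php
interface ListSorter
{
    public function sort(array $items);
}

class AscendingSorter implements ListSorter
{
    public function sort(array $items)
    {
        sort($items);
        return $items;
    }
}

class DescendingSorter implements ListSorter
{
    public function sort(array $items)
    {
        rsort($items);
        return $items;
    }
}

// Maps a declared option name to the class that implements it.
function makeSorter($name)
{
    $available = array(
        'ascending'  => 'AscendingSorter',
        'descending' => 'DescendingSorter',
    );
    if (!isset($available[$name])) {
        throw new InvalidArgumentException('Unknown sorter: ' . $name);
    }
    $class = $available[$name];
    return new $class();
}

// Stand-in for a config file, command line option, or UI selection.
$config = array('sorter' => 'descending');

$sorter = makeSorter($config['sorter']);
$result = $sorter->sort(array(3, 1, 2));
```

The set of behaviors is still fixed by what has been programmed, but which one runs is decided by the declared value rather than by the calling code.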

To be clear, declarative coding does not remove the need for procedural code. It simply isolates procedural code (separation of concerns) to a very specific scope in the program. It then allows objects to determine their behavior more declaratively,

"I am a Foo, and I use a Bar."

instead of,

"I am a Foo, and this is how I Bar..."

In an ideal scenario this might drastically reduce the amount of procedural code through efficiency.

So going back to the example code the Use Cases reveal some opportunities to implement code declaratively. This approach will save little in the short term, but drastically improves the usability of the code overall. Depending on:
  1. the programmer's level of expertise (proficiency in the programming language and with objects),
  2. the programmer's understanding of the problem domain (the nature of the specific problem(s) the program is trying to solve),
  3. and the likelihood that reuse of the code will occur either through:
    1. other projects or,
    2. many similar needs in the project at hand or,
    3. changing needs of the project at hand
the choice to make the code more declarative (configurable) might make little sense, or it might be a foundational requirement for the project's success.

Programmers with low levels of expertise might find it difficult to anticipate the most effective ways to make the code configurable. Writing declarative code can be (but is not necessarily) more time intensive than writing procedural code. Extremely novice programmers should probably not attempt to do this on "new" code, only in the process of refactoring well understood code. Intermediate programmers should look for opportunities to use declarative code preemptively (Prefactoring, à la Kenneth Pugh) but only apply it to a few of the most obvious opportunities in a given project.

It is good to learn by trial and error, but perhaps not prudent when the project's time line is at risk.

The Zend Framework includes a very useful Zend_Config class in two varieties: Zend_Config_Ini (Windows INI-like) and Zend_Config_Xml. These classes open up files of the appropriate format and convert them into arrays of data that can then be passed in as constructor or method arguments. This example will use only the XML flavor of config file, but either file is a good option depending on personal taste.

A config XML file always has a root tag (a valid XML file may have only one root tag), and the name of that tag has no effect or side effect whatsoever. Each tag inside the root tag (there may be no character data as direct children of the root tag, only tags) represents a configuration option. Normally these options are mutually exclusive, but if the "extends" attribute is used in one, and the value of that attribute is the name of another, then the values in the extended option will be inherited by the option extending it. For any values specified by both options, the values of the extending option will override the extended option. For example:

<data>
    <foo>
        <test>1</test>
        <test2>2</test2>
    </foo>
    <bar extends="foo">
        <test>3</test>
    </bar>
</data>

In the above example, the values for foo are test = 1 and test2 = 2. The values for bar are test = 3 and test2 = 2. If bar did not extend foo, then it would only have the value test = 3, and no value for test2.
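
The inheritance rule can be sketched without the framework. The following is not how Zend_Config_Xml is implemented; it is a small illustration in which the example file has been written out by hand as plain arrays, and resolveSection() and the 'values' and 'extends' keys are hypothetical names.

```php
<?php
// The example file expressed as plain arrays; "extends" names the parent option.
$sections = array(
    'foo' => array(
        'values' => array('test' => '1', 'test2' => '2'),
    ),
    'bar' => array(
        'extends' => 'foo',
        'values'  => array('test' => '3'),
    ),
);

// Resolve one option, letting its own values override inherited ones.
// Note: this sketch does not guard against circular "extends" chains.
function resolveSection(array $sections, $name)
{
    $section = $sections[$name];

    $inherited = isset($section['extends'])
        ? resolveSection($sections, $section['extends'])
        : array();

    // Later arrays win in array_merge(), so the extending option overrides.
    return array_merge($inherited, $section['values']);
}

$foo = resolveSection($sections, 'foo');
$bar = resolveSection($sections, 'bar');
```

Resolving bar yields test = 3 (its own value) and test2 = 2 (inherited from foo), matching the description above.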

Tags at the second level are effectively associative keys; the config object turns the config file into a data array. In the above example foo and bar are keys in that array. Tags beyond the second level become arrays within the array:

<data>
    <foo>
        <test>1</test>
        <test2>
            <a>2</a>
            <b>3</b>
        </test2>

    </foo>
</data>

In the above example, the value of test2 is now an associative array with the key "a" = 2 and key "b" = 3.
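
The tag-to-array mapping can be approximated with PHP's SimpleXML extension. This is a rough sketch of the idea, not the Zend_Config_Xml implementation: xmlToArray() is a hypothetical function, it ignores attributes such as "extends", and repeated tag names at the same level would overwrite one another.

```php
<?php
$xml = <<<XML
<data>
    <foo>
        <test>1</test>
        <test2>
            <a>2</a>
            <b>3</b>
        </test2>
    </foo>
</data>
XML;

// Recursively convert child tags into nested associative arrays.
function xmlToArray(SimpleXMLElement $node)
{
    // A tag with no child tags is a plain value.
    if ($node->count() === 0) {
        return trim((string) $node);
    }

    $result = array();
    foreach ($node->children() as $child) {
        $result[$child->getName()] = xmlToArray($child);
    }
    return $result;
}

// The root tag's name is dropped; only its children become keys.
$config = xmlToArray(simplexml_load_string($xml));
```

In this sketch the values come back as strings, so $config['foo']['test2'] is an associative array holding "2" under key "a" and "3" under key "b".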

The use of config files, declarative coding, and related practices addresses a very critical issue in software development: risk. Procedural code makes assumptions; when these assumptions are correct and durable the procedural code does what is needed. If the assumptions are, or at any point in the future become, incorrect then changes must be made to the code, and they are often the expensive kind. Using declarative coding effectively requires assessing code risks.

From Use Case Zero (see Refactoring, Use Cases and Abuse Cases) it is clear that this particular program needs to specify two different RDF news feeds. These might typically be accessed by a URL on the Web, but that is not always a safe assumption. Some programs might have access to the RDF file via local file system (which is almost always less resource intensive than via Web URL), or even a local database. One assumption that can be said to be reasonably safe is that URLs can encode information for data stored in a variety of means, and in the more broad sense they are a special case of a URI, which could potentially reference any data.

For the time being, the program will make the assumption that one or more RDFs are accessible somewhere, as specified by the configuration. It will be up to the program to determine how to retrieve each RDF based on the location stored in the config file. The config file simply declares how many RDF sources there are, and gives an identifier for each. Parsing and validation of the identifier is a procedure left up to the objects that make up the program.

Another feature that will be needed is sorting of the items in the RDF. Initially the requirements, and thus the use cases, indicated that all RDFs would be sorted with most recent first. In a later iteration of the project it was determined that there are actually two sorting needs:
  1. Most recent first
  2. Most recent last (oldest first)
There are two feeds with different purposes: one with upcoming items and one with recent past items. Items are dated by when they will happen, not by when they were posted. For upcoming items, the oldest item not already processed is the most relevant because it is the one that will happen soonest. This may be confusing, but that item essentially has the "least negative age".

Most items in the past feed will have already been processed when they were in the upcoming feed. There is also sometimes overlap in the two feeds, so the past feed is processed first to give priority to any that might have been missed. For past items the newest is most relevant because it is the one that was missed most recently.

In an ideal world, all missed events would be posted, but since multiple posts might be considered rude or annoying in the case of a Twitter account, the most pressing option is given priority. The order of the feeds matters, so a strategy for aggregating the two (as opposed to sorting each) is also needed. Examples of aggregation strategies might include one after the other, interleaved, random, etc.
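
As a rough illustration of how per-feed sorting and an aggregation strategy might fit together, here is a small sketch. The function names, the item layout (an array with a 'date' key), and the 'interleave' strategy name are all assumptions made for the example; only 'recent', 'oldest', and 'sequence' appear in the config file.

```php
<?php
// Hypothetical helpers; names and item layout are assumptions for this sketch.

/** Sort items by 'date': 'recent' = newest first, anything else = oldest first. */
function sortItems(array $items, $order)
{
    usort($items, function ($a, $b) use ($order) {
        $diff = strtotime($a['date']) - strtotime($b['date']);
        return ($order === 'recent') ? -$diff : $diff;
    });
    return $items;
}

/** Combine already-sorted feeds using the named strategy. */
function aggregateFeeds(array $feeds, $strategy)
{
    $merged = array();
    if ($strategy === 'sequence') {
        // One feed after the other, in the order they were configured.
        foreach ($feeds as $feed) {
            $merged = array_merge($merged, $feed);
        }
        return $merged;
    }
    if ($strategy === 'interleave') {
        // Take the first item of each feed, then the second of each, etc.
        $longest = $feeds ? max(array_map('count', $feeds)) : 0;
        for ($i = 0; $i < $longest; $i++) {
            foreach ($feeds as $feed) {
                if (isset($feed[$i])) {
                    $merged[] = $feed[$i];
                }
            }
        }
        return $merged;
    }
    throw new InvalidArgumentException("Unknown strategy: $strategy");
}

$past = sortItems(array(
    array('title' => 'A', 'date' => '2008-09-01'),
    array('title' => 'B', 'date' => '2008-09-03'),
), 'recent');
$upcoming = sortItems(array(
    array('title' => 'C', 'date' => '2008-09-20'),
    array('title' => 'D', 'date' => '2008-09-10'),
), 'oldest');

// The past feed goes first so that missed items get priority.
$queue = aggregateFeeds(array($past, $upcoming), 'sequence');
```

Keeping the sort and the aggregation as separate steps means a new strategy can be added without touching how each feed is ordered.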

One assumption that might be highly volatile in the future is that all news feeds will be RDF. Another assumption is that all items correlate with a URL for the full details. These risks are addressed in the config file, by including a place where each feed can be configured to use a different interpreter object and/or not assume there is a URL associated with each individual news item. This way a bad assumption can be more easily corrected by implementing additional code or altering behavior based on the configuration.
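
One way the configured format could select an interpreter is by building a class name from it. The sketch below is purely illustrative: the helper name is made up, and the "TwitteRdf_{Format}_Extractor" pattern is an assumption borrowed from the TwitteRdf_Rdf_Extractor class used in the sample code of this series.

```php
<?php
// Hypothetical helper: builds an interpreter class name from the configured
// feedFormat value. The naming pattern is an assumption for this sketch.
function interpreterClassFor($feedFormat)
{
    // Keep only letters and digits so a config value cannot inject odd names.
    $clean = preg_replace('/[^A-Za-z0-9]/', '', $feedFormat);
    if ($clean === '') {
        throw new InvalidArgumentException('feedFormat must name a format');
    }
    return 'TwitteRdf_' . ucfirst(strtolower($clean)) . '_Extractor';
}

echo interpreterClassFor('Rdf'), "\n";
```

Supporting a new feed type would then mean adding one class and changing one config value, rather than editing the controlling code.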

The original web code called itself iteratively using JavaScript, so at least in some cases a configuration value for auto-running the code iteratively might be prudent. When the program has control over how often it will iterate, allowing the config to specify how often to iterate is another good option. An option for which output method (View pattern) to use is also important since at least two different output needs are known.

The format of the successful return message should be configurable at some point, though this might be accomplished by different output classes rather than declared in the config file. An example of one approach to specifying a template for the message is included below. To handle the case where it might be desirable to distinguish which feed an item came from, each feed configuration is given a "label" value. Currently Twitter only allows 140 characters per post, but that could change. To reduce risk, a control for the length of the message is declared. The format for the date string and the service for URL shortening are also included to make future changes easier.
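
To show how a declared template and length limit might be applied, here is a minimal sketch. The function name and the item array keys are assumptions; the placeholder names match the template used in the config file below. It counts bytes with strlen, which is only a fair approximation for plain ASCII text.

```php
<?php
// Hypothetical helper; assumes $item has feedLabel, title, url, and date keys.
function formatMessage($template, array $item, $maxLength)
{
    // Fill the [[placeholder]] tokens, using the given title text.
    $fill = function ($title) use ($template, $item) {
        return strtr($template, array(
            '[[feedLabel]]' => $item['feedLabel'],
            '[[title]]'     => $title,
            '[[url]]'       => $item['url'],
            '[[date]]'      => $item['date'],
        ));
    };

    $message = $fill($item['title']);
    $excess = strlen($message) - $maxLength;
    if ($excess > 0) {
        // Shorten only the title; if the title is shorter than the excess,
        // the message can still run over the limit.
        $keep = max(0, strlen($item['title']) - $excess);
        $message = $fill(substr($item['title'], 0, $keep));
    }
    return $message;
}

$item = array(
    'feedLabel' => 'Upcoming',
    'title'     => 'Network maintenance',
    'url'       => 'http://example.com/x',
    'date'      => 'Mon',
);
echo formatMessage('[[feedLabel]]: [[title]]([[url]]), [[date]]', $item, 50), "\n";
```

Only the title is trimmed because the label, URL, and date are the parts a reader needs intact.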

The posting method is assumed to be Twitter, but if that service ever ceases to exist the option to migrate to an alternative is highly desirable. The current assumption is that subscribers won't want to get more than one message at a time, but since that could also change it is included as a configuration option. Since posting to Twitter requires a login and password, they are included in the config file.

This may raise a red flag with many programmers, and in fact it should! Including logins and passwords in clear text can be risky business. There are many alternatives for storing the login and password information with varying levels of security. If the config file is stored appropriately, the risk of leaving this information in clear text is negligible. Understanding the appropriate security precautions is beyond the scope of this example, but suffice to say it should not be stored anywhere it might be accessible via the web.

Finally, keeping track of what has been posted requires some storage mechanism. Many people prefer a DB, and in many project configurations this is the optimum method. Given the non-concurrent execution nature of this program, a simple file system approach is sufficient.

If concurrent execution (two executions of the same program accessing the same resources) were possible, then proper file locking and synchronization methods would be needed. Most databases handle this automatically. Some file systems also allow for this with a little programming know-how (file systems where PHP's locking mechanism works, for example).

Based on the current use cases, concurrent executions should never happen for this program. The configuration simply needs to determine where to store the file, and determine a method and threshold for limiting file size. In the current implementation a limit of 100 past posts is set, but other versions might store posts for a number of days.
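
A file-based store with a count threshold could look something like the sketch below. The function name and the URL-to-timestamp layout are assumptions for illustration. The LOCK_EX flag is included as cheap insurance in case concurrent runs ever do happen, on file systems where PHP's locking works.

```php
<?php
// Hypothetical helper for this sketch; the name and data layout are assumptions.
function rememberPost($cacheFile, $url, $threshold)
{
    $published = array();
    if (is_file($cacheFile)) {
        $stored = unserialize(file_get_contents($cacheFile));
        if (is_array($stored)) {
            $published = $stored;
        }
    }

    // Index by URL so the same item is never recorded twice, and move a
    // repeated URL to the end so it counts as the newest entry.
    unset($published[$url]);
    $published[$url] = time();

    // Enforce the "count" threshold by keeping only the newest entries.
    if (count($published) > $threshold) {
        $published = array_slice($published, -$threshold, null, true);
    }

    // LOCK_EX takes an exclusive lock for the duration of the write.
    file_put_contents($cacheFile, serialize($published), LOCK_EX);
    return $published;
}
```

A call such as rememberPost('_cache/tweets.pser', $itemUrl, 100) would match the values declared in the config file below.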

A config file for the TwitteRdf program might look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<config>
    <basic>
        <feedFormat>Rdf</feedFormat>
        <feeds>
            <one>
                <uri>http://sysnews.ncsu.edu/news/index.rdf</uri>
                <label>Unannounced</label>
                <sort>recent</sort>
                <itemUrl>true</itemUrl>
            </one>
            <two>
                <uri>http://sysnews.ncsu.edu/news/events.rdf</uri>
                <label>Upcoming</label>
                <sort>oldest</sort>
                <itemUrl>true</itemUrl>
            </two>
        </feeds>
        <aggregate>sequence</aggregate>
        <output>www</output>
        <autorun>360</autorun>
        <messageConfig>
            <length>135</length>
            <template>[[feedLabel]]: [[title]]([[url]]), [[date]]</template>
            <dateFormat>h:i A l F jS Y</dateFormat>
            <urlService>Snipr</urlService>
        </messageConfig>
        <postService>Twitter</postService>
        <postServiceConfig>
            <login>login</login>
            <password>password</password>
            <maxPosts>1</maxPosts>
        </postServiceConfig>
        <localCacheConfig>
            <filepath>_cache/tweets.pser</filepath>
            <thresholdCriteria>count</thresholdCriteria>
            <threshold>100</threshold>
        </localCacheConfig>
    </basic>
</config>
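
As a quick sanity check on a file like this, the feed declarations can be read back with a few lines of PHP. The sketch below uses the bundled SimpleXML extension as a stand-in for the config object and an abbreviated copy of the file; the variable names are my own.

```php
<?php
// Sketch only: SimpleXML stands in for the config object, and the XML is
// an abbreviated copy of the config file.
$xml = <<<XML
<config>
    <basic>
        <feeds>
            <one>
                <uri>http://sysnews.ncsu.edu/news/index.rdf</uri>
                <label>Unannounced</label>
                <sort>recent</sort>
                <itemUrl>true</itemUrl>
            </one>
            <two>
                <uri>http://sysnews.ncsu.edu/news/events.rdf</uri>
                <label>Upcoming</label>
                <sort>oldest</sort>
                <itemUrl>true</itemUrl>
            </two>
        </feeds>
        <aggregate>sequence</aggregate>
        <autorun>360</autorun>
    </basic>
</config>
XML;

$basic = simplexml_load_string($xml)->basic;

// Each child of <feeds> declares one source; the tag name is its identifier.
$feeds = array();
foreach ($basic->feeds->children() as $name => $feed) {
    $feeds[$name] = array(
        'uri'     => (string) $feed->uri,
        'label'   => (string) $feed->label,
        'sort'    => (string) $feed->sort,
        // XML values are text, so the flag is compared against 'true'.
        'itemUrl' => ((string) $feed->itemUrl === 'true'),
    );
}
$autorun = (int) $basic->autorun;
```

Adding a third feed is then purely a matter of declaring another child under feeds; none of the reading code changes.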


Lots of declaration going on! In the next post we'll start implementing some of the procedure that will drive the action.


Quiche Recipe

Not my usual blog post, but someone requested the recipe, and I said to myself:

"Why send it to one person by e-mail, when I could just blog it?"

Ambient transmission of information is really cool, it is what Web 2.0 (the real Web 2.0, not the buzz-hype) is all about.

Anecdotally, I recently had an issue with the office communal coffee maker. No one had used it in well over a year (since before I started). When we tried to establish it as a new routine there was a lot of enthusiasm, but the results were horrible. I ended up buying my own French press because I just couldn't drink the coffee brewed by the coffee maker, but others continued to use it, suffering in silence.

I decided to do something about the situation, so I bought a liter of white distilled vinegar and posted the directions for using it as a cleaning solution on an internal Intranet FAQ application. My intention was mirthful, but as it turns out the information was transmitted by osmosis to at least one other department, resulting in improved quality of coffee, and therefore quality of life, not only for my department but for at least one other as well.

Quiche is one of my favorite foods to make. One reason is that it is incredibly flexible, as is apparent from the recipe below. You need eggs and a crust, but aside from that most of the other ingredients are pretty much negotiable (though cheese is probably a requirement for most tastes).

So without further ado, a recipe for Quiche:

The lower numbers of this recipe are more appropriate for a thinner pie pan, and the original Quiche used the smaller amounts indicated below. I think the second Quiche turned out better and it used the larger amounts, but required a slightly thicker pan.

You will need:
  • A Pie Crust - You could make this by hand, but I just get Pillsbury rolled up crust. Frozen crusts also work, but I like the results from a rolled crust better.
  • 5-6 Eggs
  • Couple of pinches of salt (or more to taste).
  • 1 Tablespoon of Flour (probably optional, I have made it without flour before)
    • I used King Arthur Whole Wheat Flour but any all-purpose flour should do.
  • 1 3/4 - 2 cups of shredded cheese.
    • I used a Monterey/Colby blend.
  • 1-2 Sausages (about 4-5 inches long each, about 1 inch diameter).
    • I used Archer Farms (Target Brand) Spinach and Feta Sausage.
  • Handful of fresh spinach (1/2 cup to 1 cup). I used baby spinach.
  • 1 Tablespoon of oil (Vegetable/Olive) or liberal dusting of Pam-like spray.
If using rolled crust: preheat oven to 450. Place crust in glass pie pan, press down, gradually stretching crust so the crust sticks firmly to the glass pan. Cut off excess around the edges. Bake for 5 minutes. Take out to cool.

Set Oven to 350.

Gently press down any bubbles in the pie crust before it cools.

Cut the sausage into thin slices. If you don't want to break them up into crumbles as you sauté, chop them into small pieces.

If the Spinach has long stems (more than an inch), you may want to slice them off  (and include them or discard them).

Coat frying pan with oil (or use spray), sauté the spinach and sausage. Crumble the sausage as it cooks if not already crumbled. Sauté until spinach is very dark green and very limp. Remove from heat and set aside.

In a bowl break the eggs, add salt, scramble.

*For a fluffier result add two tablespoons of milk and beat vigorously (oops, I didn't have milk this time).

If you are using flour, add it gradually, mixing well with a fork or whisk (I have omitted it before with satisfactory results).

Mix shredded cheese in with egg mixture.

Pour mixture into the crust.

Bake uncovered for 40 minutes. A done Quiche will have a slightly brown top and a toothpick inserted into the center will come back out clean.



Campus Talk: Zend Coding Standard and PHP Doc

When: TBD in September, I am leaning towards a brown bag lunch format, so noonish.
Where: TBD on main campus.

I am planning a talk for any interested parties on campus about the Zend Framework Coding Standard and the requirements and benefits associated with adoption. As with any coding standard, the goal is to make code more readable and easier to maintain through uniformity. As with Java coding standards, the recommendations Zend puts forth for PHP also better enable automation tools such as PHP Doc and the Zend Framework Class Loader.

These coding standards were developed with the specific intention of making contributions to the Zend Framework more uniform, but any PHP project can benefit from adoption of some or all of the standards independent of adoption of the Zend Framework.

The format will be a brisk 20 minute presentation on the coding standards along with a bird's eye view of the Zend Framework, Zend Studio (code editor), and PHP Doc. Following the presentation I would like to open the floor up for discussion of related topics such as the costs and benefits of:
  • implementing coding standards.
  • varying levels and approaches to code documentation.
  • coding methods supported in PHP such as "scripting" vs. "objects".
The talk will be mostly focused on issues from the PHP perspective, but I would encourage any interested programmer to attend since most of the topics are relevant to different languages.


Refactoring, Separation of Concerns

Everyone has different interests and strengths. While there are some people that claim to excel at "multi-tasking", and some people are jacks-of-all-trades, most people generally do better when focusing on one task at a time, or a very narrow set of related tasks. One of the primary tenets of object-oriented programming is to assign purpose to classes, limiting the scope of any one class. Each object performs its role, avoids meddling in the affairs of other objects, and what generally results is a system that is reasonably easy to understand and predict. This approach exemplifies a major theme of computer science: break down a problem into smaller, more easily solved problems.

People and programs are very different things. Computer programs do not "care" about being interrupted in the same sense that humans do. From personal experience I know that when I am working on an important project, the last thing I want is for some unrelated demand to ambush my productivity and momentum. The process of stopping, shifting mental gears, and then returning to the original task is time consuming and mentally taxing.

For a computer program, on the other hand, the difference between being interrupted with B while doing A, and completing A and then doing B, is typically a trivial matter of microseconds and memory allocation. One complex "object", a traditional procedural program, can do most if not all tasks just as well as, and occasionally more efficiently than, a well designed system of objects. The difference again comes from the human factor. A well designed system of objects is often easier to develop, maintain, and expand than a single amorphous blob of code.

The right level at which to introduce objects depends heavily on the project configuration (the set of design choices dictated to or by the project). Projects in Java for example are innately completely object oriented, while projects in other languages may not have the capacity for true objects. PHP is a particularly flexible language with growing support for object-oriented features. The size of the project also may necessitate the use of objects to make the code manageable, or might preclude the usefulness of implementing classes. All but the most rudimentary programs can benefit from leveraging object-oriented techniques, but the expertise to make objects helpful rather than a hindrance is important.

In the previous post, the bootstrap code included a file that procedurally outlined the first few steps of the process. Before going much further let's revisit that line and instead of including a file, instantiate an object:

//Now for our regularly scheduled program.
$controller = new TwitteRdf('TwitteRdf_View_Www','_config/sysnews.xml');

To understand what is happening here, first one must understand the autoloading capabilities of the Zend_Loader class (included in code from the previous blog post).

The Zend Framework includes a wide variety of classes that are useful in various types of PHP development. Among these is the Loader class, which is a helper class (AKA utility class, a collection of static methods). One of the methods of this class inserts a process into PHP's method of locating class declarations such that if a class is instantiated that has not been declared, PHP will search for a file and include it automatically to try to secure a declaration for the class before throwing a fatal error. If the class is found and declared, execution can proceed as normal.

This "autoloading" method allows the programmer to instantiate classes without having to manually include files and packages. So long as the classes are saved in a file structure following the conventions used by the autoloader, it will find them automatically. The Zend Framework includes coding conventions, one of which sets forth the naming convention for classes such that the autoloader can find them.

In the case of the class TwitteRdf, the autoloader will expect to find a TwitteRdf.php file directly in the include path(s). It is important to note that while the names of classes should follow some logical grouping by function, they do not reflect class inheritance. As an example, the constructor for TwitteRdf takes the name of a view class, and a file path to a config file. The class name of the view used for web output in this sample code is "TwitteRdf_View_Www".

When the code later attempts to instantiate the class TwitteRdf_View_Www it will start in the include path(s) and look for a file named "TwitteRdf/View/Www.php". The class defined inside that file isn't necessarily a sub-class of TwitteRdf; rather, as one might infer from the name and its usage in the constructor for the TwitteRdf class, it is associated with a "View" class or pattern. Other views for the application might logically be stored in the same View folder in different PHP files.
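
The naming convention itself is simple enough to sketch in a few lines. The code below is not the Zend_Loader implementation; it only illustrates the class-name-to-path mapping described above, using PHP's standard spl_autoload_register hook. The function name is made up for the example.

```php
<?php
// Sketch of the naming convention only; this is not Zend_Loader's code.
function classNameToPath($className)
{
    // Underscores in the class name become directory separators.
    return str_replace('_', '/', $className) . '.php';
}

// Register a loader with PHP's standard SPL autoload hook.
spl_autoload_register(function ($className) {
    // Search the include path(s) for the conventional file name.
    $file = stream_resolve_include_path(classNameToPath($className));
    if ($file !== false) {
        require_once $file;
    }
});

echo classNameToPath('TwitteRdf_View_Www'), "\n"; // TwitteRdf/View/Www.php
```

If no matching file is found the loader quietly does nothing, and PHP reports the missing class as it normally would.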

Given what is known so far about the TwitteRdf class, a shell for it might look something like:

<?php
/**
 * File documentation
 * @package TwitteRdf
 * @version 0.0.1
 */
 
 /**
  * Class documentation
  */
class TwitteRdf
{

    /**
     * All non-constant properties should be protected or private...
     */
    protected $_property = NULL;
   
    /**
     * The PHP constructor method
     */
    public function __construct($viewName, $configFileName)
    {
   
    }
   
    /**
     *
     */
    private function createView($viewName)
    {
   
    }

    /**
     *
     */
    private function loadConfig($configXml)
    {
   
    }
   
    /**
     *
     */
    private function run()
    {
        $currentRdfUrl = '';
        $pendingRdfUrl = '';
        $message = '';
       
        $currentNews = new TwitteRdf_Rdf_Extractor($currentRdfUrl);
       
        $pendingNews = new TwitteRdf_Rdf_Extractor($pendingRdfUrl);
       
        $newsList = array_merge($currentNews->toArray(), $pendingNews->toArray());
       
        $publishedItems = unserialize(file_get_contents('_cache/TwitteRdfCache.pser'));
       
        // Both arrays are indexed by item URL, so a plain key comparison works.
        $potentialTweetItems = array_diff_key($newsList, $publishedItems);
       
        return $message;
    }
}

With this new shell in place, some other classes that are needed to make the code run are TwitteRdf_View_Www and TwitteRdf_Rdf_Extractor. In addition a configuration file must be generated and the cache file has to be initialized. The code to this point should technically run since the constructor does not attempt to call the "run" method, but since it makes no attempt to output anything, the result would be a completely blank page.

In order to test the program in any capacity, some output is needed. The sample code has error reporting turned on. While one might assume that if there were any errors they would be output to the browser, it is not a very safe assumption. To generate output while properly following "separation of concerns", the code will need to employ a view class.

The view classes in this application are an implementation of the previously discussed View Pattern. This is the class that will handle the decision as to what should be output, though the Controller Pattern makes the ultimate decision on when to output and even which View class or object to use, and how to use it. The implementation of the Controller Pattern in the sample code is the TwitteRdf class itself.

To be even more specific, the views used by the application will be an adapter for the Zend_View class. The application will only touch on a fraction of the features of Zend_View, and creating an adapter helps make the code more portable should the Zend Framework not fit well into a later project configuration that might want to use code from TwitteRdf.

The sample view adapter in "TwitteRdf/View/Www.php" looks something like this:

<?php
/**
 *
 */
class TwitteRdf_View_Www{
    /**
     * The view object
     */
    protected $_view = NULL;
   
    /**
     * The path inside of _template (if any) and the file name
     * excluding the .php file type suffix of the file to use
     * for the view script. Should not start with a '/'.
     */
    protected $_template = '';
   
    /**
     * The data set to be available to the template via $this->data
     */
    protected $_dataArray = array();
   
    /**
     * Switch used to ensure output only happens once.
     * When this property is true, the render method will
     * output nothing.
     */
    protected $_mute = false;
   
    /**
     * Characters not allowed in the template name.
     */
    protected static $_notAllowed = array(
        '.','\\','+','*','?','[',']','(',')','^','$'
    );
   
    /**
     * Constructor takes an array by reference and an optional
     * template name. If no template name is specified 'www' is
     * used.
     */
    public function __construct(&$data, $template='www'){
        $this->_view = new Zend_View();
        $this->_view->setScriptPath('.');
        $this->_dataArray =& $data;
        // don't allow '.' and other funny characters in $template
        $template = str_replace(self::$_notAllowed,'',$template);
        $this->_template = '_template/'.$template.'.php';
    }
   
    /**
     * Outputs the view to the browser unless it has already been
     * called once. Returns a boolean indicating if output occurred.
     */
    public function render(){
        $outputOccured = false;
        if(!$this->_mute){
            $this->_view->data = $this->_dataArray;
            print($this->_view->render($this->_template));
            $outputOccured = true;
        }
        $this->_mute = true;
        return $outputOccured;
    }
}


To use this object there must be an _template directory in place and for this example a www.php file is needed.

<html>
<head>
<title><?php print($this->data['title']); ?></title>
<?php if($this->data['reload']){
    if(is_array($this->data['reloadTime'])){
        $this->data['reloadTime'] = max($this->data['reloadTime']);
    }
?>

<script language='javascript'>
setTimeout( "window.location.reload()", <?php print(max(1,$this->data['reloadTime'])); ?>*60*1000 );
</script>
<?php } ?>

</head>
<body>
<?php print($this->data['message']); ?>
</body>
</html>


The HTML portions will look familiar if you have looked at the original example. The Zend_View class merges data, presumably from some Model Pattern, with a template script of some sort. The adapter above simply wraps Zend_View in an interface better tailored to the needs of this application, extended slightly to restrict where the template script may come from.
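
For anyone curious what "merging data with a template script" amounts to, here is a bare-bones stand-in. It is not Zend_View's code, and the class name is made up; it only shows the general technique of including a script inside a method with output buffering, so the template can read $this->data the same way www.php does.

```php
<?php
// Minimal stand-in for the merge step; this is not Zend_View's code.
class SketchView
{
    /** Data made available to the template as $this->data. */
    public $data = array();

    /** Run the template script and return its output as a string. */
    public function render($scriptFile)
    {
        ob_start();
        include $scriptFile; // the included script can use $this
        return ob_get_clean();
    }
}

// Usage with a throwaway template file.
$template = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($template, '<p><?php print($this->data["message"]); ?></p>');

$view = new SketchView();
$view->data = array('message' => 'So Far So Good...');
$html = $view->render($template);
unlink($template);

echo $html, "\n";
```

Because the script is included from inside a method, the template stays plain PHP and HTML with no special syntax to learn.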

Now in the TwitteRdf controller's constructor the new adapter can be called and assigned some data:

    public function __construct($viewName,$configXml){
        $dataArray = array(
            'title'=>'Need A Title',
            'reload'=>false,
            'reloadTime'=>60,
            'message'=>'<p>So Far So Good...</p>'
        );
        $this->_viewAdapter = new TwitteRdf_View_Www($dataArray);
        $this->_viewAdapter->render();
    }

While far from finished, the program is starting to take shape. It should be pretty clear by now that one downside to this method of development is it will result in behavioral and declarative code separated out across multiple files. In the next post I'll introduce the Zend_Config class and explore the difference between procedural programming and declarative programming.


Refactoring, Use Cases and Abuse Cases

Sometimes code just happens. The example code is one such case. There were no formal requirements; I simply wanted to be able to receive Sysnews posts via text message, and Twitter was a low barrier means to achieve that goal. Once the proof of concept was realized, however, the next logical step was to explore options for moving the code into production.

This is a step where all too often the right precautions are not taken. Working code is not always good code. Moving this particular service into production would have a number of "risks" that need to be addressed to ensure that other services are not degraded, and that adopters of the service do not encounter dissatisfying experiences.

The previous blog post listed a number of needs for the code. These are a form of requirements. Collecting requirements in some format is important for disciplined software development. It helps identify risks, forms the basis for success measuring criteria, and creates a foundation for communication. There are a wide variety of formats for gathering software requirements. As with many other agile methods, the process of codifying requirements is something that can be customized to the developer, organization, or project's needs.

One format of requirement that can be useful through the development life cycle is the Use Case. Use cases are descriptions of what the program should do in different situations. Depending on project needs these might be simple sentences outlining the primary functions, or they might be detailed descriptions of every feature. It is important to note that use cases are supposed to capture what the system should do, not how it should do it, unless that "how" is actually a functional requirement.

As one might imagine, the number of requirements (use cases) a system has will affect the complexity of the system and ultimately the development time. Every recorded requirement should be a genuine need, and should be distilled carefully so that the software meets the needs without wasting development time on unimportant features.

Use cases are often presented in a concise list, though sometimes more narrative prose is employed. Some use cases may specify prerequisites and post conditions. Applying Use Cases by Geri Schneider and Jason Winters is an excellent source for ideas in making the most of use cases. Another good guide to using use cases effectively is Patterns for Effective Use Cases.

Development teams may use specifications like UML to create visual diagrams of the processes described by use cases. Some tools can even help generate shell code from the data used to create such diagrams, but such tools can be costly and the usefulness of the results varies greatly depending on the project and team. UML diagrams are often not the way to go for small projects and small teams since the maintenance of the UML and the learning required to properly create and maintain them is cost prohibitive. Even in large projects the benefits of UML should be carefully weighed.

More often, simpler visualizations of the processes described in Use Cases may help illustrate concepts not easily captured in words. The key is to maximize communication effectiveness and reduce the overhead of creating and maintaining the use cases. When visuals can do this, it makes sense to use them. Use cases and other forms of requirements are non-functional byproducts of the software development process. Working code is the focus, and use cases only help make this happen.

Use cases may use nested lists to break down special conditions, or represent branching conditions. It is important to keep use cases simple and easy to read, so nesting should be kept to a minimum. Instead use cases can reference or link to other use cases to create the required connections.

As the name might imply, use cases are user focused. Each begins with a user making a request to a system (as mentioned before users and systems may be people or programs, both are agents). The scope of each use case needs to be kept clear. When the user makes a request to the system and there is a clear handoff where the system itself becomes the user of some other system, a second use case is needed. It is also important to keep the concept of users distinct from their role. In some situations the same agent might be performing two different roles. It is almost always important to capture these distinct roles, but it may not be important to note that the same agent performs both.

Verb phrases are the preferred label for use cases as it is the action or service being performed that is central to the unit. Each use case needs to clearly identify what the success criteria are.

The root use case for the example program would look something like this:

Use Case 0
  • Trigger - script call
  • Preconditions - none
  • System requests RDF news feed of current news items [Use Case 1].
  • System requests RDF news feed of pending news items [Use Case 1].
  • System merges the two feeds, excludes already tweeted items, and selects one unpublished item based on date for the item [Use Case 2].
  • Pull the twitter status to make sure that the item has not already been twittered [Use Case 3].
  • If pulling the twitter status fails, generate an error message to that effect [Use Case 4].
  • If there is nothing unpublished, return a message saying so.
  • Get a short URL for the unpublished item from Snipr [Use Case 5].
  • If getting a Snipr URL isn't possible, generate an error message to that effect [Use Case 4].
  • Truncate the title as needed such that the Snipr URL and title together total 130 characters or less [Use Case 6].
  • Attempt to tweet the unpublished item [Use Case 7].
  • If tweeting fails, generate an error message to that effect [Use Case 4].
  • If tweeting succeeds, remember that item as having been published and return a message listing what was tweeted, and any remaining unpublished items.
  • Postconditions - some message has been returned.
Some teams might prefer an even simpler Use Case format with fewer or less specific tasks. This is about as detailed as a use case should get. Some teams might even argue that information like "130 characters or less" belongs in a formal requirements document and not in the Use Case itself. This is, of course, the point. Teams using agile methods need to have the ability to tailor the application of use cases to a process that works well for them.

In my own work I typically use very informal requirements documents with a Purpose section, Vision section, optional Background, optional Risks, and the rest are Use Cases. When there are functional requirements that are inappropriate for a Use Case format, such as specific validation algorithms that need to be codified, I append each special functional requirement. Other possible appendices, as needed, include some suggestions for requirements documents from Software Requirement Patterns (Withall 2007): Context, Scope, Major Assumptions, Major Exclusions, Key Business Entities, Infrastructures, Major Nonfunctional Capabilities.

The use case outlined above, Use Case 0, references seven other use cases which may need to be fleshed out or may even need to reference additional use cases. Use cases 1 and 2 are interrelated: 2 works on the products of 1. Here is where the discussion of patterns and refactoring from previous posts comes into play.

In Use Case 0, some agent initiates a script that, among many other things, needs to request and parse RDF documents to extract a list of news items. This alone is a good indicator that the process of handling RDF would benefit from having its own agent. By setting up Use Case 1 as a request to another agent, the main script no longer has to implement the details of RDF requests and parsing, hence a separation of concerns. This creates one less thing that the main script needs to implement. As an added benefit, there is a need to request and parse two different RDF files. If one agent can be designed to accept a request for any RDF file, parse it, and return the information in a more useful format then the case for making Use Case 1 its own agent is solidified.

RDF is a large and complex file format. Fortunately, we only need to implement a few things to get the script at hand working. Agile methods emphasize developing only what is needed. That is not to say that the scope of implementation and planning be short sighted. Various agile methods prescribe different levels of planning ahead for code, but most encourage developers to err on the side of avoiding analysis and design paralysis. A good design for an agent (in this case an object) to handle RDF is one that will allow us to handle only what we need without making it hard to add functionality later.

Use Case 1
  • Trigger - Use Case 0
  • Preconditions - A URL or file location to request the RDF from, or an RDF string.
  • Get the RDF file from the specified location if needed.
    • If it is a URL [Use Case 1.1]
    • If it is a file location [Use Case 1.2]
  • Extract a list of news items from the RDF XML.
    • For each item, get the URL link, Title, Creator, and Date.
  • Index the list by URL link, order by Date with most recent first.
  • Postconditions- The list of desired information exists, even if it is of size zero.
For the purpose of this exercise, I consider 1.1 and 1.2 to be pretty trivial. They are easy to program in PHP, but might be more involved in other languages. This is a case where the configuration of the project may need to determine the level of granularity.
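
As a rough illustration of the extraction and ordering steps in Use Case 1, here is a minimal sketch. It assumes the feed is RSS 1.0 (RDF Site Summary) with Dublin Core creator and date elements, which the use case does not spell out. The function names extractNewsItems and firstText are hypothetical, and this is not the code the refactored script ends up with.

```php
<?php
// Namespace URIs for RSS 1.0 and Dublin Core (assumed feed vocabulary).
const RSS_NS = 'http://purl.org/rss/1.0/';
const DC_NS  = 'http://purl.org/dc/elements/1.1/';

/**
 * Hypothetical helper: returns the text of the first descendant element
 * with the given namespace and local name, or '' when it is absent.
 */
function firstText(DOMElement $parent, $ns, $name)
{
    $nodes = $parent->getElementsByTagNameNS($ns, $name);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
}

/**
 * Hypothetical sketch of Use Case 1: extract news items from an RDF string,
 * index them by link, and order them by date with the most recent first.
 */
function extractNewsItems($rdfString)
{
    $doc = new DOMDocument();
    // Postcondition: a list always exists, even if it is of size zero.
    if (trim($rdfString) === '' || !@$doc->loadXML($rdfString)) {
        return array();
    }

    $items = array();
    foreach ($doc->getElementsByTagNameNS(RSS_NS, 'item') as $item) {
        $link = firstText($item, RSS_NS, 'link');
        if ($link === '') {
            continue; // an item without a link cannot be indexed
        }
        $items[$link] = array(
            'title'   => firstText($item, RSS_NS, 'title'),
            'creator' => firstText($item, DC_NS, 'creator'),
            'date'    => firstText($item, DC_NS, 'date'),
        );
    }

    // uasort keeps the link keys; unparseable dates sort as the oldest.
    uasort($items, function ($a, $b) {
        return (int) strtotime($b['date']) - (int) strtotime($a['date']);
    });
    return $items;
}
```

Fetching the document (Use Cases 1.1 and 1.2) is left out on purpose, so the function can be handed a string from any source.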

Use Case 2
  • Trigger - Use Case 0
  • Preconditions - One or more lists of news items, indexed by a unique and permanent URL with details on title, creator, and date. Another list of news items by URL which have been published.
  • Generate a new list by excluding any already published items and any duplicates.
  • Order the new list by date.
    • The items with a date in the past go first from earliest (most recent) to latest.
    • If any item happens to match the current date and time, it goes next,
    • followed by items in the future ordered from soonest to farthest in the future.
  • Postconditions- The list of desired information exists, even if it is of size zero.
Handling Use Case 3 and Use Case 4 will require some tinkering with Twitter, so the use cases covered so far will be a good starting place for refactoring the code.
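
The ordering rules in Use Case 2 are easy to get backwards, so a small sketch may help. It reads "past items first" as most recent first, which is my interpretation of the use case. The function name selectUnpublished and the 'date' key are hypothetical, and the current time is passed in so the ordering can be checked with fixed data.

```php
<?php
/**
 * Hypothetical sketch of Use Case 2. Each list is indexed by URL and each
 * item carries a 'date' string; $published is indexed by URL as well.
 * $now is a Unix timestamp (integer).
 */
function selectUnpublished(array $lists, array $published, $now)
{
    // Merging on the URL key drops duplicates across the lists.
    $merged = array();
    foreach ($lists as $list) {
        $merged += $list;
    }
    // Exclude anything whose URL has already been published.
    $candidates = array_diff_key($merged, $published);

    $past = array();
    $current = array();
    $future = array();
    foreach ($candidates as $url => $item) {
        $time = (int) strtotime($item['date']);
        if ($time < $now) {
            $past[$url] = $item;
        } elseif ($time === $now) {
            $current[$url] = $item;
        } else {
            $future[$url] = $item;
        }
    }

    // Past items: most recent first. Future items: soonest first.
    uasort($past, function ($a, $b) {
        return (int) strtotime($b['date']) - (int) strtotime($a['date']);
    });
    uasort($future, function ($a, $b) {
        return (int) strtotime($a['date']) - (int) strtotime($b['date']);
    });

    // The + operator keeps keys and the order: past, current, future.
    return $past + $current + $future;
}
```

The first element of the returned array is then the candidate to tweet, and an empty array satisfies the postcondition.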

Starting from a fresh file, a wrapper index.php is a good way to create a script that can easily be adapted to a command line script. I also use index.php files as a wrapper to get around a number of quirks with trying to debug code in some environments. PHP is one of the best languages I have ever used for debugging because it is so flexible. In a wrapper index.php I typically start with code like:

$debugMode = true;

if($debugMode){
    error_reporting(E_ALL);
    ini_set('display_errors', 'On');
}

This creates an easy flood gate to turn on and off full error and warning messages. If your server environment does not allow using ini_set, which many won't, you may be able to find other ways to adjust PHP's error messages.

In the case of more involved scripts, I would typically do some additional bootstrapping with a simple array of paths which can be configured differently for different hosts. I often use a localhost LAMP/WAMP/MAMP for developing and upload to a sandbox or production server. The array makes it easy to set some basic configurations that work on each possible host, and they can be swapped automatically based on hostname, or manually with a variable much the way debug is turned on and off above. Once the paths for the host have been mapped, I typically create a Zend_Loader to autoload objects in the Zend Framework and in my own framework extensions, and then load all other script configuration data through Zend_Config_Xml.

// Paths needed before install and/or before Zend_Config is available
// if your host is not included below, add a key in the $installHosts
// array that matches the host name of your web server (all lower case),
// and populate it with keys/values for any paths that need to be
// altered. You may delete entries for host names you won't use.
$installHosts = include_once('_config/installHosts.php');

$pathToZend = getHostSetting('pathToZend',$installHosts);
// Determine if Zend needs to be added to the path
$addZendPath = (isset($pathToZend)
    && $pathToZend != './'
    && $pathToZend != '.'
    && trim($pathToZend) != '');
$pathToZendExtentions = getHostSetting('pathToZendExtentions',$installHosts);

// Attempt to add Zend and/or Zend extentions to the path.
// On some server configs this will not be allowed. When that
// is the case, comment out these lines and have your server admin
// add the appropriate path(s).
// PATH_SEPARATOR keeps each new entry separate from the existing path.
ini_set('include_path', ini_get('include_path')
    . ($addZendPath ? PATH_SEPARATOR . $pathToZend : '')
    . PATH_SEPARATOR . $pathToZendExtentions);

// Now that Zend is available, start the auto-loader.
require_once('Zend/Loader.php');
Zend_Loader::registerAutoload();

A typical installHosts.php might look something like this:

/*
 * This file is the ONLY FILE in the config folder that should be a
 * .php array declaration. All others should be Zend_Config_Xml files.
 */

return(
    array(
        'default' => array(
            'controller' => '/',
            'documentRoot' => $_SERVER['DOCUMENT_ROOT'],
            'configFileLocation' => "_config/{$_SERVER['HTTP_HOST']}.xml",
            'defaultConfigFileLocation' => "_config/default.xml",
            'pathToZend' => './', // assume it is already in path
            'pathToZendExtentions' => "{$_SERVER['DOCUMENT_ROOT']}/extension/",
            'jqueryPath' => '/include/jquery/jquery.js'
        ),
        'production.lib.ncsu.edu' => array(
            'controller' => '/journey/0.5.0/',
            'pathToZend' => '/opt/coolstack/php5/lib/zend/library/',
            'pathToZendExtentions' => "{$_SERVER['DOCUMENT_ROOT']}/journey/0.5.0/extension/"
        ),
        'localhost'=>array(
            'controller' => '/journey/0.5.0/',
            'pathToZendExtentions' => "{$_SERVER['DOCUMENT_ROOT']}journey/0.5.0/extension/"
        )  
    )
);

/**
 * Looks for a setting specific to the hostname to return,
 * if it does not exist return the default.
 * This is a quick and dirty option for anything that occurs before
 * Zend_Autoloader allows us to use Zend_Config
 */
function getHostSetting($name, &$hostArray){
    return(
        (array_key_exists($_SERVER['HTTP_HOST'],$hostArray) && array_key_exists($name,$hostArray[$_SERVER['HTTP_HOST']]))
        ? $hostArray[$_SERVER['HTTP_HOST']][$name]
        : $hostArray['default'][$name]
    );
}

The configuration of this particular setup is just an example. The important thing is that programs be written so they are portable, and boot strappers like the above example help reduce the need for mixing configuration concerns in the code.

Back in the index.php file, we'll need to call the actual script:

//Now for our regularly scheduled program.
print(include('twitter.php'));

Obviously, twitter.php doesn't exist yet. A shell for it based on Use Case 0 would look something like this:

$currentRdfUrl = 'http://sysnews.ncsu.edu/news/index.rdf';
$pendingRdfUrl = 'http://sysnews.ncsu.edu/news/events.rdf';
$message = '';

// System requests RDF news feed of current news items [Use Case 1].

$currentNews = new TwitteRdf_Rdf_Extracter($currentRdfUrl);

// System requests RDF news feed of pending news items [Use Case 1].

$pendingNews = new TwitteRdf_Rdf_Extracter($pendingRdfUrl);

// System merges the two feeds, excludes already tweeted items, and selects one unpublished item based on date for the item [Use Case 2].

$newsList = array_merge($currentNews->toArray(), $pendingNews->toArray());

$publishedItems = unserialize(file_get_contents('_cache/TwitteRCache.pser'));

// array_diff_key compares keys directly; array_diff_ukey would require a callback.
$potentialTweetItems = array_diff_key($newsList, $publishedItems);

// Pull the twitter status to make sure that the item has not already been twittered [Use Case 3].
// If pulling the twitter status fails, generate an error message to that effect [Use Case 4].
// If there is nothing unpublished, return a message saying so.
// Get a short URL for the unpublished item from Snipr [Use Case 5].
// If getting a Snipr URL isn't possible, generate an error message to that effect [Use Case 4].
// Truncate the title as needed such that the Snipr URL and title together total 130 characters or less [Use Case 6].
// Attempt to tweet the unpublished item [Use Case 7].
// If tweeting fails, generate an error message to that effect [Use Case 4].
// If tweeting succeeds, remember that item as having been published and return a message listing what was tweeted, and any remaining unpublished items.

// Postconditions- some message has been returned.
return $message;

Clearly the above code won't work. We have not declared the class TwitteRdf_Rdf_Extracter yet. Hopefully it is clear now that the refactored code will be much easier to read than the old code.
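
One of the commented steps in the shell, Use Case 6, is small enough to sketch on its own. The function name buildTweet is hypothetical, and counting the separating space toward the 130 character limit is an assumption of this sketch, not something the use case states.

```php
<?php
/**
 * Hypothetical sketch of Use Case 6: shorten the title so that the title,
 * a separating space, and the short URL total at most $limit characters.
 * strlen() and substr() count bytes; a multibyte-safe version would need
 * the mbstring functions instead.
 */
function buildTweet($title, $shortUrl, $limit = 130)
{
    $room = $limit - strlen($shortUrl) - 1; // 1 for the separating space
    if ($room <= 0) {
        return $shortUrl; // no room for title text; the URL is never cut
    }
    if (strlen($title) > $room) {
        // Leave three characters for a trailing ellipsis when possible.
        $title = $room > 3
            ? rtrim(substr($title, 0, $room - 3)) . '...'
            : substr($title, 0, $room);
    }
    return $title . ' ' . $shortUrl;
}
```

Keeping the limit as a parameter means the same function still works if the allowance changes.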

One final topic for this post: Abuse Cases. Most development teams focus the use case collection process on desired functionality. One thing I like to do is ask people, "What's the worst thing you can imagine going wrong with the system?" These are abuse cases, when one agent does something undesirable that causes major problems for the other agents. As has already been mentioned, one major reason for refactoring this code was that Twitter recently went through a period of a few hours where it returned a 500 server error but still posted messages. If a program like this encountered such a problem, it might easily continue to post the same item over and over again.

Catching all abuse cases is hard if not impossible, but it is very important to ask the people you collect use cases from both, "What should the system do?" and "What should the system not do?". One of the most important agile principles is communication, and involvement of all stakeholders. Capturing the right abuse cases can significantly reduce the risks in any project.
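
To make the Twitter abuse case concrete, here is one possible guard, sketched under assumptions. The function name publishOnce is hypothetical, and the two callables stand in for real Twitter requests (posting a status and checking the recent timeline), so the logic can be exercised without a network connection.

```php
<?php
/**
 * Hypothetical guard against the "500 error but still posted" abuse case.
 * $post and $alreadyPosted are stand-ins for real Twitter calls; both
 * receive the tweet text and return true or false.
 */
function publishOnce($url, $text, array &$published, $post, $alreadyPosted)
{
    if (isset($published[$url])) {
        return 'skipped';
    }
    $reportedSuccess = call_user_func($post, $text);
    // A reported failure is not trusted: verify against the live status.
    if ($reportedSuccess || call_user_func($alreadyPosted, $text)) {
        $published[$url] = true; // remember it so it is never sent twice
        return $reportedSuccess ? 'posted' : 'posted-despite-error';
    }
    return 'failed'; // nothing was published, so a later retry is safe
}
```

The point of the design is that an error response alone never triggers a retry; only a failed verification does.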



Refactoring, Patterns and Other Important Considerations

One of the biggest advances in software engineering didn't originate from a programmer, but from an architect. Christopher Alexander didn't design code, he designed buildings. Nonetheless, his ideas on effective design are the underpinning of modern software development as popularized by the "gang of four" publication: Design Patterns: Elements of Reusable Object-Oriented Software. While an exploration of the patterns introduced in that book is beyond the scope of this article, the important thing to know is that patterns enhance communication and learning activities in the software development process.

One major challenge in communication is shared understanding. A great deal of confusion can occur when two individuals have different mental models of what they otherwise believe to be the same thing. The two primary strategies in object-oriented software development for avoiding this issue are:
  1. Interface- a carefully and explicitly negotiated contract for interaction. Interfaces create mutually understood means for ensuring predictable behavior.
  2. Abstraction- the practice of obscuring details that are not essential to simplify communication. This is often called separation of concerns because abstraction removes the need for everyone to know everything.
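
In PHP the two strategies above might look like the following minimal sketch. The names NewsSource, ArrayNewsSource, and countItems are hypothetical and are not part of the example program.

```php
<?php
/**
 * Hypothetical interface: the contract every news source agrees to.
 */
interface NewsSource
{
    /** @return array news items indexed by URL */
    public function toArray();
}

/** A source backed by a fixed array; the details stay hidden from callers. */
class ArrayNewsSource implements NewsSource
{
    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    public function toArray()
    {
        return $this->items;
    }
}

/** Callers depend only on the interface, not on how items are obtained. */
function countItems(NewsSource $source)
{
    return count($source->toArray());
}
```

A second class that reads RDF or RSS could implement the same interface, and countItems would not need to change.
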
Patterns are a way to codify a relatively small set of the most commonly needed solutions in a discipline in a way that identifies opportunities for application, outlines successful strategies, and focuses on learning by example and application. Patterns fit well into agile methods because they enable individuals with low expertise to benefit from a wide body of professional knowledge. As individuals gain confidence and skill they can continue to apply what they have learned from patterns  creatively. At more advanced levels individuals can introduce new patterns and update or enhance existing patterns with the knowledge and experience they have gained. Through this spectrum of expertise, patterns help individuals at different levels communicate with each other.

One core pattern many software applications use is the Model-View-Controller (MVC) triplet. In this pattern the idea is to separate three major concerns: how data may be manipulated, how data is presented, and how the user may interact with the data (respectively). Another way to think of the components is to consider the model to be a gatekeeper for business logic, the view to be a designer that presents information, and the controller to be a coordinator that takes input from the user and drives interaction with the system.
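
A deliberately tiny sketch of the triplet follows. The tally example and the class names are hypothetical, chosen only to show where each concern lives; this is one lightweight way to arrange MVC, not the way any particular framework does it.

```php
<?php
/** Model: gatekeeper for the data and the rules for changing it. */
class TallyModel
{
    private $count = 0;

    public function add($amount)
    {
        if (!is_int($amount) || $amount < 0) {
            throw new InvalidArgumentException('Amount must be a non-negative integer.');
        }
        $this->count += $amount;
    }

    public function getCount()
    {
        return $this->count;
    }
}

/** View: decides only how the data is presented. */
class TallyView
{
    public function render(TallyModel $model)
    {
        return 'Current tally: ' . $model->getCount();
    }
}

/** Controller: takes input from the user and drives the interaction. */
class TallyController
{
    private $model;
    private $view;

    public function __construct(TallyModel $model, TallyView $view)
    {
        $this->model = $model;
        $this->view = $view;
    }

    public function handle(array $input)
    {
        if (isset($input['add'])) {
            $this->model->add((int) $input['add']);
        }
        return $this->view->render($this->model);
    }
}
```

The controller accepts a plain array, so the "user" could be a web form, a command line argument parser, or another program.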

There are a variety of ways to implement MVC, some more disciplined and others more lightweight. In the example code there is virtually no user interaction at all. The only points of interaction for the "user" are the decision to run the code and the response returned by the code. These are both still considered user interaction points regardless of whether the code is run by a human user, or called by an automated process such as a cron job.

The code does interact with other systems, three to be exact. It makes a request via HTTP to a web server from which it acquires an RDF summary of NC State University's "Sysnews". This RDF document is a summary of news articles published through a web site. After it has processed the RDF it selects one article and makes an HTTP request to a web service called Snipr to acquire a short URL with which it can publish the selected article. The third request sends a short string of text, 140 characters or less, to a web service called Twitter. Twitter then performs a wide variety of tasks, including publishing the short message to a web page and RSS feed, as well as sending out instant messenger and text message notifications to any subscribed users. In all three of these cases the example code is itself a "user" for other programs. It initiates a request and receives some information in return.

The term "user" can mean a variety of things in different software development circles. Throughout this blog it is used to mean an "agent", human or otherwise, that initiates some interaction. Generally speaking "system" will also be used to mean an "agent" that responds to some request for interaction. The true value of the MVC pattern is that it leads to systems that can easily be adapted to use by any form of agent. Careful application of MVC can also lead to programs that can be users for "human systems".

Some developers have a very specific view of MVC: they either love it because they have had positive experiences with MVC frameworks such as Ruby on Rails, or they dislike it because they have had negative experiences trying to use MVC frameworks to do things the framework was not designed to do. It is important to distinguish between MVC as a software pattern and MVC frameworks. Both are useful in certain situations, but they have different applications despite the fact that they are based on the same concept. MVC frameworks are a particular way to do MVC; by necessity they have a set of features that can be both empowering and limiting. MVC frameworks are for the most part a system: they do actual work. In contrast, the MVC pattern is more generally applicable, but since it is not a system it does not accomplish any real work. The MVC pattern is a concept, not a program.

In learning to use patterns effectively it is important to set aside technology prejudices and carefully consider the underlying ideas that comprise the patterns. Just because one MVC framework does not do what is needed, or does not do it effectively, does not mean that MVC is the wrong pattern to apply. In the example code it might seem that MVC is not very applicable because there is a low level of interaction. The truth of the matter is that most reusable code must consider the principles embodied by the MVC pattern, because at the root MVC is about a separation of concerns that virtually all programs have: manipulation of data, presentation of the data, and interaction with the user.

Another pair of patterns that are commonly used are the Singleton and Utility patterns. Both rely on static class definitions, which is a feature of object oriented programming where a class (type or object) has properties and methods that exist separately of any object instantiation. Static properties and methods are also shared by all instantiations of that class. What both patterns have in common is that they have a private constructor, thus an object of that class cannot be created in the same way as most other classes.

In most languages objects are normally created with a "new" operator. Instead, a singleton class can only be instantiated by calling a static method and that method returns a new object the first time it is called, but each subsequent time it is called it will always return that same object. In some languages there may be other operators that must be altered for a singleton object so that the one instantiation (the only instantiation) can never be copied.

The utility pattern on the other hand is used for creating a class that may never be instantiated. This is most commonly used to create stateless objects that provide a library of commonly used functions without the need to use functions (which are methods not associated with any class).

The functional difference in these two classes is that a singleton may have state values associated with it and thus effectively ensures there is at most one instantiation of that class at any given time. The utility pattern is intended to be stateless, so it in effect never has any instantiation.
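
The difference is easier to see side by side. The following is a minimal sketch in PHP; the class names PublishedLog and TextUtil are hypothetical and only serve as illustrations.

```php
<?php
/** Singleton: at most one instance, which may carry state. */
final class PublishedLog
{
    private static $instance = null;
    private $urls = array();

    private function __construct() {}
    private function __clone() {} // block copying of the one instance

    public static function getInstance()
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function remember($url)
    {
        $this->urls[$url] = true;
    }

    public function has($url)
    {
        return isset($this->urls[$url]);
    }
}

/** Utility: stateless static helpers; the class is never instantiated. */
final class TextUtil
{
    private function __construct() {}

    public static function collapseWhitespace($text)
    {
        return trim(preg_replace('/\s+/', ' ', $text));
    }
}
```

Both classes hide their constructors, but only the singleton hands out an object. The private __clone method is the PHP version of the "other operators" mentioned above.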

There are a variety of other patterns for controlling the creation of objects including several variations on the Factory and Builder patterns. These won't be needed in the example code, are more complex, and the implementation of these patterns is highly dependent on the language being used. The theme to keep in mind is that some patterns are focused on control over the creation and lifespan of objects in a program.

Other classifications of patterns include: structural, behavioral, and concurrency. The previously mentioned MVC patterns are commonly considered to be an "architectural" classification.

Structural patterns that will be used in refactoring the example code include the Adapter pattern and the Decorator pattern. Behavioral patterns that will be used in refactoring are the Memento pattern and the Interpreter pattern.

Looking at the example program and the description of what it does, several different concerns come to mind. The program needs to:
  • know how to read an RDF file
  • be able to pick out the information it needs from the RDF file
  • be able to get the RDF file from the web server
  • decide which news item to "Twitter"
  • remember which news item(s) it has already "Twittered" so that it never twitters the same item twice
  • report back what it did (success, failure, and whether there are more items to twitter)
  • be able to acquire a short URL for the article from Snipr
  • be able to "Twitter"
Some additional concerns include things the program does not do, but could or should:
  • verify previous successes and failures independently of the statuses reported back. This is needed because Twitter suffers occasional problems with false negatives and it is important to not post the same thing twice as this may "annoy" people subscribed to IM and text message notifications. Similarly, if it is repeatedly discovered that a false positive has occurred, it would be prudent to retry since these messages are sometimes of a critical nature.
  • detect a failure in Snipr. This has not occurred, but some simple error checks could prevent future problems.
  • the program should behave differently if called directly from a web browser or called by running PHP on the command line, primarily it should not spew HTML out to the terminal but instead use a simpler plain text response.
  • allow some basic configuration of the script for greater portability and reuse, including the ability to handle multiple RDF sources with some basic rules for establishing priority and frequency of Twittering
  • put into place abstractions needed to allow later versions to use sources other than RDF (such as RSS/Atom)
  • only Twitter at certain times of the day
  • notify someone via Twitter and/or e-mail when something is wrong
  • some basic management of the program should be achievable, primarily the ability to manually tell it that a news item failed and to retry or manually insert a news item.
Each of these concerns indicates the opportunity to apply a pattern. The practice of doing that will be explored in the next post: Separating Concerns.
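
As one small illustration before that post, the browser versus command line concern from the list above can be isolated in a single function. This is only a sketch: the name formatMessage is hypothetical, and the SAPI name is passed as a parameter so both branches can be exercised from one place.

```php
<?php
/**
 * Hypothetical formatter: plain text on the command line, HTML in a browser.
 * $sapi defaults to PHP_SAPI, which is 'cli' when PHP runs from a terminal.
 */
function formatMessage($message, $sapi = PHP_SAPI)
{
    if ($sapi === 'cli') {
        return $message . PHP_EOL;
    }
    // Escape the message so it is safe to place inside HTML.
    return '<p>' . htmlspecialchars($message, ENT_QUOTES, 'UTF-8') . '</p>';
}
```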

Some other considerations that need to be made with respect to this program are:
  • Validation- there are a number of places where input checks will be used to ensure reliability and safety.
  • Declarative Code- by separating concerns the code will have objects that do more declarative work, and less procedural work.
  • Removing duplications of code.
  • Use Cases
Use Cases are one tool often employed by agile programmers of various flavors to capture requirements in the form of how the program should behave in response to different "agents". Many practitioners of agile limit "use cases" to human triggered interaction, and while these are of particular interest, I feel that is too limited a view of any system. In my own particular model of capturing requirements, all points of interaction (users and systems as mentioned above) involve agents, and so all desirable behaviors can be captured in use cases. I also like to document "Abuse Cases", that is, foreseeable agent actions (intentional, unintentional, human, or otherwise) that could have a significantly disastrous effect on the behavior of the program.


Refactoring, Introduction

Lolcats (or dogs in this case) make everything better. Refactoring is a method for improving software by applying various techniques. Adding a lolcat is easy; refactoring requires practice and experience. As one might reasonably imagine, it is hard to write good code unless one knows how to code well. Refactoring integrates into the learning experience: it helps the programmer learn as they go. Writing good code also requires proper motivation. Some of the best programmers start projects with bad code, but there is a reason.

Counter to traditional thinking, it is often not prudent to spend too much time planning a program up-front. There is a high level of risk involved in not understanding the requirements for a project, but there is also a high level of risk in not being prepared to adjust to changing requirements. In real world scenarios time is often equated with a cost, and this creates a need to balance time invested towards progress and time conserved for adapting to change.

The idea of refactoring is simple: do not assume everything can be made perfect the first time. Make it "good enough" knowing that there is time set aside later for making it "better". Better can mean a wide variety of things, from cleaner code, to optimized performance, and even adapting to changing or better understood requirements. Refactoring is a well established and accepted part of agile software development, so much so that there are variations on the practice.

One variation proposed by Pugh in Prefactoring is to apply many of the same practices as refactoring to the initial planning process. The trade-off is higher planning time, but potentially less need to refactor. An important fact to keep in mind is that prefactoring still relies heavily on expertise to yield successful results.

Refactoring is especially effective on development teams. As Cockburn proposes in Agile Software Development there is a wide spectrum of expertise among individuals, and the level of expertise a person has affects their perception of the software development process. Careful awareness of these differences can create great opportunities for personal growth and project success at all three levels: personal, organizational, and technical.

Refactoring can also help the solo developer, but small and one-man projects are situations where Pugh's idea of Prefactoring may be more effective. In the software industry, large teams are not uncommon, but in the academic IT environment small teams are the norm. It can be challenging to apply agile software development methods to academic IT projects. Most agile information resources are written for, and based on the experience of, larger project teams. Fortunately, just as early adopters of agile saw that the need for projects to remain flexible was critical to success, those that followed saw that one-size-fits-all solutions lack flexibility. Many recent publications and approaches have tackled the issue of "project configuration", giving developers guidance in selecting which methods will work best for their team.

Through the next few articles, this un-refactored code written in PHP will be used as an example for various refactoring practices. It is used to pull information from an RDF news feed and publish it to an online communication tool called Twitter.


Refactoring: An Example

A recent project that I am rather proud of (mainly because it _is_ so simple) is the Sysnews Twitter bot. This simple program (under 300 lines of code) pulls in the Sysnews RDF and pushes it out through a Twitter account using two APIs: Twitter and Snipr. It is something I have been wanting to (and have gotten permission to) cron at work, but first I want to take some time to make the code really solid. Since it is so small, I figured that it would be a great project.

Step one of the refactor is going to involve analyzing where the potential for improvement is. Technically speaking it is only 300 lines of code, but that is for one instance, and currently there are two running: one for current and one for future announcements. The code for each instance is hacked slightly so that they work in tandem.

Step two is to try and follow Agile practices as much as possible to build the new version (to an extent from scratch), though the algorithms of the original code will probably be used when/if appropriate.

Step three will be to post the results and fire up the new and improved croned version.

My goal is to come up with a new program that is more robust (the current script had to be taken down due to a temporary bug in Twitter), configurable, remotely administrable, cronable (i.e. part of it runs from the command line), and built of reusable code while remaining under 600 lines of code (which is both current instances combined). The reason I'm basically doubling the allowance is that I don't want to have to calculate non-statement lines (i.e. lines that have only one or two non-whitespace characters on them to follow coding standards, and required comment lines for class/method declarations).


Layer 8

In 2004 a white paper titled, "Layer 8" circulated the NC State University campus IT groups generating a wide range of discussions. In the introductory section the paper labeled IT as a "tool that can either aid or hinder our progress" and proposed that it could be better leveraged by "sens[ing] the emerging realignment [of IT] and understand[ing] the direction in which IT is moving".

The proposal was well thought out: Recalibrating the process of integrating, purchasing, and developing IT to organizational goals. The approach was flawed in a number of respects. It attributed change to technology, not people. Key points in the rhetoric were later found to be tied to the interests of the individuals involved and not in any real requirements set forth by the community.

The views in this paper failed to account for risk as a natural part of adapting to change. Instead, risk and failure were attributed to poor alignment and poor accountability. A clear fear of risk and failure is expressed and "pilot projects" are suggested as the remedy. Sources cited "blame" people for failure and propose higher levels of ceremony in processes. Change is viewed almost as a nemesis: if we don't get the jump on change, it will get us.

Layer 8 starts out with a nice premise: the human factor is the deciding factor in successful application of technology. If Layer 8 were true to this idea, the result would have been more in line with Agile thinking.

Agile methods were designed in highly competitive, risk driven environments: the corporate software development world. Considering this, it might seem that Agile methods are ill equipped to apply to the academic IT environment. Nothing could be farther from the truth. All dynamic environments involve competition, and this is particularly true in the academic IT environment. Agile methods are also iterative, and the academic environment has a built-in iteration cycle: the semester.

Taking technology out of the picture for a moment, because Layer 8 is clearly written to push the agenda of decentralized computing, which is a technology concern, the biggest takeaway message is: we need to communicate about what our organizational goals are and make technology decisions based on those. Meeting organizational goals is important, but there is another layer of success that is important to consider. Personal success, that is the collective success of individuals, is often the keystone to organizational success. Blaming individuals for opting out of technology change is a coping mechanism for groups unable to implement technology in a fault tolerant manner. The right technology incorporates powerful rhetoric that ensures success primarily through conversion, with administrative backing as a fallback.

Conversion means people willingly adopt technology because it solves the problem as well, if not better, a measure of personal success. Administrative backing means that the technology is aligned with organizational goals such that there is peer and supervisory pressure to embrace technology change. In this way administrative backing is a measure of organizational success.

[Figure: Left, balanced traditional Venn diagram of three overlapping values. Right, spiral Venn diagram.]

There is a third form of success in Agile methods, but it is dependent on the other two: technical success. Technical success can only be achieved when personal and organizational success are also realized. James Shore and Shane Warden illustrate the overlap of these three forms of success as a traditional triquetra-esque Venn diagram. In contrast, a more proper view of the relationship between these three forms is more spiral in nature. That is to say that there are many ways we can succeed personally, independent of any level of organizational success or technical success. There may be a few ways we can succeed organizationally and not personally, but most successful organizational outcomes depend on personal success, with or without technical success. Similarly, there are even fewer ways to technically succeed that do not have some overlap with either personal success, organizational success, or both. The sweet spot is dead in the center of the area where all three forms of success overlap.

Layer 8 would have been a better proposal if it had actually considered the human layer and not glossed over into the institutional and enterprise layer.


Agility

One of the great technology developments of the 1990s wasn't a hardware or software advance at all, but rather a change in the way programmers were doing their jobs. The move from "waterfall" methods to more iterative methods sparked a major shift in how software was produced. These changes affected not only speed, but also quality across a number of different criteria including performance, reliability, appropriateness, and satisfaction. The older methods are still used today, ranging from fairly unorganized project management to very meticulously configured waterfall or semi-iterative methods.

Along with the notion of iterative development came a lot of other new ways of developing software which better fit the emerging methodology. Scrum, XP (Extreme Programming), Crystal, and a number of other methods provided practices well suited to the short cycle, continuous delivery processes required for truly iterative development. Lumped under the umbrella of "Agile" methods, this new way of developing software had one key difference from the traditional thinking behind software development.

Uniformly, agile software development embraces the notion that people come first, not technology. People come before process, process comes before technology. This principle is key to understanding why agile methods are different and how they work well when applied properly. The right motivation must be in place, otherwise the underlying methodology won't work effectively.

In exploring Fabulous IT, the human factor takes center stage. Newer approaches like the Enterprise Unified Process take agile beyond just software development to encompass all of IT. Technology itself isn't fabulous. People are fabulous: solutions, design, packaging, and communication are all human aspects associated with technology that can be fabulous. Fabulous and agile go hand in hand, but I hope to demonstrate that agile is a subset of fabulous.


Kept saying I was going to do it.

So I did it. Not much time to do anything else, but at least I made the blog.
