WebKit

Webware for Python 0.2

Version

WebKit 0.2

Table of Contents


Synopsis
Feedback
Introduction
      Overview
      Compared to CGI "apps"
Running and Testing
      Getting Started
      WebKit Adaptors
Errors / Uncaught Exceptions
Configuration
Administration
Class Hierarchy
Debugging
      print
      Raising Exceptions
      Restarting the Server
      Assertions
Naming Conventions
Files
How do I develop an app?
Release Notes
Limitations/Future
      To Do
Credit

Synopsis

WebKit provides Python classes for generating dynamic content from a web-based, server-side application. It is a significantly more powerful alternative to CGI scripts for application-oriented development.

Feedback

Webware is open source and feedback in all areas (concept, design, code, documentation, packaging, etc.) is appreciated. Please e-mail Chuck Esterbrook at echuck@mindspring.com with any questions/comments or even an offer to help.

Introduction

Overview

The core concepts of WebKit are the Application, Servlet, Request, Response and Transaction, each of which is implemented by one or more Python classes.

The application resides on the server-side and manages incoming requests in order to deliver them to servlets which then produce responses that get sent back to the client. A transaction is a simple container object that holds references to all of these objects and is accessible to all of them.

Content is normally served in HTML or XML format over an HTTP connection. However, applications can provide other forms of content and the framework is designed to allow new classes for supporting protocols other than HTTP.

To connect the web server and the application server, a small CGI script, WebKit.cgi, bundles a web browser request and sends it to the application server. The application server processes the request and sends the response back to WebKit.cgi, which outputs the results for use by the web server. See the Running and Testing section below.

At a more detailed level, the process looks like this:

  1. At some point, someone has configured and run both a web server (such as Apache) and the WebKit app server (WebKit/AppServer.py).
  2. A user requests a web page by typing a URL or submitting a form.
  3. The user's browser sends the request to the remote web server.
  4. The web server detects a CGI script, WebKit.cgi, in the URL and invokes it.
  5. WebKit.cgi simply collects information about the request and sends it to the WebKit app server which is ready and waiting.
  6. The app server asks the Application object to dispatch the raw request.
  7. The application instantiates an HTTPRequest object and asks the appropriate Servlet (as determined by examining the URL) to process it.
  8. The servlet generates content into a given HTTPResponse object, whose content is then sent back by the app server to WebKit.cgi.
  9. WebKit.cgi prints the content which the web server then delivers to the user's web browser.
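
Steps 6 through 8 can be illustrated with a minimal sketch. The classes below are simplified, hypothetical stand-ins written in present-day Python 3; they are not the actual WebKit classes, and names such as dispatch_raw_request and respond are assumptions made only for this illustration.

```python
class Request:
    """Hypothetical stand-in for HTTPRequest: wraps the raw request dictionary."""
    def __init__(self, raw):
        self._raw = raw

    def path(self):
        return self._raw['path']

    def fields(self):
        return self._raw.get('fields', {})


class Response:
    """Hypothetical stand-in for HTTPResponse: collects the generated content."""
    def __init__(self):
        self._chunks = []

    def write(self, text):
        self._chunks.append(text)

    def contents(self):
        return ''.join(self._chunks)


class Transaction:
    """Simple container holding references to all participating objects."""
    def __init__(self, application, request, response):
        self.application = application
        self.request = request
        self.response = response
        self.servlet = None


class WelcomeServlet:
    """A servlet generates content into the response of the transaction."""
    def respond(self, transaction):
        name = transaction.request.fields().get('name', 'world')
        transaction.response.write('<p>Hello, %s!</p>' % name)


class Application:
    def __init__(self, servlets):
        self._servlets = servlets  # maps a URL path to a servlet instance

    def dispatch_raw_request(self, raw):
        request = Request(raw)
        response = Response()
        transaction = Transaction(self, request, response)
        servlet = self._servlets[request.path()]  # chosen by examining the URL
        transaction.servlet = servlet
        servlet.respond(transaction)
        return response.contents()


app = Application({'/Welcome': WelcomeServlet()})
html = app.dispatch_raw_request({'path': '/Welcome', 'fields': {'name': 'Chuck'}})
print(html)
```

In the real framework the returned content would travel back through the app server to WebKit.cgi; here it is simply printed.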

Compared to CGI "apps"

The alternative to a server-side application is a set of CGI scripts. However, a CGI script must always be launched from scratch, so many common tasks are repeated for every request: loading libraries, opening database connections, reading configuration files, etc.

With the server-side application, the majority of these tasks can be done once at launch time and important results can be easily cached. This makes the application significantly more efficient.
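
The effect of caching in a long-running process can be sketched as follows. This is a generic illustration in present-day Python 3, not WebKit code; load_settings, cached_settings and handle_request are hypothetical names, and the short sleep merely simulates an expensive start-up task.

```python
import time

load_count = 0   # counts how often the expensive work actually runs
_cache = {}      # lives as long as the server process does


def load_settings():
    """Pretend to be an expensive task, such as reading a configuration file."""
    global load_count
    load_count += 1
    time.sleep(0.01)
    return {'greeting': 'Hello'}


def cached_settings():
    # Do the expensive work on first use only, then reuse the result.
    if 'settings' not in _cache:
        _cache['settings'] = load_settings()
    return _cache['settings']


def handle_request(name):
    return '%s, %s!' % (cached_settings()['greeting'], name)


replies = [handle_request(n) for n in ('Ann', 'Bob', 'Cy')]
print(replies, 'loads:', load_count)
```

A CGI script would start from an empty cache on every request, so the expensive load would run three times instead of once.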

Of course, CGIs can still be appropriate for "one shot" deals or simple applications. Webware includes a CGI Wrapper if you'd like to encapsulate your CGI scripts with robust error handling, e-mail notifications, etc.

Running and Testing

Getting Started

You will need to locate the Webware distribution in a directory such that your web server will accept URLs there, including CGIs. For testing purposes, a URL might look something like:

http://localhost/~echuck/Projects/Webware/WebKit/WebKit.cgi/Welcome

If you want to create a symbolic link from your HTTP server area (such as public_html/), be sure to link to the WebKit directory and not the CGI (or FastCGI) adaptor within.

Before you enter the URL into a browser, you need to start the WebKit app server from the command line:

> cd ~/Projects/Webware/WebKit
> AppServer.py
WebKit AppServer 0.2
part of Webware for Python
Copyright 1999-2000 by Chuck Esterbrook. All Rights Reserved.
WebKit and Webware are open source.
Please visit: http://webware.sourceforge.net

<a display of all settings and their values>

Listening at ('', 8086)
OK

Upon entering the URL, the results of the Welcome page should be displayed in your browser. Some diagnostic information will be printed in the terminal window of the server. You'll notice that after the first page is served, additional pages are served much faster. You can verify this in the console of the AppServer, which reports the response time for each request. As the application warms up (e.g., caches classes, objects, files and settings), performance increases.

The Welcome page has links to other examples which you can browse. There is also a link with each example for viewing the source.

WebKit Adaptors

A WebKit adaptor takes an HTTP request from a web server and, as quickly as possible, packs it up and ships it to the AppServer, which subsequently sends the response back to the adaptor for delivery to the web server and ultimately the web client. More concisely, an adaptor is the go-between of the web server and the app server.
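
The go-between role can be sketched with a pair of connected sockets. This is only an illustration in present-day Python 3: the wire format used here (the repr of a dictionary, parsed with ast.literal_eval) and the helper names pack_request and unpack_request are assumptions, not a description of the actual adaptor protocol.

```python
import ast
import socket


def pack_request(environ, body=''):
    """Bundle the request into a dictionary and serialize it (assumed format)."""
    return repr({'environ': dict(environ), 'input': body}).encode('utf-8')


def unpack_request(data):
    """Parse the serialized dictionary without executing arbitrary code."""
    return ast.literal_eval(data.decode('utf-8'))


def read_all(sock):
    """Read from the socket until the other side stops sending."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b''.join(chunks)


# A connected socket pair stands in for the adaptor-to-AppServer connection.
adaptor_end, server_end = socket.socketpair()

# Adaptor side: ship the request, then signal that it is complete.
adaptor_end.sendall(pack_request({'REQUEST_METHOD': 'GET', 'PATH_INFO': '/Welcome'}))
adaptor_end.shutdown(socket.SHUT_WR)

# App server side: receive the request and send back a response.
raw_request = unpack_request(read_all(server_end))
body = '<p>You asked for %s</p>' % raw_request['environ']['PATH_INFO']
server_end.sendall(('Content-type: text/html\n\n' + body).encode('utf-8'))
server_end.shutdown(socket.SHUT_WR)

# Adaptor side: read the response for delivery to the web server.
response = read_all(adaptor_end).decode('utf-8')
adaptor_end.close()
server_end.close()
print(response)
```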

There are currently two adaptors in WebKit. The first is the CGI adaptor, named WebKit.cgi, which should work with any CGI-enabled web server. Of course, the web server will have to be configured to allow execution of CGI scripts.

The second adaptor is the FastCGI adaptor, named WebKit.fcgi, which should result in faster responses, although definitive performance tests have not yet been made on the adaptors. Your web server will have to be FastCGI enabled, which you may be able to accomplish by picking up software at http://www.FastCGI.com where you can also learn more about FastCGI. There is also very useful documentation provided inline with the adaptor which you should read if you choose to use it.

In the future, custom adaptors for popular web servers such as Apache, Microsoft, etc. should be provided for even better performance.

Errors / Uncaught Exceptions

One of the conveniences provided by WebKit is the handling of uncaught exceptions. The response to an uncaught exception is:

  1. Log the time, error, script name and traceback to AppServer's console.
  2. Display a web page containing an apologetic message to the user.
  3. Save a technical web page with debugging information so that developers can look at it after-the-fact. These HTML-based error messages are stored one-per-file, if the SaveErrorMessages setting is true (the default). They are stored in the directory named by the ErrorMessagesDir (defaults to 'ErrorMsgs').
  4. Add an entry to the error log, found by default in Logs/Errors.csv.
  5. E-mail the error message if the EmailErrors setting is true, using the settings ErrorEmailServer and ErrorEmailHeaders. You'll need to configure these to activate this feature.
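
The first few steps above can be sketched as a wrapper around a servlet call. This is a simplified illustration in present-day Python 3, not the WebKit exception handler; run_with_error_handling is a hypothetical name, the log columns are an assumption, and an in-memory buffer stands in for Logs/Errors.csv.

```python
import csv
import io
import time
import traceback

USER_ERROR_MESSAGE = 'The site is having technical difficulties with this page.'


def run_with_error_handling(servlet_name, func, error_log):
    """Call func(); on an uncaught exception, log it and return an apology page."""
    try:
        return func()
    except Exception as exc:
        stamp = time.strftime('%Y-%m-%d %H:%M:%S')
        # Step 1: report the time, error, script name and traceback on the console.
        print('[%s] Error in %s: %s' % (stamp, servlet_name, exc))
        print(traceback.format_exc())
        # Step 4: add an entry to the error log (columns are an assumption).
        csv.writer(error_log).writerow(
            [stamp, servlet_name, type(exc).__name__, str(exc)])
        # Step 2: show an apologetic message to the user.
        return '<p>%s</p>' % USER_ERROR_MESSAGE


def broken_servlet():
    return 1 / 0


log = io.StringIO()  # stands in for the error log file
page = run_with_error_handling('Broken', broken_servlet, log)
print(page)
```

Using the csv module for the log entry also takes care of quoting values that contain commas or quotes.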

Here is a sample error page.

Archived error messages can be browsed through the administration page.

Error handling behavior can be configured as described in Configuration.

Configuration

There are several configuration parameters through which you can alter how WebKit behaves. They are described below, including their default values. Note that you can override the defaults by placing config files in the Configs/ directory. A config file simply contains a Python dictionary containing the items you wish to override. For example:

{
      'ServletsDir': 'MyApp',
      'ShowDebugInfoOnErrors': 1
}
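
How such a file might be combined with the defaults can be sketched as follows. This is an illustration in present-day Python 3, not the WebKit implementation; read_config is a hypothetical helper, the DEFAULTS dictionary lists only a few settings, and ast.literal_eval is used here as one safe way to read a dictionary literal.

```python
import ast

# A few defaults taken from the settings described in this section.
DEFAULTS = {
    'ServletsDir': 'Examples',
    'ShowDebugInfoOnErrors': 1,
    'LogActivity': 1,
}


def read_config(text, defaults):
    """Return the defaults overridden by the dictionary found in text."""
    overrides = ast.literal_eval(text)
    if not isinstance(overrides, dict):
        raise ValueError('a config file must contain a dictionary')
    settings = dict(defaults)   # leave the defaults themselves untouched
    settings.update(overrides)
    return settings


# The text a config file might contain (sample values for illustration).
config_text = """{
    'ServletsDir': 'MyApp',
    'ShowDebugInfoOnErrors': 0
}"""

settings = read_config(config_text, DEFAULTS)
print(settings)
```

Settings that the file does not mention, such as LogActivity here, keep their default values.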


Configs/AppServer.config:

PrintConfigAtStartUp
    = 1
Does pretty much what it says. It's generally a good idea to leave this on.
Verbose
    = 1
If true, then additional messages are printed while the AppServer runs, most notably information about each request such as size and response time.
Port
    = 8086
The port that the application server runs on. Change this if there is a conflict with another application on your server.
Multitasking
    = threading
Determines if and how the application server will multitask requests. Must be one of: threading, forking, sequencing.


Configs/Application.config:

PrintConfigAtStartUp
    = 1
Does pretty much what it says. It's generally a good idea to leave this on.
ServletsDir
    = 'Examples'
This is where the application always looks for the servlets. This location does not appear in URLs; it's implicit. The path can be relative to the WebKit location, or an absolute path.
LogActivity
    = 1
If true, then the execution of each servlet is logged with useful information such as time, duration and whether or not an error occurred.
ActivityLogFilename
    = 'Logs/Activity.csv'
This is the name of the file that servlet executions are logged to. This value makes no difference if LogActivity is false/0. The path can be relative to the WebKit location, or an absolute path.
ActivityLogColumns
    -->
['request.remoteAddress', 'request.method', 'request.uri', 'response.size', 'servlet.name', 'request.timeStamp', 'transaction.duration', 'transaction.errorOccurred']

Specifies the columns that will be stored in the activity log. Each column can refer to an object from the set [application, transaction, request, response, servlet, session] and then refer to its attributes using "dot notation". The attributes can be methods or instance attributes and can be qualified arbitrarily deep.
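
Resolving a column name written in dot notation can be sketched like this. It is an illustration in present-day Python 3 under stated assumptions: value_for_name is a hypothetical helper, and the FakeRequest and FakeTransaction classes are stand-ins, not WebKit classes.

```python
def value_for_name(objects, name):
    """Resolve a name such as 'request.method' against a dictionary of objects."""
    parts = name.split('.')
    value = objects[parts[0]]          # e.g. the request object
    for part in parts[1:]:
        value = getattr(value, part)   # attribute or method, arbitrarily deep
        if callable(value):
            value = value()            # methods are called to obtain the value
    return value


class FakeRequest:
    remoteAddress = '127.0.0.1'        # a plain instance attribute

    def method(self):                  # a method
        return 'GET'


class FakeTransaction:
    def __init__(self, request):
        self._request = request

    def request(self):
        return self._request


request = FakeRequest()
objects = {'request': request, 'transaction': FakeTransaction(request)}
columns = ['request.remoteAddress', 'request.method', 'transaction.request.method']
row = [value_for_name(objects, column) for column in columns]
print(row)
```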
ShowDebugInfoOnErrors
    = 1
If true, then uncaught exceptions will not only display a message for the user, but debugging information for the developer as well. This includes the traceback, HTTP headers, form fields, environment and process ids.
UserErrorMessage
    -->
'The site is having technical difficulties with this page. An error has been logged, and the problem will be fixed as soon as possible. Sorry!'

This is the error message that is displayed to the user when an uncaught exception escapes a servlet.
ErrorLogFilename
    = 'Logs/Errors.csv'
This is the name of the file where exceptions are logged. Each entry contains the date & time, filename, pathname, exception name & data, and the HTML error message filename (assuming there is one).
SaveErrorMessages
    = 1
If true, then errors (e.g., uncaught exceptions) will produce an HTML file with both the user message and debugging information. Developers/administrators can view these files after the fact, to see the details of what went wrong.
ErrorMessagesDir
    = 'ErrorMsgs'
This is the name of the directory where HTML error messages get stored.
EmailErrors
    = 0
If true, error messages are e-mailed out according to the ErrorEmailServer and ErrorEmailHeaders settings. This setting defaults to false because the other settings need to be configured first.
ErrorEmailServer
    = 'mail.-.com'
The SMTP server to use for sending e-mail error messages.
ErrorEmailHeaders
    -->
{
    'From':         '-@-.com',
    'To':           ['-@-.com'],
    'Reply-to':     '-@-.com',
    'Content-type': 'text/html',
    'Subject':      'Error'
}

The e-mail MIME headers used for e-mailing error messages. Be sure to configure 'From', 'To' and 'Reply-to' before using this feature.
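
One way such a headers dictionary could be turned into a message is sketched below with the standard email package of present-day Python 3. This is an assumption-laden illustration, not how WebKit builds its messages: build_error_email is a hypothetical helper, the addresses are placeholders, and nothing is actually sent.

```python
from email.mime.text import MIMEText


def build_error_email(headers, html):
    """Create a MIME message from a headers dictionary and an HTML body."""
    headers = dict(headers)  # work on a copy
    # The content type decides the MIME subtype, e.g. 'text/html' gives 'html'.
    subtype = headers.pop('Content-type', 'text/plain').split('/')[1]
    message = MIMEText(html, subtype)
    for name, value in headers.items():
        if isinstance(value, (list, tuple)):
            value = ', '.join(value)   # several recipients share one header
        message[name] = value
    return message


headers = {
    'From':         'webkit@example.com',
    'To':           ['dev1@example.com', 'dev2@example.com'],
    'Reply-to':     'webkit@example.com',
    'Content-type': 'text/html',
    'Subject':      'Error',
}
message = build_error_email(headers, '<p>Traceback goes here</p>')
print(message.as_string())
# Sending would then go through the SMTP server named by ErrorEmailServer,
# for example with smtplib.SMTP; that step is left out of this sketch.
```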

Administration

WebKit has a built-in administration page that you can access by specifying _admin in the URL. This gives basic information and provides links to other administrative features like viewing the logs and configuration.

The error log display also contains links to the archived error messages so that you can browse them.

The administration scripts provide further examples of writing pages with WebKit, so you may wish to examine their source.

Here's an example of the admin page.

Class Hierarchy

     KeyValueAccess
        Object
            Configurable
               Application
               WebKitAppServer
            Servlet
               HTTPServlet
                  Page
            Transaction
            Message
               Request
                  HTTPRequest
               Response
                  HTTPResponse
            Cookie
            Session
            ServletFactory
               PythonServletFactory

Debugging

As with all software development, you will need to debug your web application. The most popular techniques are detailed below.

print

The most common technique is the infamous print statement. The results of print statements go to the console where the WebKit application server was started (not to the HTML page as would happen with CGI). Prefixing the debugging output with a special tag (such as >>) is useful because it stands out on the console and you can search for the tag in source code to remove the print statements after they are no longer useful. For example:

print '>> fields =', self._request.fields()

Raising Exceptions

Uncaught exceptions are trapped at the application level where a useful error page is saved with information such as the traceback, environment, fields, etc. You can configure the application to automatically e-mail you this information. Here is an example error page.

When an application isn't behaving correctly, raising an exception can be useful because of the additional information that comes with it. Exceptions can be coupled with messages, thereby turning them into more powerful versions of the print statement. For example:

raise Exception, 'self = %s' % self

Restarting the Server

Servlet super classes don't get reloaded by the AppServer. Also, Webware can still be unstable due to its early age. If a problem becomes unsolvable, restart the server and see if it still occurs on the first attempt.

Assertions

Assertions are used to ensure that the internal conditions of the application are as expected. An assertion is equivalent to an if statement coupled with an exception. For example:

assert shoppingCart.total()>=0.0, 'shopping cart total is %0.2f' % shoppingCart.total()
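
The equivalence with an if statement can be spelled out in a short sketch, written in present-day Python 3. The ShoppingCart class is a hypothetical stand-in used only to make the example self-contained.

```python
class ShoppingCart:
    """Hypothetical cart whose total is the sum of its item prices."""
    def __init__(self, prices):
        self._prices = prices

    def total(self):
        return sum(self._prices)


def check_with_assert(cart):
    assert cart.total() >= 0.0, 'shopping cart total is %0.2f' % cart.total()


def check_with_if(cart):
    # What the assert statement above amounts to when assertions are enabled.
    if not cart.total() >= 0.0:
        raise AssertionError('shopping cart total is %0.2f' % cart.total())


bad_cart = ShoppingCart([5.0, -7.5])
messages = []
for check in (check_with_assert, check_with_if):
    try:
        check(bad_cart)
    except AssertionError as exc:
        messages.append(str(exc))
print(messages)
```

One difference worth knowing: assert statements are skipped when Python runs with the -O option, whereas the explicit if statement always executes.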

Naming Conventions

Cookies and form values that are named with a preceding underscore are reserved by WebKit for its own internal purposes. If you refrain from using preceding underscores in your names, then [a] you won't accidentally clobber an already existing internal name and [b] when new names are introduced by future versions of WebKit, they won't break your application.

Files

*.py - Mostly the implementation of the various classes of the framework, except as noted below.

WebKit.cgi - A simple script that imports and runs WebKitCGIAdaptor. This also results in a cached byte code version of WebKitCGIAdaptor (e.g., a .pyc), provided you have write permission, of course.

CGIAdaptor.py - A program that collects HTTP requests together in a dictionary and passes them to the app server for processing. This allows WebKit applications to work with any web server that supports CGI.

FCGIAdaptor.py - Like CGIAdaptor, but in accordance with the FastCGI protocol for improved performance.

address.text - The hostname and port of the AppServer are written to this file when it starts up. The CGI and FastCGI adaptors read this in order to communicate with the AppServer.

Configs/AppServer.config - configuration file for the AppServer

Configs/Application.config - configuration file for the Application

Logs/Activity.csv - The log of servlets/pages accessed through the server as described above.

Logs/Errors.csv - The log of uncaught exceptions including date & time, exception and link to debugging page if one was saved.

ErrorMsgs/Error-scriptname-YYYY-MM-DD-*.py - Archived error messages.

How do I develop an app?

The answer to that question might not seem clear after being deluged with all the details. Here's a summary:

  1. Make sure you can run the WebKit AppServer. See the Running and Testing section.
  2. Read the source to the examples (in WebKit/Examples), then modify one of them to get your toes wet.
  3. Create your own new example from scratch. Ninety-nine percent of the time you will be subclassing the Page class.
  4. Familiarize yourself with the class docs in order to take advantage of classes like Page, HTTPRequest, HTTPResponse and Session. Unfortunately, I couldn't get generated class docs working for this release, so you'll have to resort to breezing through the source code which is coupled with documentation strings. Read the examples first.
  5. With this additional knowledge, create more sophisticated pages.
  6. Contribute enhancements and bug fixes back to the project.   :-)
  7. Make sure you find out about new versions when they're released: Subscribe to the announcements list at http://lists.sourceforge.net/mailman/listinfo/webware-announce.
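
The subclassing mentioned in step 3 follows the usual pattern of overriding one method of a base class. The sketch below, in present-day Python 3, uses a hypothetical stand-in named SketchPage rather than WebKit's Page class; the method names writeln, title, writeBody and render are assumptions for this illustration, so consult the examples in WebKit/Examples for the actual interface.

```python
class SketchPage:
    """Hypothetical stand-in for a page base class. This is NOT WebKit's Page."""

    def __init__(self):
        self._out = []

    def writeln(self, text):
        self._out.append(text + '\n')

    def title(self):
        # By default the page title is the name of the class.
        return self.__class__.__name__

    def writeBody(self):
        raise NotImplementedError('subclasses provide the page body')

    def render(self):
        # The base class supplies the page skeleton around the body.
        self._out = []
        self.writeln('<html><head><title>%s</title></head><body>' % self.title())
        self.writeBody()
        self.writeln('</body></html>')
        return ''.join(self._out)


class Hello(SketchPage):
    """A page only has to say what goes into its body."""

    def writeBody(self):
        self.writeln('<p>Hello from my first page.</p>')


html = Hello().render()
print(html)
```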

Release Notes

Limitations/Future

Sprinkled throughout the code are comments tagged with @@ which are hopefully accompanied by a date and someone's initials. These comments represent "TO BE DONE"s. The double at-sign (@@) convention was chosen because it doesn't appear to be used for anything else.

In addition to the inline comments, some significant items have been recorded below. These are future ideas, with no commitments or timelines as to when/if they'll be realized. The Python WebKit is open source, so feel free to jump in!

The numbers below are mostly for reference, although longer term items tend to be pushed towards the bottom.

To Do

  1. Hunt down: @@ tags (which signify "To Be Done"s), FUTURE items in class doc strings, NotImplementedErrors, -- tags
  2. Right now, if the Application itself (as opposed to Servlets) throws an exception, it doesn't get captured nicely. However, it is displayed in the app server's console.
  3. Support generated links to static pages (like HTML and GIFs). This has been started but not completed.
  4. Concurrency doesn't work yet. In other words, Application is not thread-safe in its dispatching of requests.
  5. When browsing through the Servlet examples, the [BACK] button of the browser (Netscape 4.7 in my case) doesn't seem to keep a history. This is pretty strange.
  6. Fix log/CSV problems involving quotes and commas.
  7. Application modifies sys.path so that servlets can say "from SuperServlet import SuperServlet" where SuperServlet is located in the same directory as the Servlet. We'd prefer a more sophisticated technique which does not modify sys.path and does not affect other servlets.
  8. Consider this statement from the FastCGI docs: Redirects are handled similar to CGI. Location headers with values that begin with "/" are treated as internal-redirects; otherwise, they are treated as external redirects (302).
  9. If a URL does not contain an extension, the Application should figure out what it is. This would allow developers to change pages (say from .py to .psp) without affecting URLs and therefore bookmarks.
  10. Have Transaction.session() handle the obtaining or creating of a session on demand for better efficiency? (per Jay Love)
  11. Consider how Pages might be nested. Consider if a different name is in order.
  12. Consider changing the '_' convention for admin scripts.
  13. Docs: Create a caching section to discuss the virtues of doing so. Color example became 12 X faster on the server side.
  14. Consider if we need to support <form action="x.py?a=1" method="POST"> where you will have both a query string and posted data.
  15. The exception handler is pretty nice and has features like logging, e-mail, gathering debugging info, etc. However, on occasions it can throw exceptions too. There should be a simpler, secondary exception handler for when this happens.
  16. Actions in forms should be able to invoke a particular method of the Servlet. yes/no?
  17. Create a tutorial.
  18. Review the timestamp caching logic and its relation to .pyc files if any.
  19. AppServer: Servlets get reloaded when they are changed, but not the Servlets' ancestor classes.
  20. What are the implications of multi-threading for servlets?
  21. Implement all of AppServer's settings.
  22. Create a memory leak test script that hits the AppServer repeatedly.
  23. Consider whether or not a "plug-in" architecture would make sense for transaction processing.
  24. AppServer: Pickling the request is probably faster and more secure than evaluating it.
  25. Role-based security and user-authentication. Goal is to eliminate, as much as possible, developer-written security logic. This should be provided by the WebKit and be configurable.
  26. Write a custom adaptor for Apache, Netscape, MS, etc.
  27. Distribution and load balancing.
  28. More sophisticated admin tools including password protection, clearing logs, displaying a maximum of information at a time, etc.
  29. Check out
  30. Multi-language support/localization (e.g., vend data to clients in their preferred written language)
  31. Consider CORBA standard RMI-IIOP and its potential interaction with WebKit. This technology has been marked for inclusion in J2EE. I imagine the idea might be that an app server could be used by more than just web browsers. e.g., it could be used programmatically (in a more natural way than simulating a web client).

Credit

Author: Chuck Esterbrook

Several people have provided feedback and testing, but the most significant contribution in this area has been by Jay Love.

The design was inspired by both Apple's WebObjects and Java's Servlets.

The Python SocketServer module was useful in creating the app server.