Sep 15

XML Still Kicking After All These Years

For all the talk about XML, a simplified subset of SGML that can be used for Web publishing, there has been little action. But Sports Illustrated has used the document language to manage a high-profile project under unforgiving time pressure, and it now plans to turn the system into a tool for everyday operations.

The weekly sports publication teamed up with cable channel TBS to put news photos of the Goodwill Games online using XML. TBS sponsors the international athletic competition, which concluded earlier this month, and wanted to market the games by making photographs of the event available free to any media outlet. Because the project wouldn’t make any money, it had to be cheap and automated, with the Web as the delivery medium.

TBS solicited proposals from sources inside and outside Time Warner Inc., the parent company of both TBS and Sports Illustrated. Sports Illustrated won the bid, beating major competitors such as AllSport Photography (USA) Inc.

“The money wasn’t even a huge thing. It’s part of being a good corporate citizen,” said Phil Jache, Sports Illustrated deputy picture editor. The magazine was covering the games anyway, so it simply sent more photographers.

What they used

IT projects usually entail laborious preparation such as developing teams, selecting tools, and creating data structures and application architectures. But Sports Illustrated had no time to waste: It had only 10 days to put the Goodwill Games photos online, including assembling the servers and development machines. Its resources? Four Pentium II systems, UserLand Software’s Frontier Web development tool, existing T1 telecommunications links, and a part-time programmer named Jason Levine who was a medical student by day.

“I had a lot of confidence in [Jason],” Jache said. XML was new territory for Sports Illustrated, but Jache recognized that if push came to shove, “We could always drop back to straight HTML, placing images in static pages.”

The magazine built the four 400-MHz Pentium II machines for the project itself, having found that even with high-end parts such as Seagate hard drives, Intel logic boards and Adaptec SCSI boards, the systems cost less than comparable machines from a dealer.

“There’s something slightly therapeutic about knowing everything in the machines. [But it’s] not recommended for the technically faint of heart,” Levine said.

Two Pentium IIs acted as Web servers. One handled searching and delivery of low-resolution preview images, while the other managed the corresponding high-resolution images. The third system, called the “inserter,” was connected to the Sports Illustrated corporate network and updated the Web servers via the Internet. The fourth machine, a backup in case either Web server failed, resided at Levine’s home, where he worked remotely. The two Web servers and the backup system each had a T1 line; the inserter used the corporate network’s T3 connection. All machines ran Microsoft Windows NT 4.0, Microsoft’s Internet Information Server (IIS) and UserLand’s Frontier, which managed the images and fulfilled Web visitors’ requests.
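The division of labor above amounts to a routing rule: each rendition of a photo has one home server, with the remote machine as a standby. The Python sketch below expresses that rule. The host names are hypothetical, and placing thumbnails on the preview server is an assumption; the article does not say how Frontier was configured.

```python
from enum import Enum


class Rendition(Enum):
    THUMBNAIL = "thumbnail"
    PREVIEW = "preview"
    HIGH_RES = "high_res"


# Hypothetical host names; the article gives only the machines' roles.
PREVIEW_SERVER = "preview.example.com"  # searching plus low-resolution previews
HIGHRES_SERVER = "highres.example.com"  # full-resolution downloads
BACKUP_SERVER = "backup.example.com"    # standby machine at Levine's home

# Assumption: thumbnails live alongside the low-resolution previews.
ROUTES = {
    Rendition.THUMBNAIL: PREVIEW_SERVER,
    Rendition.PREVIEW: PREVIEW_SERVER,
    Rendition.HIGH_RES: HIGHRES_SERVER,
}


def server_for(rendition: Rendition, down: frozenset[str] = frozenset()) -> str:
    """Return the host that should hold a rendition, failing over to the backup."""
    host = ROUTES[rendition]
    return BACKUP_SERVER if host in down else host
```

For example, `server_for(Rendition.HIGH_RES)` picks the high-resolution server, but if that host is listed in `down`, the request goes to the backup instead.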

Sports Illustrated’s solution was inexpensive compared to the alternative — an Oracle database and tools to interface with the Web. According to Jache, Sports Illustrated’s license for Frontier cost under $1,000. The only other costs for the project were for server hardware and copies of Windows NT 4.0 Server and Microsoft IIS.

How it worked

Sports Illustrated was already using products from Software Construction Co. of Atlanta to manage its photographs and captions. SCC’s software takes JPEG-compressed photos and appends thumbnail previews and information such as photo captions and credits. The file that contains the data is called an SCC JPEG, but the main photo can be viewed by any program that supports JPEG. The magazine wanted to feed its photos to the Goodwill Games site from its SCC system, which is why it used XML.
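The article doesn’t describe SCC’s file layout. One common way to bundle extra data with a JPEG while keeping the photo readable by ordinary viewers is to place that data after the End of Image (EOI) marker, where decoders stop reading. Assuming that layout (an assumption, not SCC’s documented format), a Python sketch that separates the standard JPEG stream from the trailing bytes might look like this. It walks the marker segments instead of searching for the first FF D9 byte pair, because that pair can also occur inside metadata segments such as an embedded thumbnail.

```python
import struct

EOI = 0xD9  # End of Image
SOS = 0xDA  # Start of Scan: entropy-coded image data follows its header


def _skip_scan_data(data: bytes, pos: int) -> int:
    """Return the offset of the first real marker after entropy-coded data."""
    while pos + 1 < len(data):
        if data[pos] == 0xFF:
            nxt = data[pos + 1]
            if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                pos += 2  # stuffed 0xFF data byte, or restart marker RST0-RST7
                continue
            if nxt != 0xFF:
                return pos  # a genuine marker such as EOI, DHT or another SOS
        pos += 1  # ordinary data byte, or a 0xFF fill byte before a marker
    return pos


def split_trailing_data(data: bytes) -> tuple[bytes, bytes]:
    """Split a well-formed JPEG into (jpeg_stream, bytes_after_eoi).

    Assumes (hypothetically) that any extra data sits after the EOI marker.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG")
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte; the marker code comes later
            pos += 1
            continue
        if marker == EOI:
            return data[:pos + 2], data[pos + 2:]
        # Other markers carry a 2-byte big-endian length that counts itself.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
        if marker == SOS:
            pos = _skip_scan_data(data, pos)
    raise ValueError("no EOI marker found")
```

For a well-formed file, the first element of the result is an ordinary JPEG any viewer can open, and the second holds whatever was appended. Progressive JPEGs, whose scans are separated by table segments, still parse, because the scan skipper stops at any real marker and hands control back to the segment walker.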

“XML is just a data structure or format that anything can use — an application, a person, a Web server, whatever — to pass a bit of information from point A to point B,” Levine said.

Frontier’s ability to pass XML-formatted messages is key, because it lets the servers work cooperatively while placing high- and low-resolution images on the correct servers and delivering the appropriate information to users.
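The article doesn’t give the schema of these messages. XML-RPC, a protocol that originated at UserLand, is one plausible fit, and Python’s standard `xmlrpc.client` module can encode and decode such a call. In the sketch below, the method name `photos.store` and the field names are hypothetical.

```python
import xmlrpc.client


def encode_store_message(image_number: int, caption: str, credit: str,
                         jpeg_bytes: bytes) -> str:
    """Build an XML-RPC call asking a Web server to store one photo."""
    record = {
        "imageNumber": image_number,  # hypothetical field names throughout
        "caption": caption,
        "credit": credit,
        # Binary payloads travel base64-encoded inside a <base64> element.
        "jpeg": xmlrpc.client.Binary(jpeg_bytes),
    }
    return xmlrpc.client.dumps((record,), methodname="photos.store")


def decode_store_message(xml_text: str) -> tuple[str, dict]:
    """Parse the call back into its method name and record."""
    # use_builtin_types=True returns the image as plain bytes.
    params, method = xmlrpc.client.loads(xml_text, use_builtin_types=True)
    return method, params[0]
```

Because the command (`photos.store`) travels with the data, a receiving server can decide what to do with each message without any other coordination, which matches the cooperative behavior described above.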

For the games, Sports Illustrated photographers shot an average of 160 rolls of film a day. About 750 images were put on the Web site. All images were scanned into Apple Power Macintosh 8600 workstations using Scitex EverSmart Pro scanners and prepared in SCC software. Images chosen for the Web site were moved into a drop folder on the inserter system.

Frontier watched the drop folder and tagged each new image with a sequential identifying number, which became that image’s file name. Frontier parsed the SCC JPEG file and put the contents into an XML message, then sent XML messages to the Web servers. These messages contained the SCC JPEG contents and image numbers, as well as commands to process them. Copies of Frontier on the Web servers would then store photo captions and credits in separate records, identified with the image number, in Frontier’s own object database. Thumbnails, previews and high-resolution images were put in directories on the appropriate server.
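The inserter’s drop-folder step can be approximated in Python as below. It is a sketch, not Frontier’s mechanism: it assumes files are complete when they appear, keeps the next number in memory, moves claimed files to a separate folder, and zero-pads numbers to five digits. None of those details come from the article.

```python
from pathlib import Path


class DropFolder:
    """Give each new file in a watched folder the next sequential image number."""

    def __init__(self, folder: Path, numbered: Path, next_number: int = 1) -> None:
        self.folder = folder
        # Claimed files move here; keep it on the same volume so rename() works.
        self.numbered = numbered
        self.numbered.mkdir(parents=True, exist_ok=True)
        self.next_number = next_number

    def poll(self) -> list[tuple[int, Path]]:
        """Claim every file now in the drop folder, oldest first."""
        waiting = sorted(
            (p for p in self.folder.iterdir() if p.is_file()),
            key=lambda p: (p.stat().st_mtime, p.name),
        )
        claimed = []
        for path in waiting:
            number = self.next_number
            self.next_number += 1
            # The number becomes the file name; the extension is kept, lowercased.
            target = self.numbered / f"{number:05d}{path.suffix.lower()}"
            path.rename(target)
            claimed.append((number, target))
        return claimed
```

A caller would run `poll()` every few seconds, then parse each claimed file and send its contents, with its image number, to the Web servers.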

Visitors to the Goodwill Games site could request a search through Microsoft IIS, which would pass the request to Frontier. Frontier would examine its database and match the request to an appropriate image based on the image’s number.

“When somebody requests an image, Frontier dynamically generates a page that has all the text associated with that image, and it generates the URL to whatever image you’re looking at,” Levine said.
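Levine’s description suggests a lookup keyed by image number followed by simple templating. The Python sketch below uses a plain dictionary in place of Frontier’s object database; the URL layout, five-digit file names and the `.hrj` high-resolution extension are all hypothetical.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical URL layout; the article does not give the real one.
PREVIEW_BASE = "http://preview.example.com/images"
HIGHRES_BASE = "http://highres.example.com/images"
HIGHRES_EXT = ".hrj"  # hypothetical non-JPEG extension for high-res files


@dataclass
class PhotoRecord:
    caption: str
    credit: str


def render_photo_page(records: dict[int, PhotoRecord], number: int) -> str:
    """Return an HTML page for one image number; KeyError if it is unknown."""
    record = records[number]  # the image number is the lookup key
    name = f"{number:05d}"
    caption = escape(record.caption)  # captions may contain <, > or &
    return (
        "<html><body>\n"
        f'<img src="{PREVIEW_BASE}/{name}.jpg" alt="{caption}">\n'
        f"<p>{caption}</p>\n"
        f"<p>Photo: {escape(record.credit)}</p>\n"
        f'<a href="{HIGHRES_BASE}/{name}{HIGHRES_EXT}">Download high-resolution image</a>\n'
        "</body></html>\n"
    )
```

Generating the page on request means captions and credits live in one place, the record store, rather than being copied into thousands of static HTML files.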

The online page showed thumbnail previews, and the user could download the high-resolution image. Those high-resolution files, which would take 10 to 12 Mbytes in TIFF format, are less than 1 Mbyte in SCC’s JPEG format yet still suitable for print reproduction.
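The sizes quoted above imply a compression ratio of better than 10:1. Over one of the Web servers’ T1 lines (1.544 Mbit/s), that is the difference between seconds and a minute per download. The transfer times below are rough estimates that assume 1 Mbyte = 10^6 bytes, the full line rate and no protocol overhead.

```latex
r = \frac{S_{\mathrm{TIFF}}}{S_{\mathrm{JPEG}}} > \frac{10\ \mathrm{MB}}{1\ \mathrm{MB}} = 10,
\qquad
t = \frac{8S}{1.544\ \mathrm{Mbit/s}}:\quad
t_{\mathrm{JPEG}} < \frac{8\ \mathrm{Mbit}}{1.544\ \mathrm{Mbit/s}} \approx 5.2\ \mathrm{s},
\quad
t_{\mathrm{TIFF}} \approx \frac{80\ \text{to}\ 96\ \mathrm{Mbit}}{1.544\ \mathrm{Mbit/s}} \approx 52\ \text{to}\ 62\ \mathrm{s}
```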

Some shortcomings

Although the project proceeded quickly, it wasn’t perfect. The copies of IIS on the Web servers assumed that any file with a JPEG extension should be displayed in the browser window, even a high-resolution image meant to be downloaded.

“That was a relatively annoying problem because we had this high-res image that we didn’t want to accidentally display,” Levine said. The only solution was to change the file extension.
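Servers such as IIS pick a response’s Content-Type from the file extension, and browsers render `image/jpeg` inline. The Python sketch below mimics that lookup with the standard `mimetypes` module to show why renaming worked: in this sketch, an unrecognized extension falls back to the generic `application/octet-stream` type, which browsers typically offer to save rather than display. The `.hrj` extension is hypothetical; the article does not say which one Sports Illustrated chose.

```python
import mimetypes

# A fresh MimeTypes instance uses only Python's built-in extension table,
# so results do not depend on the host's system MIME configuration.
_TYPES = mimetypes.MimeTypes()


def content_type_for(filename: str) -> str:
    """Choose a Content-Type from the extension, as a static file server might."""
    mime, _encoding = _TYPES.guess_type(filename)
    # Unknown extensions get a generic binary type, prompting a download.
    return mime or "application/octet-stream"
```

With this mapping, `00042.jpg` comes back as `image/jpeg` and would open in the browser, while the same bytes saved as `00042.hrj` come back as `application/octet-stream`.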

Still, the project was an overall success. The site received 3,000 to 4,000 page views a day during the games, with 130 people, probably representing press organizations, registering to download high-resolution images.

Happy ending

Although the Goodwill Games are over, the Web site continues to provide images of the games to anyone who wants them, and the magazine plans to take what it learned about XML from this project and apply it to another use: Publications around the world license content from Sports Illustrated, but they use FTP to download those images.

That requires downloading a compressed archive of multiple photos rather than individually compressed images. With a proper password system, licensees will be able to pick and choose images more easily.

Other uses may follow.
