Archive-name: clarinet/howitworks

ClariNet draws news from a variety of sources. This news is processed and converted into USENET format at ClariNet facilities. It is then sent out via UUCP (the telephone/modem-based Unix-to-Unix communications facility) and TCP/IP (the computer communications protocol used by many machines, including those on leased-line networks like the Internet) to ClariNet customers around the world.

We receive UPI (United Press International) wireservice news directly via satellite, in the same way that newspapers receive it. The wire news comes (more or less) in what is known as the ANPA (American Newspaper Publishers Association) format.

This format was designed some time ago. In the beginning, all wires fed directly to printers or teletypes, at speeds of 300 bps or less. The ANPA format was eventually developed and revised to help newspapers that fed the wire directly into their composing computers. Even so, it is primitive compared to formats like the USENET news format and modern electronic mail formats. Only a small amount of information is formally specified. By and large, the information is intended for use by computer-assisted humans, not by an electronic newspaper system like ClariNet.

The satellite feed also provides us with syndicated columns, stocks, and other newspaper-related services. The syndicates all buy transmission time on the two main newswire satellite networks (UPI and AP) -- charging it back to their customers, of course.

For other sources, we either call pickup points by modem or have the sources upload the information to us. Once again, our software converts the information and injects it into the USENET-style news system.

Where possible, news is fed directly to customers with minimal human intervention. Our software has been trained to deal with the various inconsistencies in the wire feed so that news goes out even outside of business hours. This ensures that the news gets to you as quickly as possible.
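To make the conversion step a little more concrete, here is a minimal sketch of rendering one wire story as a USENET-format article. It assumes the story has already been parsed out of the ANPA feed into a few fields; the `WireStory` type, the `to_usenet` function and the `news@example.com` sender are hypothetical, not ClariNet's actual software. From, Newsgroups, Subject, Message-ID and Date are standard USENET headers (RFC 1036); Priority and Keywords are the ClariNet header lines mentioned elsewhere in this file, but their exact format here is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from email.utils import format_datetime


@dataclass
class WireStory:
    """A wire story after parsing (hypothetical field set)."""
    priority: str
    headline: str
    body: str
    newsgroups: list[str]
    keywords: list[str] = field(default_factory=list)


def to_usenet(story: WireStory, message_id: str, posted: datetime,
              sender: str = "news@example.com") -> str:
    """Render a story as header lines, a blank line, then the body."""
    headers = [
        ("From", sender),                                 # hypothetical sender
        ("Newsgroups", ",".join(story.newsgroups)),
        ("Subject", story.headline),
        ("Message-ID", message_id),
        ("Date", format_datetime(posted, usegmt=True)),   # needs a UTC datetime
        ("Priority", story.priority),                     # assumed header format
        ("Keywords", ", ".join(story.keywords)),
    ]
    # Drop headers with empty values, e.g. a story that has no keywords.
    head = "\n".join(f"{name}: {value}" for name, value in headers if value)
    return head + "\n\n" + story.body


if __name__ == "__main__":
    story = WireStory("urgent", "Shuttle launch delayed", "Body text.",
                      ["clari.tw.space"], ["NASA"])
    print(to_usenet(story, "<shuttle.1@example.com>",
                    datetime(1993, 1, 1, tzinfo=timezone.utc)))
```

Dropping empty headers keeps the output clean for sources, such as feature columns, that carry no keywords.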
The software takes category information provided by the reporters and uses it to classify the articles into one or more appropriate newsgroups. For example, all NASA stories go to clari.tw.space.

During business hours (and often outside them, too), ClariNet editors scan the report. We can delete bad stories, edit them to make corrections, or adjust categorizations and newsgroups. If a story is corrected, the old version is canceled and the update is re-issued.

We don't edit every single mistake we find. In general, we edit serious errors and add or delete categorizations on stories. Most of this news is written quickly, with the goal of getting it to the client as soon as possible. As such, we sometimes let typos and other minor mistakes stand, in order to avoid excessive re-issuance of stories.

"Wireservices"

Long before USENET existed, the wireservices built the first large-scale text broadcast systems. Aside from the feeds to newspapers -- done at first by telegraph, later by leased lines and now by satellite -- the wires have their own internal nets as well, where they can issue messages to their own people and even engage in limited discussion. These nets have been around since the 19th century, long before computers existed.

Unfortunately, it seems at times that their technology hasn't changed much since then. As you will read, the reporters key in all the headers and classifications by hand with cryptic single-letter codes. This is very error-prone. With luck, this system will be replaced in the near future.

The largest wireservice in the world is the Associated Press, or AP. AP is owned by its member newspapers. It has its own reporters, but also draws stories from the member papers.

In the USA, the #2 wire is United Press International, or UPI. UPI is an independent, privately owned wire. UPI draws revenue only from fees charged to client newspapers and distributors like ClariNet.

The third major wire is Reuters.
Reuters now makes the vast bulk of its revenue not from newspapers, but by providing information to people in the financial industry. Nonetheless, its U.S. wireservice operations are similar in size to UPI's.

As the #2 wire, UPI is far more willing to experiment with new concepts like electronic publishing. This is what makes ClariNet wireservice news possible.

Just like USENET, wireservices have their own vocabulary. You'll see some of it in the advisories on ClariNet stories, which we put in the Note: header line.

"Wire Activity"

All wire stories have the following main components:

1. A priority that marks the importance of the story.
2. A general category from one of about a dozen ANPA-defined codes.
3. A *slugword*, or unique keyword that identifies the story for that day.

A variety of other fields are optional and are described later.

"Priorities"

UPI covers a wide variety of topics. The most important stories are termed *breaking* news. These stories are assigned one of three special priorities -- flash, bulletin and urgent.

*Flash* is the most extreme priority there is. Flash stories are only one sentence long, and are followed almost immediately by a bulletin. The last known flashes were "space shuttle explodes" and "U.S. invades Iraq" -- this gives you some idea of the importance of these stories. Any flash, if and when it comes, will be posted to clari.news.flash. If you're a system administrator, you might arrange for special treatment and forwarding of such stories.

*Bulletin* is the normal priority for the most important breaking stories of the week. Bulletins can range from major government announcements up to big events such as the U.S. invasion of Panama. One normally sees no more than a few bulletins per week, although, like world events, bulletins come at random.

*Urgent* is a priority assigned fairly frequently -- 3 to 6 times per day. The most important stories of the day get this priority.
Most other news gets the *regular* priority (called *rush* in the wire industry). Some other news will see lower priorities. These are listed in the description of the Priority header line.

All breaking news stories are posted to special groups dedicated to news of that priority. Once a story has been assigned a priority, we keep it in the group for that priority each time it is re-issued, even if the wire has dropped the story's priority to a lower value.

"Scheduled News"

A lot of the major news that "moves" on the wires is not unexpected. For example, a presidential press conference is sure to produce a big story, and everybody knows what time that story will arrive -- they just don't (usually) know what it will say.

In addition, a number of stories are important, but not particularly urgent, and are written with care for release at a particular time. This is true of features and analysis pieces, or pieces about developing world situations.

These stories are known as scheduled stories, or "skedded" in wire lingo. The editors release a schedule of upcoming big stories for newspaper editors to use in planning their pages. We assign any "skedded" story a priority of *major*, and have created some special groups, called "top" news groups, for such stories.

"Classification"

The ANPA category code provides some useful information; about a dozen ANPA categories are used regularly. To supplement this, UPI has reporters and editors classify stories with special custom codes. These map to keywords identifying several hundred different story topics. It is these codes, along with our own judgment, that classify most of the stories into newsgroups.

"Story Updates"

When a newspaper goes to press, it wants the latest version of any developing story. For this reason, almost all breaking stories get issued several times during the day.
The reporter keeps the text on his or her laptop, edits it as new details, quotes and corrections develop, and re-issues the entire story whenever anything important happens. On a big story, as many as 20 updates may come in a day. Most major stories see two or three.

All updates (should) come with the same *slugword* -- the unique keyword that identifies the story. When ClariNet sees a story come in with the same slugword as a previous story, we normally arrange to replace the old story with the new one. This is done by canceling the old one (with a USENET cancel message) and issuing the new one.

Unfortunately, it's not as simple as that, and this feature of wireservices is the source of the greatest problem in interfacing a wire to USENET-format news. Often updates come only minutes apart. In these cases, the cancel and update are done before the original article is batched and sent to our clients. This means that you never even see the original, which is good.

If updates are more widely spaced, you will get both versions (or several versions) and the cancel message(s). This means your newsgroups -- particularly the groups for breaking news -- will be full of gaps formed by deleted articles. This causes the original rn program to pause, and can cause worse problems for the nn newsreader. These problems can be fixed, however.

The harder question is how to present the updates to the reader. This system works well for newspapers, for which it was designed. A newspaper is printed only once a day, so its readers only get the version of the story that was current at press time.

On ClariNet, however, if you read an article soon after its release, and then come back a few hours later, you may well see the same article presented again. You aren't really seeing the same article, of course; you're seeing an update. It is up to you to decide whether to read the update for the latest details, or skip it.
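The slugword-based replacement described above, combined with the rule that a re-issued story stays in the group of the highest priority it has reached, can be sketched as a small table keyed by slugword. This is a hedged illustration, not ClariNet's actual software: the `UpdateTracker` class, the `PRIORITY_RANK` ordering and the `cancel_header` helper are hypothetical. The `Control: cancel <message-id>` header line is the standard USENET cancel control message (RFC 1036).

```python
from dataclasses import dataclass
from typing import Optional

# Most to least important; any other wire priority ranks below "regular".
# (Hypothetical ordering built from the priority names in this document.)
PRIORITY_RANK = {"flash": 0, "bulletin": 1, "urgent": 2, "regular": 3}


def rank(priority: str) -> int:
    return PRIORITY_RANK.get(priority, len(PRIORITY_RANK))


@dataclass
class Posted:
    message_id: str
    priority: str


class UpdateTracker:
    """Remembers the live article for each slugword (hypothetical helper)."""

    def __init__(self) -> None:
        self.live: dict[str, Posted] = {}

    def post(self, slugword: str, message_id: str,
             priority: str) -> tuple[Optional[str], str]:
        """Record a new version of a story.

        Returns the Message-ID to cancel (None for a brand-new story) and
        the priority whose group the new version should be posted to.
        """
        old = self.live.get(slugword)
        to_cancel = None
        if old is not None:
            to_cancel = old.message_id
            # Never demote: keep the story in its highest-priority group.
            if rank(old.priority) < rank(priority):
                priority = old.priority
        self.live[slugword] = Posted(message_id, priority)
        return to_cancel, priority


def cancel_header(message_id: str) -> str:
    """Control header line of a USENET cancel message."""
    return f"Control: cancel {message_id}"
```

For example, posting slugword "shuttle" first as a bulletin and then as an urgent update returns the first article's Message-ID for cancellation and keeps the update filed under bulletin. Note that a table like this can only match updates whose slugword is unchanged.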
Fortunately, most updates have a Note: line indicating what has changed in the article -- but only since the last update. If several updates have been sent out since you last read news, this may not tell you enough.

It is a dilemma. Either we present the subscriber with redundant news that most readers will elect to skip, or we keep potentially important updates from eager readers. We have decided to do the former. Newsclip, and eventually fancier reading tools, can deal with this problem more gracefully.

"Other Duplicates"

The update system isn't perfect, because the input from the wire isn't perfect. Reporters sometimes forget to put updating flags on stories, for example. Our software is keyed to look for changes in the headline or byline on a story. A headline that changes more than a few hours after the original story is treated by us as a new story. This works about 95% of the time. Sometimes, however, you will see a duplicated story appear under two headlines. We try to correct these by hand.

Another common source of duplicates is changed slugwords. Sometimes an update comes to correct a mistyped or incorrect slugword. As no information is provided as to what the old slugword was, we can't arrange to cancel the story being updated. A duplicate ensues.

The final major source of apparent duplicates is the old concept of a wireservice being split into multiple wires. One hears talk of the "news wire," the "sports wire" and the "financial wire." In the old days, each wire went to a different department in the newspaper. Today it's all the same physical channel, processed by a computer. If a story breaks that belongs in more than one category, it may be sent out twice, with two entirely different slugwords and two different ANPA category codes. For example, Pete Rose's expulsion from baseball was both a sports story and a general news story.

"Standing Stories"

The wires put out a large variety of standing stories.
These are regular features, all with the same slugword, that appear at some particular interval, such as every day or every week. A list of most of the major standing stories can be found in a subsequent file.

"Wireservice Errors"

As noted, the wireservice coding schemes are particularly prone to error. We have trained our software to catch many typical errors, but the wires have little in the way of formal specification for what they put out, and they don't always follow the formal rules they do have. Thus you can expect some errors to reach you, particularly after business hours, or in the lower-importance groups, which don't receive full-time scrutiny.

At first, we at ClariNet found these errors quite annoying. One realizes, however, that with thousands of stories to put out, even the best staff will make a few errors each day. By and large, they do not interfere in any significant way with your effort to find the news you want to read, and as such, they can simply be ignored.

The most annoying are the coding errors, particularly those from coding typos. You will sometimes see a story in a group that has nothing to do with the topic of that group. For example, a college football story, which a reporter would code as sfc (Sports-Football-College), may get entered as bfc (Business-manuFacturing-Computers) and thus posted to our very popular computer group. Until we can convince UPI reporters to adopt a new coding scheme, such things are unfortunately possible.

"Local/Regional Stories"

A great deal of a wireservice's output is regional news, collected for newspaper clients in various U.S. states. ClariNet now releases many of these stories in the clari.local hierarchy. We have local hierarchies for 30 different U.S. and Canadian regions, in addition to our international and national news. Local stories of national importance are cross-posted between local and national newsgroups.

In certain national groups, we do publish regional stories.
For example, the computer group, as well as most of the other technical groups, contains regional stories. While this sometimes results in the odd truly local computer story ("Computer demo day at local University"), most of the time it is worth it. Our editors delete stories of the "demo day" form after the fact.

"Broadcast News"

ClariNet also buys some wireservice news meant for radio stations. These stories are used to provide our hourly news summaries (clari.news.cast and clari.news.headlines) along with the various local news summaries in the clari.local hierarchy.

Radio station wires contain shorter stories, and the stories have no headlines. They are generally a bit sloppier, as the reporters do not expect them to see print. In addition, they contain phonetic spellings of unusual names, so that radio announcers will read them correctly.

"Canadian Broadcast News"

To serve Canadian clients, as well as expatriate Canadians around the world, ClariNet also offers Canadian news. UPI, as a U.S. wire, offers very little coverage of Canada. This is normal for U.S. media.

The group clari.news.canada contains the limited coverage that comes along the main wire -- only truly major stories and financial news.

The clari.canada hierarchy provides a feed from a Canadian broadcast wire, the Standard Broadcast Wire (SBW), to which we have arranged access. All the problems of radio wires described above apply.

The best group to read for those outside of Canada is probably clari.canada.briefs, which provides regularly updated summaries of major Canadian stories. The group clari.canada.newscast provides an hourly newscast on world and Canadian news outside of business hours. This also covers U.S. and world events, so non-Canadian readers may wish to read it for late-night updates. Canadian regional summaries (also from SBW) appear in the clari.local hierarchy.

"Newsbytes"

Newsbytes articles are not as well classified as UPI articles, but there is still some useful information.
It is put on the Keywords: line. The most important keyword on each Keywords: line takes the form Bureau-xxx, where "xxx" is a three-letter code for the location of the bureau. You can use the presence of these codes to track or filter stories from certain regions. For example, filtering out Bureau-AUS will eliminate Australian stories. (International stories that are more likely to be of regional interest are also likely to carry a country prefix in the subject line, so you can use that in a filter as well.) Other keywords include things like exclusive, review and correction, but you are less likely to filter on these.

Newsbytes headlines arrive at ClariNet in upper case. Our software converts them to a more readable mixed case. Naturally, such software can't be perfect, so the odd error will occur, but errors are surprisingly rare.

Newsbytes also tags important stories. These are crossposted to the clari.nb.top newsgroup.

"Features"

Feature articles (such as the Dave Barry column) come in a fashion similar to UPI material, but they have no keywords or location coding. This is not normally a problem, as you will usually read every item in a feature group.

"Street Price Report"

The Street Price Report -- a database of the advertised prices for direct-buyer computer equipment and software -- comes out once every 2 weeks. In the group clari.streetprice you'll see the vendor database and price quotes for thousands of products. These articles are set with a 16-day expire time so that they will stay around until the next issue.

The SPR contains multiple prices for each item from multiple vendors. You pick the vendor whose price and terms you like best. Please note that the SPR includes only what we collect from the magazines. With 10,000 quotes in each issue, it is not possible to phone and verify them all. As such, you should always check prices before buying.

The SPR is copyright by Consumer's Database Inc., so please do not distribute it to others.
If your associates need copies, have them contact us.
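As a quick check on the Street Price Report timing: a 16-day expire time against a 2-week publication cycle leaves each issue available for about two days after the next one is due. The sketch below works through that arithmetic and formats the matching header. It assumes the expire time is carried in the standard USENET Expires: header (RFC 1036); the `expires_header` and `overlap` names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ISSUE_INTERVAL = timedelta(weeks=2)  # the SPR comes out once every 2 weeks
EXPIRE_AFTER = timedelta(days=16)    # expire time set on SPR articles


def expires_header(posted: datetime) -> str:
    """Expires: header for an SPR article posted at `posted` (UTC-aware)."""
    return "Expires: " + format_datetime(posted + EXPIRE_AFTER, usegmt=True)


def overlap() -> timedelta:
    """How long an issue stays around after the next issue is due."""
    return EXPIRE_AFTER - ISSUE_INTERVAL


if __name__ == "__main__":
    print(expires_header(datetime(1993, 1, 1, tzinfo=timezone.utc)))
    print(overlap())
```

For an issue posted at midnight GMT on Friday, 1 January 1993, this prints `Expires: Sun, 17 Jan 1993 00:00:00 GMT` followed by `2 days, 0:00:00`; the two-day margin covers a next issue that arrives slightly late.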