Excerpt from DRAFT POSIX Std. 1003.25

Disclaimer: This section from the Draft POSIX Standard is provided for reference only. This is the current revision of the draft as of September 20, 2001. This is not guaranteed to be the latest revision.

If you have any comments or questions, please send e-mail to lkessler@users.sourceforge.net.

Annex B (informative): Rationale and Notes

B.20 Event Logging

B.20.1 Introduction

B.20.2 Logging of Kernel Events

B.20.3 Event Log Structure; Persistence of Records

B.20.4 Remote Logs; Portable Logs

B.20.5 Integrity of Event Data

B.20.6 Log Maintenance

B.20.7 Performance

B.20.8 Why Not Just syslog?

B.20.9 Data Definitions

B.20.10 Data Formats

B.20.11 Log-Entry Object

B.20.12 Write to the Log

B.20.13 Open an Event Log for Read Access

B.20.14 Read from an Event Log

B.20.15 Notify Process of Availability of System Log Data

B.20.16 Reposition the Read Pointer

B.20.17 Compare Event Record Severities

B.20.18 Queries = Event Filters

B.20.19 String Equivalents of Event Attributes

B.20.20 Standard Event Types

B.20.21 Revision History

Annex B (informative): Rationale and Notes

B.20 Event Logging

B.20.1 Introduction

The standard calls for a single system-wide log, to which all event records are written. Funneling all event records intact through a single logical stream makes it easier for an implementation to monitor and analyze events in a system-wide context, in order to determine where faults may exist. This capability is critical to the consensus model of using the event stream as a conduit for achieving fault-tolerance within the system.

The standard provides for logging of raw binary data, to enable efficient construction and processing of log event records. With purely textual data, hardware sense data and other binary failure data cannot be adequately supported, and analysis options are limited. The standard also supports the logging of textual data as a special case, and so in this respect is compatible with the functionality commonly found in syslog implementations.

B.20.2 Logging of Kernel Events

The POSIX standard specifies an application’s interface to the operating system. Therefore, this event logging standard does not attempt to specify an API for logging events generated by the operating system kernel. There was strong consensus, however, that the event logging system should accommodate events generated by the kernel, that kernel events should be logged to the system log along with application-generated events, and that the format of kernel-generated events should conform to this event logging standard.

B.20.3 Event Log Structure; Persistence of Records

The structure and organization of event logs is left to the implementer's discretion, subject only to the constraints of the specified interfaces. In particular, although the standard requires that the posix_log_read() function yield a posix_log_entry structure that contains the event record’s attributes, there is no requirement that the attributes be stored in the log in that form.

The standard establishes an open/read/close style of interface for reading event records, in order to support processing of archival log files and log files from other systems. This same interface is used to read the system log.

Applications that read the system log should be prepared for new events to be appended to the log at any time. Other than that, it was felt that, at least between maintenance activities (see Section 20.5), such applications should have a stable view of the event log: once read, event records should not disappear from the application’s view of the log.

B.20.3.1 Order of Events in Log

The standard requires that new events be added at the end of the system log. This implies that a sequential read of the log will yield the records in chronological order according to when they were written to the log. However, a record's timestamp is assigned at the time of the call to posix_log_write(). (It was felt that the timestamp should reflect the time of the event as nearly as possible.) The working group recognized that, for a variety of reasons, it may be difficult for some implementations to guarantee that events are written to the log in exactly the order in which they were initiated, especially considering that kernel events may circumvent posix_log_write() entirely on their way to the log. Therefore, there is no requirement that events appear in the log in order of timestamp.

The implementation is free to support other methods of accessing event logs, so long as the aforementioned sequential read is supported according to the standard.
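Because the log order and the timestamp order of records may differ, an application that needs a strictly chronological view has to sort by timestamp itself. The following is a minimal sketch, assuming the records of interest have already been read into memory. struct demo_entry and the helper functions are hypothetical stand-ins for the log_recid and log_time attributes, not the standard's posix_log_entry; modeling log_time as a time_t is also an assumption.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for two posix_log_entry attributes. */
struct demo_entry {
    long   recid;     /* log_recid: ascending in log order */
    time_t log_time;  /* log_time: assigned when posix_log_write() was called */
};

/* qsort() comparator: earlier timestamp first; equal timestamps keep
   log order by falling back to the record ID. */
static int demo_by_time_then_recid(const void *a, const void *b)
{
    const struct demo_entry *x = a;
    const struct demo_entry *y = b;

    if (x->log_time != y->log_time)
        return (x->log_time < y->log_time) ? -1 : 1;
    return (x->recid > y->recid) - (x->recid < y->recid);
}

/* Reorders records that were read sequentially (log order) into
   timestamp order. */
static void demo_sort_chronologically(struct demo_entry *v, size_t n)
{
    qsort(v, n, sizeof *v, demo_by_time_then_recid);
}
```

For instance, records with IDs 1 through 4 and timestamps 100, 98, 100 and 99 come out in the order 2, 4, 1, 3.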

B.20.4 Remote Logs; Portable Logs

The standard does not provide for direct access to logs on remote systems, nor does it specify how event information is transferred from one system to another. The working group expects that implementers will provide extensions to the standard to meet this need. The format of an event record is binary rather than textual, so portability (if any) of an event log between different architectures is left to the implementation(s).

The working group briefly considered the idea of defining an architecture-independent event-log format. This would enable a conforming program running on one system to display or otherwise interrogate an event log written on another system – even one with a completely different architecture. However, several issues became immediately apparent. For example,

· The contents of an event record's optional-data portion are, by definition, opaque. Encoding and decoding this data requires information that is outside the scope of this standard.
· Similarly, certain fields of the posix_log_entry struct (e.g., log_facility, log_processor) also have opaque types.
· Certain name-to-number mappings (such as user IDs, group IDs, and possibly facility IDs) typically differ from system to system.

The working group concluded that portability of event logs is beyond the scope of the current standard.

B.20.5 Integrity of Event Data

There was much discussion of possible guarantees that should be required of the implementation regarding the availability of log data once a call to posix_log_write() is initiated. The working group envisioned the following timeline for the lifetime of an event that is logged via the posix_log_write() function. The timeline would be essentially the same for events logged by the kernel. As discussed later, some of the steps in this timeline may not occur in the indicated order, if at all.

1. An application calls posix_log_write(), either directly, through posix_log_printf(), or through an application- or implementation-defined interface.
2. The implementation decides whether the caller has permission to log the indicated event record. If not, the call to posix_log_write() fails with EPERM. The implementation may do additional screening at this point. (For example, the implementation may be configured to reject all events with a severity of LOG_DEBUG.) If the implementation decides to reject the event record on such grounds at this point, the call to posix_log_write() fails with ECANCELED.
3. The implementation captures the event record in temporary storage, for example, in a memory buffer in the kernel or in an event-logging daemon.
4. The implementation may perform some sort of implementation-defined notification at this point, e.g., if immediate notification is crucial. (The standard is silent regarding this step, since it was felt that real-time notification about urgent events is not within the scope of event logging. Except as otherwise noted, the term “notification” refers to a notification that is sent in response to a notification request that was registered via the posix_log_notify_add() function. See step 8.)
5. The posix_log_write() call returns zero (success).
6. The implementation may further screen the event record at this point, for example, to screen out LOG_DEBUG records or to eliminate duplicate event records. The record may be discarded at this point even though the call to posix_log_write() has succeeded.
7. The implementation writes the event record to long-term storage, such as a disk file.
8. The implementation sends out notifications to processes that have registered via posix_log_notify_add() to be notified when this type of event is logged.
9. The event record resides in the system log until a log-maintenance activity removes it.

Regarding this timeline, the standard states:

· If the event record is otherwise valid, but is rejected in step 2, posix_log_write() shall return EPERM or ECANCELED.
· The implementation shall not send notifications until the event record is available for seeking by posix_log_seek() and reading by posix_log_read(). If the implementation is able to satisfy calls to posix_log_seek() and posix_log_read() before the record is written to long-term storage, then step 8 may precede step 7.

Beyond this, any guarantees about the integrity and/or availability of log data are up to the implementation.
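The following toy model walks through steps 2, 3, 5, 6 and 7 of the timeline to show how a posix_log_write() call can succeed even though the record never becomes readable, and how this leaves gaps in the record-ID sequence. It is only a sketch: demo_log_write(), demo_flush() and the two screening policies are hypothetical names, the severity values assume syslog's numbering, errors are reported as error-number return values (as in the statement that posix_log_write() "shall return EPERM or ECANCELED"), and a real implementation would run the flush step asynchronously, for example in a daemon.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the event timeline.  All names are hypothetical. */
enum { DEMO_TEMP_CAP = 4, DEMO_LOG_CAP = 16 };
enum { DEMO_SEV_ERR = 3, DEMO_SEV_DEBUG = 7 };   /* syslog-style values */

struct demo_rec { long recid; int severity; };

static struct demo_rec demo_temp[DEMO_TEMP_CAP]; /* temporary storage */
static size_t demo_temp_n;
static struct demo_rec demo_log[DEMO_LOG_CAP];   /* long-term storage */
static size_t demo_log_n;
static long demo_next_recid = 1;

static bool demo_reject_debug_early = false;     /* step 2 policy */
static bool demo_drop_debug_late = true;         /* step 6 policy */

/* Steps 2, 3 and 5: screen quickly, capture, and return. */
static int demo_log_write(bool caller_permitted, int severity)
{
    if (!caller_permitted)
        return EPERM;                                   /* step 2 */
    if (demo_reject_debug_early && severity == DEMO_SEV_DEBUG)
        return ECANCELED;                               /* step 2 */
    if (demo_temp_n < DEMO_TEMP_CAP)                    /* step 3 */
        demo_temp[demo_temp_n++] =
            (struct demo_rec){ demo_next_recid++, severity };
    /* Otherwise temporary storage has overflowed and the record is lost
       (reason b in B.20.5.1); this toy still reports success. */
    return 0;                                           /* step 5 */
}

/* Steps 6 and 7, normally run later and asynchronously. */
static void demo_flush(void)
{
    for (size_t i = 0; i < demo_temp_n; i++) {
        if (demo_drop_debug_late && demo_temp[i].severity == DEMO_SEV_DEBUG)
            continue;                 /* step 6: dropped, leaving an ID gap */
        if (demo_log_n < DEMO_LOG_CAP)
            demo_log[demo_log_n++] = demo_temp[i];      /* step 7 */
    }
    demo_temp_n = 0;
}
```

With the default policies, a debug-level write returns 0 but is dropped at step 6, so after three successful writes only the records with IDs 1 and 3 reach long-term storage.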

B.20.5.1 Successful Write May Not Imply Successful Read

It was felt that posix_log_write() should complete as quickly as possible, to minimize event-logging overhead in the calling process. Therefore, it was felt that completion of posix_log_write() should not have to wait for completion of the write to long-term storage (step 7) or delivery of associated notifications (step 8). It was felt that any screening done by posix_log_write() (step 2) should be very quick. Additional screening (step 6) could be deferred until after completion of posix_log_write().

Note that even if posix_log_write() succeeds, the associated event record may never become available for reading. Here’s why:

a. The implementation is permitted to drop the record at step 6.

b. The event record may be lost from temporary storage before it is written to long-term storage – for example, if the incoming event rate is so high that temporary storage overflows.

c. The write to long-term storage may fail – for example, if there is no more room in the log file’s filesystem.

d. A log maintenance activity may delete the record from the log before the record is ever read.

Items (b) and (c) above reflect the inevitability of limitations on memory and disk space, respectively. (It was felt, however, that once (c) is detected, the implementation should return ENOSPC on subsequent posix_log_write() calls until space is once again made available.)

Items (a) and (d) were more controversial. They go against the generally accepted philosophy of “log everything and sort it out later.” Item (a) is also a calculated breach of the implied promise that a successfully written record can be subsequently read. On the other hand, such filtering may minimize the (less predictable) loss of data due to items (b) and (c). In any case, it was felt that the filtering implied by items (a) and (d) should be well documented, and configurable by the log administrator.

All of the above notwithstanding, the implementation is free to delay completion of posix_log_write() until after the event has been written to long-term storage and/or notifications have been sent. (But the performance impact should be well understood.)

B.20.5.2 Assignment of Record ID

The standard requires that, within an event log, records have ascending record IDs. In particular, kernel events are not allowed to have a different record-ID sequence from that of application-generated events in the same log.

The sequence of record IDs need not be free of gaps. Therefore, the implementation is free to assign a record ID to a record that may later be dropped or deleted. It was felt that the range of possible record IDs is large enough to allow a certain amount of extravagance in this area.

The record ID must be assigned before notifications associated with this event are delivered (since the notification may include the record ID, and also because record ID is a query criterion), and before the record is first sought (via posix_log_seek()) or read.

B.20.5.3 Logging Events at System Shutdown

Events that occur at the time of an unexpected system shutdown are among the most interesting and useful to log. Although the standard is silent on this subject, the working group felt that events logged to temporary storage (but not to long-term storage) just before a system shutdown should be written, if possible, to the system log when the system is rebooted.

B.20.6 Log Maintenance

The standard does not specify how or when the system log is to be archived, compacted, or otherwise modified for maintenance purposes. However, it was recognized that such activities are commonplace, and it was felt that applications should not have to abandon access to the system log when these activities occur. Section 20.5 describes a mechanism for notifying applications when such a maintenance activity starts and ends, so that the applications can resume access to the log once the activity is completed.

The system log is “always available to accept event records.” It was felt that the implementation should not reject calls to posix_log_write() simply because they happen to occur during log maintenance activities. The implementation should either buffer these new records or suspend completion of posix_log_write() calls until maintenance is complete.
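A minimal sketch of the first option (buffering records that arrive during maintenance), assuming a single-threaded toy log in which each record is represented only by its record ID. All names are hypothetical; demo_compact() stands in for whatever archiving or compaction the administrator performs.

```c
#include <stdbool.h>
#include <stddef.h>

enum { DEMO_CAP = 32 };

/* Toy log: each record is represented only by its record ID. */
struct demo_log {
    long   recs[DEMO_CAP];
    size_t n;
    long   pending[DEMO_CAP];   /* records that arrive during maintenance */
    size_t npending;
    bool   in_maintenance;
};

/* Accepts a record whether or not maintenance is under way.  The -1
   return only reflects this toy's fixed capacity. */
static int demo_append(struct demo_log *lg, long recid)
{
    if (lg->in_maintenance) {
        if (lg->npending == DEMO_CAP)
            return -1;
        lg->pending[lg->npending++] = recid;
    } else {
        if (lg->n == DEMO_CAP)
            return -1;
        lg->recs[lg->n++] = recid;
    }
    return 0;
}

static void demo_begin_maintenance(struct demo_log *lg)
{
    lg->in_maintenance = true;
}

/* Example maintenance activity: keep only the newest `keep` records. */
static void demo_compact(struct demo_log *lg, size_t keep)
{
    if (lg->n <= keep)
        return;
    size_t drop = lg->n - keep;
    for (size_t i = 0; i < keep; i++)
        lg->recs[i] = lg->recs[i + drop];
    lg->n = keep;
}

/* Ends maintenance and appends the buffered records in arrival order. */
static void demo_end_maintenance(struct demo_log *lg)
{
    lg->in_maintenance = false;
    for (size_t i = 0; i < lg->npending; i++)
        (void)demo_append(lg, lg->pending[i]);
    lg->npending = 0;
}
```

The alternative mentioned above, suspending completion of posix_log_write() until maintenance ends, would replace the pending buffer with a wait (for example, on a condition variable) in the writer.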

B.20.7 Performance

Performance of the event logging system was a concern primarily in the following areas:

· posix_log_write(): As discussed in Section B.20.5.1, it was felt that the facility (application or kernel) should incur a minimum of overhead when logging an event.
· Notification: It was felt by some that the notification feature, when used with queries of unlimited complexity, could place an inappropriate burden on system resources, or could lead to the notification system falling hopelessly behind the rest of the event logging system. This led to the concept of limited queries, which are discussed in Section B.20.18 of this appendix.

As discussed in Section B.20.15, timely (as opposed to efficient) delivery of notifications was not deemed to be a vital performance consideration.

B.20.8 Why Not Just syslog?

The syslog event-logging system, used with many UNIX systems, was considered as a basis for POSIX event logging. For the most part it was rejected, for the following reasons:

· syslog logs only textual messages with arbitrary formats. As a result, automated analysis of syslog output is typically impractical.
· It discards important information, such as an event's facility (source) and severity, before the event record is ever logged.
· It encourages the use of multiple log files as a method for classifying events. Classifications are therefore arbitrary and static, and related events are often sent to different log files. The results often include a loss of important context when analyzing events, and difficulty in keeping control of the various log files as they grow.
· Its notification mechanisms (e.g., sending mail to operators or administrators) are limited.

Other event-logging implementations exist that have overcome these shortcomings. Some of these implementations support the syslog() function for backward compatibility, but feed the syslog()-generated messages into a more flexible logging system.

In view of the widespread use of syslog, it was considered important to be able to implement syslog’s primary features using the POSIX event-logging interface. For example, the posix_log_printf() function supports syslog()’s printf-like formatting capability for text-based event records. The posix_log_memtostr() function enables a simple, strictly conforming program to produce a textual version of a POSIX event log. Functions like posix_log_seek() and posix_log_query_match() make it relatively easy to classify events and/or focus only on events of interest. The posix_log_notify_add() feature enables the creation of programs that watch for new events and take appropriate actions as they occur.

The standard’s set of severity levels is taken directly from syslog, and compatibility with syslog here was felt to be important. It was generally felt that eight levels of severity should be enough for anybody, although the implementation is free to define additional severities.

The standard also accepts syslog’s set of facilities, again largely for compatibility with syslog. (While different implementations of syslog tend to use the same set of severity levels, there is much less agreement on the set of facilities. The set specified in the standard is intended to be a common subset.) It is fully expected that implementations and/or applications will define additional facilities. The set inherited from syslog was not felt to be adequate, but there was general consensus as to the futility of trying to define a complete set that would be widely accepted.

B.20.9 Data Definitions

This section of the rationale discusses Section 20.2 of the normative text, with the following exceptions:

· Severity codes are discussed in Section B.20.8.
· Query objects are discussed in Section B.20.18.

B.20.9.1 Facility Codes

The data type posix_log_facility_t is an opaque type that is not an array type. (Array types were disallowed for this and some other types because of the complexities associated with passing arrays as function arguments in C.) Existing implementations typically use either character strings or integers as facility codes. The posix_log_factostr() and posix_log_strtofac() functions were included in recognition of the fact that a facility’s code may not be the same as its name.

Integer codes have the advantage of compactness and simplicity, but have the disadvantage that different systems may assign different numbers to the same facility. For example, the Volume Manager may be facility number 55 on system A, but number 59 on system B. This could create problems when analyzing system A’s event log on system B. One approach to this problem is to make the facility’s code a function of its name – for example, a hash code or checksum. It was felt that the solution to this problem is implementation- or installation-dependent, and is therefore beyond the scope of the standard.
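As one illustration of making a facility's code a function of its name, the sketch below uses the 32-bit FNV-1a hash, so that every system computes the same code for the same facility name. This choice of hash is an assumption made for brevity, not a method suggested by the standard; demo_facility_code() is hypothetical, and a real scheme would also have to detect two names that hash to the same code.

```c
#include <stdint.h>

/* Hypothetical: derives a facility code from the facility's name with
   the 32-bit FNV-1a hash, so the code does not depend on the order in
   which facilities were registered on a given system. */
static uint32_t demo_facility_code(const char *name)
{
    uint32_t h = 2166136261u;                 /* FNV-1a offset basis */
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++) {
        h ^= *p;
        h *= 16777619u;                       /* FNV-1a 32-bit prime */
    }
    return h;
}
```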

Note that although posix_log_facility_t cannot be an array type, it can be a struct whose only member is an array, for example:

    typedef struct {
        char fac_name[20];
    } posix_log_facility_t;
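The struct wrapper sidesteps the array-argument problem because C copies a struct by value on assignment, parameter passing, and return, whereas an array parameter decays to a pointer. A short sketch using the illustrative type from the text; demo_make_facility() and demo_same_facility() are hypothetical helpers.

```c
#include <string.h>

/* The illustrative type from the text. */
typedef struct {
    char fac_name[20];
} posix_log_facility_t;

/* Returned by value: the whole 20-byte array travels with the struct. */
static posix_log_facility_t demo_make_facility(const char *name)
{
    posix_log_facility_t f;
    strncpy(f.fac_name, name, sizeof f.fac_name - 1);
    f.fac_name[sizeof f.fac_name - 1] = '\0';   /* always terminated */
    return f;
}

/* Passed by value; == cannot compare structs, so compare the names. */
static int demo_same_facility(posix_log_facility_t a, posix_log_facility_t b)
{
    return strcmp(a.fac_name, b.fac_name) == 0;
}
```

Assigning one such object to another copies the name, so modifying the copy leaves the original untouched.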

B.20.10 Data Formats

The standard is silent about the format of the “variable-data” portion of an event record, with one exception: the POSIX_LOG_STRING format code is provided for the common case where the variable-data portion is a character string. A variety of other formats were discussed, but none were considered suitable for inclusion in the standard.

B.20.11 Log-Entry Object

The posix_log_entry struct contains those attributes that are included in every event record. There was much discussion as to what sorts of attributes should be included in this standard set. The attributes that were chosen meet most or all of the following criteria:

· typically useful and/or occasionally vital for event analysis
· easily captured by the implementation if not explicitly supplied by the caller (e.g., user ID, process ID, thread ID)
· compact (most members of the posix_log_entry struct are integral types)

B.20.11.1 Standard Attributes

The record ID (log_recid member) is intended to uniquely identify a particular instance of a particular kind of event. It can be used to locate the event record within the log (e.g., using posix_log_seek()), or to indicate the record’s order in the log relative to other records.

The size and format attributes (log_size and log_format) specify the size and format of the variable-length data portion.

The facility and event type (log_facility and log_event_type) are intended to uniquely identify a type of event. Different facilities can use the same event-type code for completely different types of events. The event type can have a variety of uses:

· Some implementations use numeric event types as a sort of shorthand. For example, an experienced technical-support person may know that for a product whose facility code is X, an event type of 4008 means the server has died.
· Numeric event types can also be used to locate explanations of events (e.g., a list of event descriptions, sorted by event type, in a user's manual).
· Some implementations can derive the detailed format of the event record from the event type. (The event type is used to find a “template” that describes the meaning of the data in the event record.)

The user ID (log_uid), process ID (log_pid), and time stamp (log_time) were widely viewed as useful or even essential.

The group ID (log_gid), process group ID (log_pgrp), thread ID (log_thread), and processor ID (log_processor) were viewed more critically; but it was felt that these attributes could be very helpful in diagnosing certain types of problems. In any case, they are compact and easy for the implementation to capture.

The log_processor member was originally of type int, and was called log_cpu. However, with the increasing variety of multiprocessor and/or clustered architectures, and the increasing tendency to partition processor pools and other resources into multiple virtual systems, it was felt that the implementation should be free to express a processor's ID as something other than an integer. For similar reasons, some objected to the term “cpu” as shorthand for “processor.” (For many multiprocessor systems, no one processor is “central.”)

The log_flags member was introduced to support the POSIX_LOG_TRUNCATE flag, and to accommodate other implementation- or application-defined flags. Other flags were considered for inclusion in the standard, but rejected or moved to other parts of the event record. (For example, POSIX_LOG_STRING started out as an event type, was later made a flag, and was later made a format when the log_format member was introduced.)

B.20.11.2 Non-standard Attributes

Attributes that were considered for inclusion in the log-entry structure, but rejected, include:

· caller’s source file name and line number (rejected because of space considerations, and because this information can typically be inferred from other information, such as the facility and event type)

· software version number, or similar information about the facility (rejected because this information can typically be inferred from the facility code and time stamp, given a log of software-installation and -deinstallation events)

· host ID. This was rejected because a POSIX event log accumulates events only for the current system, and so this value would be constant throughout the event log. It was felt by some that such a field might be useful when merging event logs from multiple related systems. However, this anticipates a particular implementation for merging of event logs; and does not address other issues such as duplicate record IDs in the merged log, and inconsistency among facility codes, user IDs, and so on. In general, it was felt that merging of event logs from multiple systems is beyond the scope of this standard. It was also observed that the definition of what constitutes a “host” or “system” is becoming increasingly slippery.

The implementation is free to add additional attributes to the log-entry structure. It is expected that the implementation would support such attributes in the posix_log_memtostr() function, and would permit their use in query expressions.

The implementation may also allow the packaging of non-standard attributes in the variable portion of the event record. Depending on the implementation, such attributes might still be permitted in query expressions. The standard is silent on this subject.

B.20.12 Write to the Log

There were two schools of thought on the handling of a call to posix_log_write() where the length of the variable data exceeds {POSIX_LOG_ENTRY_MAXLEN}. Some thought that the call should fail, to avoid logging corrupted (truncated) data. Some thought that the call should succeed with truncated data, to avoid losing the entire record. The standard’s definition of the {POSIX_LOG_TRUNCATE} flag allows this issue to be decided by the implementation. There was little support for allowing different behaviors for different applications on the same implementation.

In an event record that has a format of {POSIX_LOG_STRING} and has the {POSIX_LOG_TRUNCATE} flag set, the character string is still guaranteed to be null-terminated. It was felt that the additional burden this placed on the implementation of posix_log_write() was minor compared to the benefit to application programs.
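A minimal sketch of the truncating behavior for a POSIX_LOG_STRING payload, showing how the terminating null byte is preserved. DEMO_ENTRY_MAXLEN and DEMO_LOG_TRUNCATE are hypothetical stand-ins for {POSIX_LOG_ENTRY_MAXLEN} and {POSIX_LOG_TRUNCATE}, whose actual values are implementation-defined; the small limit is chosen only to make truncation easy to demonstrate.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; real values are implementation-defined. */
enum { DEMO_ENTRY_MAXLEN = 16 };
enum { DEMO_LOG_TRUNCATE = 0x1 };

/* Copies a string payload into a record buffer of DEMO_ENTRY_MAXLEN bytes.
   Returns the number of bytes stored (including the terminating NUL) and
   sets DEMO_LOG_TRUNCATE in *flags if the text had to be shortened.  The
   stored string is always NUL-terminated. */
static size_t demo_store_string(char dst[DEMO_ENTRY_MAXLEN], const char *src,
                                unsigned *flags)
{
    size_t len = strlen(src);

    if (len + 1 > DEMO_ENTRY_MAXLEN) {
        len = DEMO_ENTRY_MAXLEN - 1;    /* leave room for the NUL */
        *flags |= DEMO_LOG_TRUNCATE;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len + 1;
}
```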

There is no function to write a record to the system log (or any log) by specifying a posix_log_entry struct and optional data buffer. As a result, there is no way for a strictly conforming program to copy all or part of an event log to another log. This capability, if desired, was viewed as a function of the underlying implementation’s log administration duties.

B.20.13 Open an Event Log for Read Access

A log descriptor (posix_logd_t) could very well incorporate a file descriptor. It might even be a file descriptor. There was general consensus, however, against the notion that a log descriptor has to be a file descriptor. Hence the opaque posix_logd_t type, and a per-process limit on log descriptors ({POSIX_LOG_OPEN_MAX}) that is distinct from the per-process limit on file descriptors.

In early drafts of the standard, the posix_log_open() function included a posix_log_query_t argument. The intent was that a sequential read of the log using the resulting log descriptor would yield only the records that match the query object. This idea was eventually rejected in favor of the current, more flexible, version of posix_log_seek(), which could be used to provide the same effect.

B.20.14 Read from an Event Log

The working group discussed permitting a NULL value for the entry parameter of posix_log_read(), for use when the read is used only to skip to the next event record. This was not considered particularly useful, and in any case the implementation must read at least the event’s log_size attribute in order to determine where to find the next record.
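To see why, consider a toy log in which each record is a fixed header followed by log_size bytes of variable data. This layout is purely an assumption for illustration (the standard leaves the structure of the log to the implementation, as noted in Section B.20.3); struct demo_hdr and the helper functions are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout: fixed header, then log_size data bytes. */
struct demo_hdr {
    long   recid;
    size_t log_size;
};

/* Returns the offset of the record that follows the one at `off`.
   Even a pure "skip" must look at the header, because log_size
   decides where the next record begins. */
static size_t demo_next_record(const unsigned char *log, size_t off)
{
    struct demo_hdr h;
    memcpy(&h, log + off, sizeof h);   /* memcpy avoids misaligned access */
    return off + sizeof h + h.log_size;
}

/* Appends one record at offset `end`; returns the new end offset. */
static size_t demo_append_record(unsigned char *log, size_t end, long recid,
                                 const void *data, size_t size)
{
    struct demo_hdr h = { recid, size };
    memcpy(log + end, &h, sizeof h);
    memcpy(log + end + sizeof h, data, size);
    return end + sizeof h + size;
}
```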

B.20.15 Notify Process of Availability of System Log Data

B.20.15.1 Purpose of Notifications

Existing practice established the need for a notification interface for event logging. Notification eliminates the need for an application to poll the log for entries of interest. Also, since notification is available, there is no need for a blocking form of posix_log_read().

There was discussion of two fundamentally different reasons for posting notifications:

1. real-time notification requiring a real-time response (e.g., “engine number 2 has shut down” in an avionics system)
2. informing an event-monitoring or log-monitoring application that a record of interest has been logged to the system log

It was agreed that item 2 reflects the intent of notification in the POSIX event logging system; item 1 is not within the generally accepted scope of event logging. From this arose the consensus that a notification should not be sent until the corresponding record can be accessed by posix_log_seek() and posix_log_read().

B.20.15.2 Notification Mechanism

The specified interface uses the POSIX sigevent mechanism, which allows asynchronous notification of newly logged information either via delivery of a signal or the initiation of a new thread. In threads-based notification, the new thread’s start_routine is passed a sigval object. In signal-based notification, the signal-catching function is passed a siginfo_t object (which contains a sigval object). The value of the object passed can be used to determine which notification request and/or which new event record triggered the notification.

There was much discussion as to what sort of data should be included with the notification. Three options were considered:

1. Pass the value that was found in the sigevent object’s sigev_value member at the time of the call to posix_log_notify_add(). (For purposes of this discussion, we call this the “sigevent payload.”)

2. Pass the new event’s record ID.

3. Pass the new event’s entire posix_log_entry struct.

Option 1 is consistent with other uses of the sigevent structure in functions such as mq_notify(), lio_listio(), and timer_create(). It also allows the application to easily determine (by examining the sigevent payload) which notification request yielded the current notification. The new event can be located by using the posix_log_seek() function with the same query object that was used in the notification request.

Option 2 enables the application to precisely locate the new event: if its record ID is n, a call to posix_log_seek() with a query expression of “recid=n” finds it.

As to option 3, it was felt that a sigval was intended to be a small object – too small to hold a posix_log_entry structure. In some UNIX implementations, a siginfo_t object is large enough to hold a posix_log_entry structure, but it seemed unwise to make this assumption about all UNIX implementations. Several other mechanisms for delivering the posix_log_entry object were considered, but in the end, option 3 was rejected because options 1 and 2 seemed sufficient.

If a sigval contains a single value, then it cannot contain both the sigevent payload and the record ID. The implementation then has the following choices:

a. Make the programmer choose between the record ID and the sigevent payload.

b. Support delivery of both record ID and sigevent payload to a signal handler (e.g., store the record ID somewhere in the siginfo_t object), but not to a thread start routine.

c. Implement an extension to the sigval union that would allow it to contain a record ID as well as the standard value (sival_int or sival_ptr) in non-overlapping storage.

A desire to allow the implementation reasonable leeway in this area is reflected in the POSIX_LOG_SEND_RECID and POSIX_LOG_SEND_SIGVAL flags and the posix_log_siginfo_recid() and posix_log_sigval_recid() functions.

B.20.15.3 Other Notification Issues

The standard does not guarantee that SI_EVLOG is distinct from other possible values of si_code in the siginfo_t object that is passed to the signal-catching function. This allows the implementation to build the event-notification mechanism upon an existing feature such as message queues or the sigqueue() function.
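As an illustration of building notification on sigqueue(), the sketch below queues a real-time signal whose sival_int carries the new record's ID (option 2 above), and a SA_SIGINFO handler retrieves it from si_value. The helper names are hypothetical, and carrying a record ID in an int is an assumption. A notification delivered this way arrives with si_code set to SI_QUEUE, which is the kind of overlap the previous paragraph allows for.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Values seen by the most recent notification (hypothetical). */
static volatile sig_atomic_t demo_last_recid = -1;
static volatile sig_atomic_t demo_last_code;

/* Signal-catching function: the sigval arrives in info->si_value. */
static void demo_on_notify(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    demo_last_recid = info->si_value.sival_int;
    demo_last_code = info->si_code;
}

/* Subscriber side: install the handler for a real-time signal. */
static int demo_subscribe(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = demo_on_notify;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* "Implementation" side: tell the subscriber that record `recid` is
   now readable, carrying the record ID in sival_int. */
static int demo_notify(pid_t subscriber, int signo, int recid)
{
    union sigval v;
    v.sival_int = recid;
    return sigqueue(subscriber, signo, v);
}
```

In a single-threaded process that queues an unblocked signal to itself, POSIX requires delivery before sigqueue() returns, so demo_notify(getpid(), SIGRTMIN, 42) leaves 42 in demo_last_recid by the time it returns.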

Notification requests are not inherited across fork(). The general precedent is that signal-generating objects, such as timers, are not inherited by the child process.

B.20.16 Reposition the Read Pointer

The posix_log_seek() function was motivated by the desire to have a filtered view of an event log. To read all the records from the system log that match a particular set of filter criteria (expressed in terms of the query expression qexpr), you can do the following:

    posix_log_query_t q;
    posix_logd_t logdes;
    int result;
    struct posix_log_entry entry;
    char data[POSIX_LOG_ENTRY_MAXLEN];

    (void) posix_log_open(&logdes, NULL);
    (void) posix_log_query_create(qexpr, POSIX_LOG_PRPS_SEEK, &q, NULL, 0);

    do {
        result = posix_log_seek(logdes, &q, POSIX_LOG_SEEK_FORWARD);
        if (result == 0) {
            /* Found a record that matches q. Read it. */
            (void) posix_log_read(logdes, &entry, data,
                                  POSIX_LOG_ENTRY_MAXLEN);
            /* Process the record. */
            ...
        }
    } while (result == 0);

To move the read pointer back one record, call posix_log_seek(logdes, NULL, POSIX_LOG_SEEK_BACKWARD). To skip forward past the current record, just read the record with posix_log_read().
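The seek-then-read pattern can be modeled in a few lines. In this in-memory sketch, which is not the standard's interface, a predicate function stands in for a query object and an array index stands in for the read pointer. Whether a forward seek considers the current record is an assumption of the model (here it does, which is consistent with the loop above, since each read advances past the record just read). All names are hypothetical, and the sample predicate assumes syslog's severity numbering.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_rec { long recid; int facility; int severity; };
struct demo_logd { const struct demo_rec *recs; size_t n; size_t pos; };
typedef bool (*demo_query)(const struct demo_rec *);

/* Sample "query": LOG_ERR (3) or more severe, in syslog numbering. */
static bool demo_is_error_or_worse(const struct demo_rec *r)
{
    return r->severity <= 3;
}

/* Forward seek: leave pos on the next matching record at or after the
   current position; return 0 on success, -1 if none remains. */
static int demo_seek_forward(struct demo_logd *ld, demo_query q)
{
    for (size_t i = ld->pos; i < ld->n; i++)
        if (q(&ld->recs[i])) {
            ld->pos = i;
            return 0;
        }
    return -1;
}

/* Read: return the record at pos and advance past it. */
static const struct demo_rec *demo_read(struct demo_logd *ld)
{
    return ld->pos < ld->n ? &ld->recs[ld->pos++] : NULL;
}

/* The filtered-read loop: seek to a match, read it, repeat. */
static size_t demo_count_matches(struct demo_logd *ld, demo_query q)
{
    size_t count = 0;
    while (demo_seek_forward(ld, q) == 0) {
        const struct demo_rec *r = demo_read(ld);
        (void)r;                        /* process the record here */
        count++;
    }
    return count;
}
```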

B.20.17 Compare Event Record Severities

The standard requires posix_log_severity_t to be an integral type, but does not specify whether a higher severity level should be numerically greater or less than a lower severity level. The posix_log_severity_compare() interface permits applications to compare severities in a portable fashion.
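A sketch of what such a comparison hides, assuming syslog's numbering from <syslog.h> (LOG_EMERG is 0 and LOG_DEBUG is 7, so a smaller number means a more severe event). demo_severity_compare() is hypothetical, and its sign convention (positive when the first argument is more severe) is an assumption rather than the specified behavior of posix_log_severity_compare().

```c
#include <syslog.h>

/* Hypothetical: returns >0 if a is more severe than b, 0 if equal,
   <0 if a is less severe.  Callers never see that syslog's scale runs
   "backwards" (smaller number = more severe). */
static int demo_severity_compare(int a, int b)
{
    return (a < b) - (a > b);
}
```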

B.20.18 Queries = Event Filters

The origin of the term “query” as applied to POSIX event logging is obscure, but presumably relates to the SELECT query of the SQL language for relational database management systems. An event-logging query has the same purpose: selecting a set of (event) records based on the values of their attributes. In some event-logging implementations, the term “filter” is used.

Queries were motivated by existing practice in a variety of implementations, including syslog, which directs each event record to its desired destination based on the value of its facility and severity attributes.

B.20.18.1 Query Criteria

Motivated by syslog, early drafts of the standard limited query criteria to the facility and severity attributes. The case was made for other criteria, such as record ID and time stamp. Eventually it was decided to allow all attributes in the posix_log_entry struct to be used as query criteria. This was more in line with event-logging implementations on enterprise-class UNIX systems. The next logical extension was to allow the variable portion of the event record, when it comprised a character string (format = POSIX_LOG_STRING), to be used as a query criterion.

The move toward allowing arbitrarily complex queries, however, led to a concern about the likely performance of the implementation’s notification subsystem. This concern was the motivation for limited queries, discussed later in this section.

B.20.18.2 Textual Query Expressions

When there were only two query criteria (facility and severity), queries were constructed and examined using specialized functions, analogous to the pthread_attr_set* and pthread_attr_get* functions of POSIX threads. The desire to support more criteria led to a desire for a more extensible mechanism for expressing them.

Existing practice suggested the use of textual query expressions, as specified in this standard. Here are some of the advantages of this approach:

Query expressions are human-readable. In particular, they can be easily included in command lines to facilitate ad hoc queries, or in a dictionary of predefined queries.
This approach allows queries based on attributes other than facility and severity.
Query expressions are flexible. A query expression can be formulated to implement any combination of criteria.
This approach is easily extended to support new implementation- or application-defined attributes.
Textual queries have applications beyond those specified in this standard (notification and seeking). For example, they can be used in configuration files to specify what kinds of events are to be eliminated from the event stream or the event log -- e.g., to conserve space or to implement security measures.

Most event attributes are integers. And given the posix_log_memtostr() function, all attribute values can be expressed as character strings. As a result, the standard’s query grammar is simple: tokens are C-language identifiers, integer constants, string literals, operators, and parentheses. It was felt that parsing such a grammar places a relatively small burden on the implementation. In one sample implementation, the lex-based tokenizer is about 100 lines, and the yacc grammar is about 130 lines.
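To show how small the tokenizing burden is, here is a hand-written sketch of a tokenizer for the token classes listed above. It is not taken from the sample implementation. The operator set is an assumption inferred from the sample expressions in this annex. The word "contains" is returned as an ordinary identifier, so the parser must recognize it. Integer constants are decimal only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds for a hypothetical query-expression tokenizer. */
enum tok_kind {
    TOK_IDENT, TOK_INT, TOK_STRING, TOK_OP,
    TOK_LPAREN, TOK_RPAREN, TOK_END, TOK_ERROR
};

struct token {
    enum tok_kind kind;
    const char *start;   /* first character of the token in the input */
    size_t len;          /* number of characters in the token */
};

/* Operator set assumed from the sample expressions; longer ones first. */
static const char *const ops[] = {
    "==", "!=", "<=", ">=", "!~", "&&", "||",
    "=", "<", ">", "~", "&", "|", "!"
};

/* Scan the token that starts at *pp (after white space); advance *pp past it. */
struct token next_token(const char **pp)
{
    const char *p = *pp;
    struct token t;
    size_t i;

    while (isspace((unsigned char) *p))
        p++;
    t.start = p;
    t.kind = TOK_ERROR;
    t.len = 1;

    if (*p == '\0') {
        t.kind = TOK_END;
        t.len = 0;
    } else if (isalpha((unsigned char) *p) || *p == '_') {
        /* C-language identifier: severity, DAEMON, contains, ... */
        while (isalnum((unsigned char) p[t.len]) || p[t.len] == '_')
            t.len++;
        t.kind = TOK_IDENT;
    } else if (isdigit((unsigned char) *p)) {
        /* Integer constant (decimal only in this sketch). */
        while (isdigit((unsigned char) p[t.len]))
            t.len++;
        t.kind = TOK_INT;
    } else if (*p == '"') {
        /* String literal; a backslash escapes the next character. */
        while (p[t.len] != '\0' && p[t.len] != '"') {
            if (p[t.len] == '\\' && p[t.len + 1] != '\0')
                t.len++;
            t.len++;
        }
        if (p[t.len] == '"') {
            t.len++;              /* include the closing quote */
            t.kind = TOK_STRING;
        }                         /* else unterminated: stays TOK_ERROR */
    } else if (*p == '(' || *p == ')') {
        t.kind = (*p == '(') ? TOK_LPAREN : TOK_RPAREN;
    } else {
        for (i = 0; i < sizeof ops / sizeof ops[0]; i++) {
            size_t n = strlen(ops[i]);
            if (strncmp(p, ops[i], n) == 0) {
                t.kind = TOK_OP;
                t.len = n;
                break;
            }
        }
    }
    *pp = p + t.len;
    return t;
}
```

A parser would call next_token() repeatedly until it sees TOK_END. Because two-character operators are tried first, "!~" and "!=" are never split into "!" followed by another token.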

Table 20-9, “Required Operations on Standard Attributes,” is based on the following guidelines:

For integer attributes, the implementation needs to accommodate comparison with integers.
For attributes that are not guaranteed to be integers (facility, thread ID, processor ID), the implementation needs to accommodate comparison with the character-string equivalents.
For attributes that have standard symbolic values (e.g., POSIX_LOG_BINARY, LOG_ALERT), the implementation needs to accommodate those symbolic values.
For attributes that are commonly expressed both as integers and as character strings, and the integer-to-string mapping is clear (e.g., user ID, group ID), the implementation needs to accommodate both types of values.

The implementation is free to “add rows to the table” – e.g., to accommodate new event attributes, to provide different ways of referencing the standard attributes (e.g., “age < 7” as an alternative to “time > 987654321”), or to allow alternate forms of the standard attributes (e.g., “facility=29”). The implementation is also free to support new (backward-compatible) types of tokens – for example, real numbers, pathnames, or Internet addresses. However, many such values can already be conveniently treated as character strings – for example:

host = "138.98.18.1"

The apostrophe (single quote) was considered for use as the delimiter for string literals, as in "facility = 'Volume Manager'". The double quote was chosen instead, both for consistency with the C language and because query expressions are expected to appear more often in command lines than in C source. On a command line, the entire expression can then be enclosed in single quotes, e.g.,

listevents -q 'facility = "Volume Manager"'

There was no perceived need to support arithmetic expressions or array references in the grammar.

The regular-expression operators "~" and "!~" were included for compatibility with existing practice: the grep command is often used to perform regular-expression-based searches of textual event logs.

The "contains" operator was included for performance reasons — strstr() can be expected to be significantly more efficient than regexec() — and to simplify searches for strings that contain characters that might be interpreted specially by regcomp().

B.20.18.3 Sample Query Expressions

Here are examples of valid query expressions.

Select all records with a user ID other than root:

uid != "root"

or (assuming root's user ID is zero)

uid != 0

Select all records with a severity of LOG_WARNING or LOG_NOTICE:

severity == WARNING || severity == NOTICE

Select all records with a facility of LOG_DAEMON and a group ID of daemon or bin:

facility == DAEMON && (gid == "daemon" || gid == "bin")

Select all records whose variable data is a string that contains the substring “sendmail”:

data contains "sendmail"

Select all records whose variable data is a string that begins with the substring “sendmail”:

data ~ "^sendmail"

Select all records logged between 6:00 pm Christmas 2000 and noon New Year's Day 2001 in the Pacific Time zone:

time >= 977796000 && time <= 978379200

Select all records whose variable-data portions have been truncated:

flags & POSIX_LOG_TRUNCATE

B.20.18.4 Time Values in Query Expressions

Standard time values are expressed as time_t values (seconds since the Epoch – e.g., 983478600). Since most humans have a hard time converting “12:30 pm March 1, 2001 PST” to 983478600, there was some support for requiring the implementation to support time values as character strings – e.g., “2001/3/1 12:30:00”. However, this was a can of worms that the working group did not want to open. “2001/3/1 12:30:00” is easy enough to evaluate (assuming the local time zone is intended), but what about alternate forms such as “Mar 1 12:30” and relative times such as “last Thursday”, “3 hours ago”, and “yesterday noon EDT”? The implementation, of course, is free to support textual time expressions as an extension. (Suggestion: Make the time expression a quoted string, to preserve the simplicity of the overall query syntax.)
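Without such an extension, a user or front-end tool must perform the conversion itself. The following is a minimal sketch using the standard C mktime() function. example_pacific_time() is a hypothetical helper. The POSIX TZ string it sets is an assumption: it encodes the US daylight-saving rules in force in 2000-2001, under which DST ran from the first Sunday in April to the last Sunday in October.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/*
 * Convert a US Pacific wall-clock time to a time_t for use in a query
 * such as "time >= 983478600". Note: this sets TZ for the whole process.
 * Assumption: 2000-2001 US DST rules (first Sunday in April through
 * last Sunday in October).
 */
time_t example_pacific_time(int year, int mon, int mday, int hour, int min)
{
    struct tm tm = {0};

    setenv("TZ", "PST8PDT,M4.1.0,M10.5.0", 1);
    tzset();
    tm.tm_year = year - 1900;   /* years since 1900 */
    tm.tm_mon = mon - 1;        /* months since January */
    tm.tm_mday = mday;
    tm.tm_hour = hour;
    tm.tm_min = min;
    tm.tm_isdst = -1;           /* let mktime() decide whether DST applies */
    return mktime(&tm);
}
```

Calling example_pacific_time(2001, 3, 1, 12, 30) yields 983478600. The two bounds in the Christmas and New Year's Day sample query can be reproduced the same way.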

B.20.18.5 Limited Queries

As mentioned previously, some implementations may need to minimize the cost of posting notifications, and therefore the cost of evaluating notification-related queries. For such implementations, the concept of limited queries was created. A limited query that is as simple as the standard allows can be evaluated using a statement such as the following. (All indicated fields of the query object are hypothetical. Each test_xxx field of the query object is 1 if the xxx attribute is mentioned in the query expression.)

struct posix_log_entry *e;
posix_log_query_t *q;
/* ... */

/*
 * Hypothetical: the severity test assumes numerically smaller values
 * are more severe, as in syslog.
 */
if ((q->test_recid && e->log_recid != q->recid)
    || (q->test_event_type && e->log_event_type != q->event_type)
    || (q->test_facility && e->log_facility != q->facility)
    || (q->test_severity && e->log_severity > q->severity)
    || (q->test_uid && e->log_uid != q->uid)) {
    /* Mismatch. */
    /* ... */
} else {
    /* Found a match. Send notification. */
    /* ... */
}

By contrast, evaluating an arbitrarily complex expression would typically involve traversing an expression tree, and may involve table lookups or other relatively expensive operations.
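For contrast with the flat test above, the following sketch shows the kind of recursive evaluation a general query might require. Everything here is hypothetical: the node layout, the three integer attributes, and the operator set were chosen for brevity. A real evaluator would also have to handle strings, regular expressions, and name lookups.

```c
#include <stddef.h>

/* Hypothetical, much-simplified event record with integer attributes only. */
struct example_entry {
    int severity;
    int facility;
    int uid;
};

enum node_kind { N_AND, N_OR, N_NOT, N_EQ, N_NE, N_LT, N_GT };
enum attr { A_SEVERITY, A_FACILITY, A_UID };

/* A node of a parsed query expression (hypothetical representation). */
struct node {
    enum node_kind kind;
    const struct node *left, *right;  /* operands of AND/OR; NOT uses left */
    enum attr attr;                   /* attribute tested by a comparison */
    int value;                        /* constant it is compared with */
};

/* Fetch the value of attribute a from record e. */
static int attr_value(const struct example_entry *e, enum attr a)
{
    switch (a) {
    case A_SEVERITY: return e->severity;
    case A_FACILITY: return e->facility;
    default:         return e->uid;
    }
}

/* Recursively evaluate the tree rooted at n against record e. */
int eval_node(const struct node *n, const struct example_entry *e)
{
    int v;

    switch (n->kind) {
    case N_AND: return eval_node(n->left, e) && eval_node(n->right, e);
    case N_OR:  return eval_node(n->left, e) || eval_node(n->right, e);
    case N_NOT: return !eval_node(n->left, e);
    default:    break;              /* a comparison: handled below */
    }
    v = attr_value(e, n->attr);
    switch (n->kind) {
    case N_EQ: return v == n->value;
    case N_NE: return v != n->value;
    case N_LT: return v < n->value;
    case N_GT: return v > n->value;
    default:   return 0;
    }
}
```

Each record costs one recursive call per node, plus pointer chasing. In the limited case, the cost is a fixed sequence of field comparisons.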

An implementation is free to make limited queries less restrictive -- i.e., add more rows and/or operations to Table 20-10, “Required Operations in Limited Queries.” It is expected that if the implementation places no restrictions at all on the type of query that can be used for notification, it will define POSIX_LOG_PRPS_NOTIFY to be equal to POSIX_LOG_PRPS_GENERAL (or at least to include the POSIX_LOG_PRPS_GENERAL flag).

Although a limited query and a general query are likely to have different internal representations, the posix_log_query_t type encapsulates both. Moreover, the syntax and semantics of a limited query expression are the same as for a general query expression. It is up to the implementation to decide whether a query expression represents a valid limited query. These factors reflect the fact that limited queries are a special case of general queries, created solely for performance reasons.

B.20.18.6 posix_log_query_match()

The posix_log_query_match() function was added primarily in response to the addition of limited queries. Suppose that an application’s notification criteria can be expressed using the query expression q, but q does not qualify as a limited query. The application can register for notification using a simpler, less restrictive query expression (or none at all). When a notification arrives, the application can test the associated event record by calling posix_log_query_match() with the query expression q, to determine whether the record is really of interest.

The POSIX_LOG_SEEK_FORWARD and POSIX_LOG_SEEK_FIRST options of posix_log_seek() could be implemented using repeated calls to posix_log_read() and posix_log_query_match().
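The sketch below illustrates that approach on a hypothetical in-memory log. In it, integers stand in for event records, and example_read() and example_is_urgent() stand in for posix_log_read() and posix_log_query_match(). One detail matters. A successful seek leaves the read pointer at the matching record, so the next read returns it (as in the loop in B.20.16), but reading a record moves past it. The sketch therefore backs up one record after a match. What happens to the read pointer when nothing matches is an assumption here.

```c
#include <stddef.h>

/* Hypothetical in-memory log: an array of records plus a read pointer. */
struct example_log {
    const int *records;   /* stand-ins for event records (severities) */
    size_t nrecords;
    size_t readpos;       /* index of the record the next read returns */
};

/* Stand-in for posix_log_read(): 0 and the record, or -1 at end of log. */
static int example_read(struct example_log *log, int *rec)
{
    if (log->readpos >= log->nrecords)
        return -1;
    *rec = log->records[log->readpos++];
    return 0;
}

/* Stand-in for a query match: syslog LOG_ERR (3) or more severe. */
int example_is_urgent(int severity)
{
    return severity <= 3;
}

/*
 * Seek forward to the next record satisfying match(), in the manner of
 * POSIX_LOG_SEEK_FORWARD. Returns 0 with the read pointer ON the matching
 * record, or -1 if no later record matches.
 */
int example_seek_forward(struct example_log *log, int (*match)(int))
{
    size_t start = log->readpos;
    int rec;

    while (example_read(log, &rec) == 0) {
        if (match(rec)) {
            log->readpos--;       /* back up over the record just read */
            return 0;
        }
    }
    log->readpos = start;         /* assumption: failure leaves it unchanged */
    return -1;
}
```

POSIX_LOG_SEEK_FIRST would differ only in resetting the read pointer to the start of the log before the loop.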

B.20.19 String Equivalents of Event Attributes

There was general consensus that it should be possible to write a strictly conforming program that reads an event log and prints at least the standard event attributes in human-readable form; hence the posix_log_memtostr() function.

Implementers should keep in mind that the string produced by posix_log_memtostr() may be used in a variety of ways. In particular, it may be used in a query expression. So if log_uid has a value of zero, a good string equivalent is “root”, not “user ID = root”. A query expression such as

uid = "bill"

makes a lot more sense than

uid = "user ID = bill"

The posix_log_factostr() and posix_log_strtofac() functions were included with an implementation-defined “facility registry” in mind. Such a registry would allow an application to obtain a facility name from a facility code, and vice-versa. A facility registry might also include other information about each facility – for example, which user IDs are permitted to log events with that facility code.
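At its simplest, a facility registry could be a table that maps codes to names. The sketch below makes several assumptions. It uses syslog's facility codes from <syslog.h>, and its upper-case names match the sample query "facility == DAEMON". The table contents and the names example_factostr() and example_strtofac() are hypothetical, and the real posix_log_factostr() and posix_log_strtofac() signatures may differ.

```c
#include <string.h>
#include <syslog.h>

/* One entry of a hypothetical facility registry. */
struct fac_entry {
    int code;             /* facility code as stored in log_facility */
    const char *name;     /* string equivalent usable in queries */
};

/* A tiny static registry; a real one might be loaded from a config file. */
static const struct fac_entry registry[] = {
    { LOG_KERN, "KERN" }, { LOG_USER, "USER" }, { LOG_MAIL, "MAIL" },
    { LOG_DAEMON, "DAEMON" }, { LOG_AUTH, "AUTH" }, { LOG_LPR, "LPR" },
};
#define NREG (sizeof registry / sizeof registry[0])

/* Code-to-name lookup; NULL if the code is not registered. */
const char *example_factostr(int code)
{
    size_t i;

    for (i = 0; i < NREG; i++)
        if (registry[i].code == code)
            return registry[i].name;
    return NULL;
}

/* Name-to-code lookup; -1 if the name is not registered. */
int example_strtofac(const char *name)
{
    size_t i;

    for (i = 0; i < NREG; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].code;
    return -1;
}
```

Per-facility data such as the permitted user IDs could be added as further fields of struct fac_entry.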

B.20.20 Standard Event Types

The standard defines only five event types, all associated with facility LOG_LOGMGMT. The implementation is required to log POSIX_LOG_MGMT_STARTMAINT and POSIX_LOG_MGMT_ENDMAINT events at the start and end of maintenance activities, respectively. The other events can be logged at the implementation’s discretion, if at all.

It was suggested that this standard should define event types for other types of events that might be of interest to a system administrator or operator -- for example, a file system becoming full, a hardware device failing, a service becoming available, or a related host or node going on-line or off-line. With such a list of standard event types, it would be theoretically possible to write a portable program that provides a wide variety of system-administration services on all sorts of architectures.

This suggestion was rejected for the following reasons:

It would significantly increase the scope of the standard.
Due to the diversity of POSIX systems and administration strategies, there was general consensus as to the futility of trying to define a complete set of event types that would be widely accepted.
Each event type would presumably have to be associated with a standard facility; but as mentioned before, the standard set of facilities (which was inherited from syslog) is not intended to be complete.
Requiring the system-administration software to detect and log such events in accordance with the standard could greatly increase the cost of implementing the standard.

B.20.21 Revision History

Revision	Date	Author	Description
0	3/7/01	unknown	Jim Keniston’s translation to Word of the troff document provided by Steve Watt on 10/20/00
1	3/27/01	Jim Keniston	Greatly expanded and largely rewritten to reflect discussions and decisions since October 2000.
2	3/30/01	Jim Keniston	Reflects Larry Kessler’s review. Added revision history.
3	7/27/01	Jim Keniston	Reflects changes in Event Logging draft 12, especially regarding truncation of strings and delivery of record ID to notification handler.
4	8/31/01	Jim Keniston	Reflects changes and discussions from the August 2001 meeting of the working group. Affected topics: circular event logs (no longer mentioned), log_processor (was log_cpu), regular expressions, portable event logs.
5	9/19/01	Jim Keniston	Changed all PXLOG_ symbols to POSIX_LOG_, to comply more strictly with PASC’s guidelines regarding Preserving Backwards Compatibility. Added section on Standard Event Types.