François Donzé

Why is Redfish different from other REST APIs - Part 2

February 22, 2024

Introduction

In part 1 of this series, I presented some of the fundamentals of the Redfish® standard published by the Distributed Management Task Force (DMTF) consortium in 2015. Among them:

The separation of the protocol from the data modeling
A self-describing model
OEM extensions.

Here in part 2, you will find other unique properties contributing to the massive adoption of this "hypermedia API" by equipment manufacturers and to the replacement of the aging Intelligent Platform Management Interface (IPMI).

The Redfish concept of "Actions"

Redfish resources support the GET request to retrieve their current state. Certain resources can be modified or deleted using the basic POST, PUT, PATCH and DELETE requests. Nothing exceptional in the world of REST APIs, except perhaps that the exhaustive list of requests supported by a given resource can be retrieved by consulting the Allow header of GET responses. Refer to the Allowed requests paragraph in Part 1.
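
As a minimal sketch of that check, the Python snippet below parses an Allow header into a set of methods. The header values are hypothetical, and the helper name allowed_methods is mine, not part of any Redfish library.

```python
def allowed_methods(headers: dict) -> set:
    """Return the HTTP methods listed in an Allow response header."""
    # Header names are case-insensitive, so normalize them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("allow", "")
    return {method.strip().upper() for method in raw.split(",") if method.strip()}


# Hypothetical headers, as a BMC might return them for a GET on a resource.
sample_headers = {"Allow": "GET, HEAD, PATCH", "Content-Type": "application/json"}
methods = allowed_methods(sample_headers)
can_patch = "PATCH" in methods
can_delete = "DELETE" in methods
```

A client can run this kind of test before attempting a PATCH or a DELETE, instead of discovering the restriction through an error response.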

However, some operations are difficult to model with the classic HTTP requests above. For example, it is impossible to "read" the server's power button to know its status. Another example that cannot be addressed by classic HTTP requests is the "return to factory settings" of a subsystem like a storage controller. This operation requires additional parameters, such as whether or not to preserve existing logical volumes.

To address these cases and others, Redfish offers the concept of Actions. These are special POST requests including the operation(s) to be performed and an empty body or a body with parameters. The POST endpoint as well as the parameters and their possible values are described in an Actions{} object contained in the response to a GET. Figure 1 shows the retrieval of the list of possible actions on the ComputerSystem subsystem of a given server, as well as their description. In this specific case, it is possible to perform a single action (#ComputerSystem.Reset) with a parameter (ResetType) which can take several values.

Figure 1: Computer system actions

Figure 2 shows the graceful restart action of a server with its destination (target URI) and its payload as well as the payload of the response (Success).

Figure 2: System reset action

The precise description of possible actions in the main body of GET responses allows Redfish clients to carry out checks before sending a request, which avoids erroneous requests and the unwanted network traffic they create.
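
Such a client-side check could look like the Python sketch below. It assumes that the allowable values are annotated inline in the Actions object, as in Figure 1. The sample resource, its URI and the helper name build_reset_request are illustrative assumptions.

```python
def build_reset_request(system: dict, reset_type: str) -> tuple:
    """Return (target URI, JSON body) for a reset, or raise ValueError."""
    action = system.get("Actions", {}).get("#ComputerSystem.Reset")
    if action is None:
        raise ValueError("resource does not advertise #ComputerSystem.Reset")
    # If no allowable values are advertised inline, the check is skipped.
    allowed = action.get("ResetType@Redfish.AllowableValues", [])
    if allowed and reset_type not in allowed:
        raise ValueError(f"ResetType {reset_type!r} is not one of {allowed}")
    return action["target"], {"ResetType": reset_type}


# Hypothetical excerpt of a GET response on a ComputerSystem resource.
sample_system = {
    "Actions": {
        "#ComputerSystem.Reset": {
            "target": "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset",
            "ResetType@Redfish.AllowableValues": ["On", "ForceOff", "GracefulRestart"],
        }
    }
}
target, body = build_reset_request(sample_system, "GracefulRestart")
```

The returned target and body are what the client would then send in the POST request. An unsupported value raises an error locally, before anything goes on the wire.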

The Redfish event service

The Baseboard Management Controller (BMC) of modern servers communicates with almost all the server's subsystems. This privileged role allows it to be notified of events occurring in the system, such as errors appearing in the memory, in the storage controllers or elsewhere. These events are stored in "log" files. Note that SNMP traps can be triggered if the BMC has been configured beforehand.

SNMP is an aging protocol that, due to its design, can saturate a network during an event storm. The processing of events by Network Management Systems (NMS) can also constitute a bottleneck in terms of CPU load or storage. Furthermore, the security linked to SNMP is often mentioned as insufficient.

An alternative to SNMP is the Redfish Event Service, which is based on the subscription principle. The major advantage of this principle is that events are sorted at the source according to subscription criteria and sent only to subscribers. Thus, the risk of network saturation is reduced. The security of these messages is based on the Transport Layer Security (TLS) protocol used by HTTPS, which is widely considered secure.

Event model

Asynchronous events and error messages are all listed and described in registries (refer to the first part of this article for an introduction to registries). The registries collection is at the URI /redfish/v1/Registries. These objects (events, messages, notifications, etc.) have a common format: RegistryPrefix.Version.Identifier. For example, the message returned when restarting a server (Figure 2) is described in the Base registry, version 1.17, with Success as its identifier. For more information on this message, simply follow the link /redfish/v1/Registries/Base and read the description related to the identifier.
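
To illustrate the RegistryPrefix.Version.Identifier format, here is a small Python sketch that splits a message identifier into its three parts. The names MessageId and parse_message_id are mine. The sketch assumes that the prefix and the identifier contain no dots, so that everything in between is the version.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    registry_prefix: str
    version: str
    identifier: str


def parse_message_id(message_id: str) -> MessageId:
    """Split 'RegistryPrefix.Version.Identifier' into its components."""
    parts = message_id.split(".")
    if len(parts) < 3:
        raise ValueError(f"unexpected message identifier: {message_id!r}")
    # First element is the prefix, last is the identifier, the rest is the version.
    return MessageId(parts[0], ".".join(parts[1:-1]), parts[-1])


parsed = parse_message_id("Base.1.17.Success")
```

With the prefix extracted, a client knows which entry of the /redfish/v1/Registries collection to consult for the full description.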

The exhaustive list of registries that can be used to subscribe to events is returned in the RegistryPrefixes[] array of a GET on /redfish/v1/EventService. Figure 3 shows an example of such a list from an HPE server. Among other things, there are registries containing messages relating to network and storage equipment.

Figure 3: Enumeration of prefixes available for event subscriptions

Subscription to events is done by a POST request to the standard URI /redfish/v1/EventService/Subscriptions that includes, in its body, the IP address of the listening service and the list of events to send to it.

The listening service is a web service waiting for traffic on port TCP/443 (default) and that will process the received events. The DMTF provides such a service (free of charge) on GitHub to facilitate the learning of this concept and for debugging code.
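
The processing step of such a listener can be sketched in Python independently of the web server that receives the POST. The snippet below assumes the default Event format, in which the body carries an Events array whose entries have MessageId and Message properties. The sample body and the name summarize_events are illustrative, and this is not the code of the DMTF listener.

```python
import json


def summarize_events(raw_body: bytes) -> list:
    """Return (MessageId, Message) pairs from a received event payload."""
    payload = json.loads(raw_body)
    return [
        (event.get("MessageId", ""), event.get("Message", ""))
        for event in payload.get("Events", [])
    ]


# Hypothetical body, as a BMC might POST it to the listener.
sample_body = json.dumps(
    {
        "Context": "demo-listener",
        "Events": [
            {
                "MessageId": "Base.1.17.Success",
                "Message": "The request completed successfully.",
            }
        ],
    }
).encode()
summary = summarize_events(sample_body)
```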

Figure 4 is an example of subscribing to events generated by storage components in an HPE server. The BMC that receives this subscription must send the events to the IP address contained in the {{EventListener}} variable. The Context property as well as the optional HttpHeaders can be useful to the listening system. OEM properties complement the subscription description, with easy-to-understand properties.

Figure 4: Event subscription
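
A subscription body of this kind could be assembled as in the Python sketch below. Destination, Protocol, RegistryPrefixes and Context are standard subscription properties, but the listener URL, the StorageDevice prefix and the helper name are assumptions made for the example. The OEM properties of Figure 4 are deliberately left out.

```python
SUBSCRIPTIONS_URI = "/redfish/v1/EventService/Subscriptions"


def build_subscription(listener_url: str, registry_prefixes: list, context: str) -> dict:
    """Build the JSON body of a POST to the Subscriptions collection."""
    # This sketch insists on HTTPS, since event security relies on TLS.
    if not listener_url.startswith("https://"):
        raise ValueError("the listener URL is expected to use HTTPS")
    return {
        "Destination": listener_url,
        "Protocol": "Redfish",
        "RegistryPrefixes": list(registry_prefixes),
        "Context": context,
    }


# Hypothetical listener address and registry prefix.
subscription = build_subscription(
    "https://192.0.2.10:443/redfish-events", ["StorageDevice"], "storage-events"
)
```

Before posting, it is worth checking that each requested prefix appears in the RegistryPrefixes[] array returned by /redfish/v1/EventService.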

The collection of subscriptions received by the BMC can be found at the URI: /redfish/v1/EventService/Subscriptions (Figure 5).

Figure 5: Subscription collection

The event service also allows you to easily test subscriptions by posting a test action to /redfish/v1/EventService/Actions/EventService.SubmitTestEvent. The body must contain a MessageId property whose first part (the registry prefix) is correctly populated so that the test event is sent to the correct system (Figure 6).

Figure 6: Test event
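
Why the first part of the MessageId matters can be shown with a simplified Python model of the filtering done at the source. It only considers the RegistryPrefixes criterion and treats an absent or empty list as "no filtering", which is an assumption about the service behavior. The subscription and message identifiers are hypothetical.

```python
def would_deliver(subscription: dict, message_id: str) -> bool:
    """Tell whether a subscription's registry prefix filter accepts a MessageId."""
    prefixes = subscription.get("RegistryPrefixes", [])
    prefix = message_id.split(".", 1)[0]
    # An absent or empty list is treated here as "accept every registry".
    return not prefixes or prefix in prefixes


# Hypothetical subscription limited to storage device messages.
storage_subscription = {"RegistryPrefixes": ["StorageDevice"]}
matching = would_deliver(storage_subscription, "StorageDevice.1.0.ControllerOK")
filtered_out = would_deliver(storage_subscription, "Base.1.17.Success")
```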

The Telemetry service

Supervising a server fleet involves retrieving indicators such as the temperature of certain components and the energy consumed by power supplies, CPUs or fans, in order to create metric reports, graphs or generate alerts. The most obvious recovery method is to locate the URI of the desired indicators and retrieve them on demand. There is an alternative to this "pull" type method: a "push" of indicators from the BMC towards subscribers. This alternative is possible thanks to the Redfish telemetry service.

The telemetry entry point is at /redfish/v1/TelemetryService and has the following resources:

Metric definitions: CPU or memory bus usage, power supply consumption, etc. Each definition contains multiple properties such as the indicator type (decimal, integer, percentage), its minimum and maximum values and the URI allowing its value to be retrieved at a given time.
Report definitions: These definitions mainly indicate which metrics are part of a report, when to generate the report and where it will be posted. For example, Figure 7 shows a report aggregating all indicators relating to the power used by the system. This report will be generated periodically and will be posted at the @odata.id URI.
Definitions of triggering actions: Depending on the value or trend (increasing or decreasing) of certain indicators, one or more actions will be triggered, such as recording in a log file, generating an event or requesting the generation of a new telemetry report.
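
The idea of triggering on a value and a trend can be illustrated with the simplified Python sketch below. The activation names Increasing, Decreasing and Either mirror the vocabulary of the text, but the function, its parameters and the sample temperatures are assumptions. This is not how a BMC implements triggers, and refinements such as a dwell time are ignored.

```python
def threshold_crossed(previous: float, current: float, threshold: float,
                      activation: str = "Either") -> bool:
    """Tell whether two successive readings cross a threshold in a given direction."""
    rising = previous < threshold <= current
    falling = previous > threshold >= current
    if activation == "Increasing":
        return rising
    if activation == "Decreasing":
        return falling
    if activation == "Either":
        return rising or falling
    raise ValueError(f"unknown activation: {activation!r}")


# Hypothetical temperature readings (in Celsius) around an 80 degree threshold.
overheating = threshold_crossed(70, 85, 80, "Increasing")
cooling_down = threshold_crossed(85, 70, 80, "Decreasing")
```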

Figure 7: Metric report definition

The model shown above is both powerful and extremely flexible. It allows you to:

Retrieve indicators individually whenever you want.
Retrieve telemetry reports, potentially aggregating several indicators and following a customizable frequency.
Be informed when an indicator crosses a threshold or leaves a given range following a certain trend.
Subscribe to telemetry reports asynchronously.

The last point above is an event subscription specifying the particular MetricReport format, instead of the default Event format used in the previous section. The subscription must specify the list of reports as indicated in Figure 8.

The telemetry service offers undeniable advantages and flexibility. Unfortunately, to date, only platforms based on Intel® components implement it.

Figure 8: Telemetry event subscription
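
In Python, the body of such a telemetry subscription could be sketched as follows. EventFormatType and MetricReportDefinitions are the properties that distinguish it from a classic event subscription. The listener URL, the report definition URI and the helper name are hypothetical.

```python
def build_metric_report_subscription(listener_url: str, report_definition_uris: list,
                                     context: str) -> dict:
    """Build a subscription body asking for metric reports instead of events."""
    return {
        "Destination": listener_url,
        "Protocol": "Redfish",
        "Context": context,
        # MetricReport replaces the default Event format.
        "EventFormatType": "MetricReport",
        # Each report definition is referenced by a link object.
        "MetricReportDefinitions": [{"@odata.id": uri} for uri in report_definition_uris],
    }


# Hypothetical listener and report definition.
telemetry_subscription = build_metric_report_subscription(
    "https://192.0.2.10:443/redfish-events",
    ["/redfish/v1/TelemetryService/MetricReportDefinitions/PowerMetrics"],
    "power-reports",
)
```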

Additional components integration

Additional components, such as network cards and storage controllers, are integrated in the Redfish data model transparently to Redfish clients. The retrieval of their state, their configuration and their firmware update is done in a similar manner to the other components of the server. This is possible with the implementation in the BMC and in these components of the Platform Level Data Model (PLDM) protocol.

PLDM, published by the DMTF, standardizes messages between the BMC and server components. Thus, add-on card manufacturers and generic BMC providers like OpenBMC no longer have to implement proprietary protocols to communicate with the different subsystems.

Figure 9 explains the communication between a Redfish client and an add-on card. HTTPS requests sent by the client are transformed by the BMC into PLDM messages. These messages are transported by the Management Component Transport Protocol (MCTP) to the component that will respond via the same communication channel. When the dialogue between the BMC and the component is finished, the BMC aggregates the received PLDM messages and responds to the client via HTTPS/JSON.

Figure 9: Offload Redfish processing to device

PLDM for RDE

PLDM specifications are generic and must be supplemented to satisfy specific uses such as Redfish Device Enablement (RDE). The PLDM for RDE specification describes a set of specific messages allowing the BMC to communicate with additional components.

But why do I focus on hidden internal server communication protocols? I do so to make certain consequences easier to understand, such as:

If the server is not powered on, the BMC, which is always powered, will not be able to communicate with the add-on components. In other words, for a Redfish client to be able to communicate with those components, they must be powered (server powered on), and must have been discovered by the BMC. It is recommended to test the server state before querying additional components.
If additional component responses contain OEM extensions, they will not mention the server manufacturer, but the component manufacturer.
Additional components generally indicate the Internet links of the schemas used because they rarely have space to store them locally. Consequently, if the Redfish client does not have access to the Internet, it will not be able to consult the schemas used by these additional cards.
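
The recommendation in the first point can be turned into a simple guard, sketched here in Python. It relies only on the standard PowerState property of the ComputerSystem resource and does not verify that the BMC has finished discovering the devices, which is implementation-specific. The sample resources, the device URI and the function names are hypothetical.

```python
def devices_reachable(system: dict) -> bool:
    """Tell whether add-on components can be expected to answer via the BMC."""
    # Add-on components are only powered when the server itself is on.
    return system.get("PowerState") == "On"


def device_uris_to_query(system: dict, device_uris: list) -> list:
    """Return the device URIs worth querying, or nothing if the server is off."""
    return list(device_uris) if devices_reachable(system) else []


# Hypothetical excerpts of GET responses on a ComputerSystem resource.
powered_on = {"PowerState": "On"}
powered_off = {"PowerState": "Off"}
uris = ["/redfish/v1/Systems/1/Storage/DE00A000"]
to_query = device_uris_to_query(powered_on, uris)
skipped = device_uris_to_query(powered_off, uris)
```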

PLDM for firmware updates

PLDM also facilitates updates of additional components. The PLDM for firmware update specification, when implemented in the BMC and in additional components, allows the BMC or the BIOS (UEFI) to perform the update. Specific tools from additional component suppliers are therefore no longer necessary to "flash" the component from the operating system.

Thus, the components of a server that is powered on but has no operating system can be updated by a remote Redfish client. PLDM for firmware updates constitutes real progress, much appreciated by system administrators and DevOps teams.

Swordfish integration

Very quickly after the publication of the first version of Redfish in 2015, the Storage Networking Industry Association (SNIA), which develops data standards, created an extension of Redfish dedicated to storage and called Swordfish®. Figure 10 shows the headers of a response to a request on a logical volume. The Link header points to a sub-directory dedicated to Swordfish on the DMTF site. Most storage-related schemas are developed by the SNIA and hosted by the DMTF. A great example of cooperation between standardization organizations!

Figure 10: Link to Swordfish URL

Security and component integrity

The majority of computer manufacturers have implemented secure production methods that guarantee all the components constituting a server do not contain viruses or other malware when leaving the factory or even when leaving the truck at its final destination. Indeed, a lot can happen during intercontinental transport of electronic goods!

However, this warranty does not necessarily apply to additional components purchased on the Internet or at a local electronics store.

Installing additional cards or installing an operating system constitutes a potential security hole. To minimize the risk of inserting Trojan horses into servers, component suppliers embed authentication elements (signatures of a sort) in the microcode, allowing the authenticity and integrity of the components to be verified. The integrity of a component is measured by the BMC when the system starts. The measurement can also be triggered while the system is running. The protocol used for message exchanges between the BMC and the components is the DMTF's Security Protocol and Data Model (SPDM), transported by MCTP.

The operating system and BMC integrity measurements can be retrieved from the Trusted Platform Module (TPM), a highly secure physical chip located on the motherboard. When the component integrity verification process is enabled, it is possible to define a policy that applies if corruption is detected during system startup: NoPolicy allows the startup to continue, while HaltOnBoot halts the system abruptly (Figure 11).

Figure 11: SPDM configuration

What about the future?

While the first part of this introduction to Redfish focused on the architectural specifics of the API, this second part goes deeper into the server data model and its smooth integration with additional components. This integration relies on powerful internal communication standards that help improve security and firmware updates.

The specific features of the Redfish standard listed in both parts of this introduction provide a solid foundation for the long-term stability of the protocol and the computer data model. This stability is necessary for a smooth adoption of the standard by programmers. Moreover, it will facilitate the creation of new models for emerging technologies like Compute Express Link (CXL).

And so, the best is yet to come!

Don't forget to check out some of my other blog posts on the HPE Developer portal to learn more about Redfish tips and tricks.
