Editor’s Note: MapR products and solutions sold prior to the acquisition of such assets by Hewlett Packard Enterprise Company in 2019 may have older product names and model numbers that differ from current solutions. For information about current offerings, which are now part of HPE Ezmeral Data Fabric, please visit https://www.hpe.com/us/en/software/data-fabric.html
Original Post Information:
Author: Tugdual Grall
Published: 2016-03-10
Tags: streaming
MapR Event Store is a distributed messaging system for streaming event data at scale. It is integrated into the MapR converged platform. MapR Event Store uses the Apache Kafka API, so if you’re already familiar with Kafka, you’ll find it particularly easy to get started with MapR Event Store.
Although MapR Event Store generally uses the Apache Kafka programming model, there are a few key differences. For instance, there is a new kind of object in the MapR file-system called, appropriately enough, a stream. Each stream can handle a huge number of topics, and you can have many streams in a single cluster. Policies such as time-to-live or ACEs (Access Control Expressions) can be set at the stream level for convenient management of many topics together. If you already have Kafka applications, it’s easy to migrate them over to MapR Event Store. In this blog post we describe how to run a simple application we originally wrote for Kafka using MapR Event Store instead.
Sample Programs
As mentioned above, MapR Event Store uses the Kafka 0.9.0 API, which means you can reuse the same application with only minor changes. Before diving into a concrete example, let's take a look at what has to change:
The topic names change from "
topic-name
" to "/stream-name:topic-name
", as MapR organizes the topics in streams for management reasons (security, TTL, etc.).The producer and consumer configuration parameters that are not used by MapR Event Store are automatically ignored, so no change here.
The producer and consumer applications are using jars from MapR rather than the Apache Kafka jars. You can find a complete application on the Sample Programs for MapR Event Store page. It’s a simple copy that includes minor changes of the Sample Programs for Kafka 0.9 API project.
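To make the first point concrete, here is a minimal, hedged illustration of the naming difference. This is not code from the sample project; the class name and values are invented for the example:

```java
import org.apache.kafka.clients.producer.ProducerRecord;

public class TopicNames {
    public static void main(String[] args) {
        // Apache Kafka: a bare topic name
        ProducerRecord<String, String> kafkaRecord =
                new ProducerRecord<String, String>("fast-messages", "hello");

        // MapR Event Store: the same topic, addressed through its stream
        ProducerRecord<String, String> maprRecord =
                new ProducerRecord<String, String>("/sample-stream:fast-messages", "hello");

        System.out.println(kafkaRecord.topic() + " vs " + maprRecord.topic());
    }
}
```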
Prerequisites
You will need basic Java programming skills, as well as access to:
- A running MapR Cluster
- Apache Maven 3.0 or later
- Git to clone the https://github.com/mapr-demos/mapr-streams-sample-programs repository
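If you have not already cloned the sample project, standard Git commands will fetch it (the .git suffix and directory name follow GitHub's defaults):
$ git clone https://github.com/mapr-demos/mapr-streams-sample-programs.git
$ cd mapr-streams-sample-programs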
Running Your First MapR Event Store Application
Step 1: Create the stream
A stream is a collection of topics that you can manage together by:
- Setting security policies that apply to all topics in that stream
- Setting a default number of partitions for each new topic that is created in the stream
- Setting a time-to-live for messages in every topic in the stream

Run the following command, as the mapr user, on your MapR cluster:
$ maprcli stream create -path /sample-stream
By default, the produce and consume permissions on the topics are restricted to the creator of the stream, that is, the Unix user running the maprcli command. You can change these permissions by editing the stream. For example, to make all of the topics available to anybody (public permission), run the following command:
$ maprcli stream edit -path /sample-stream -produceperm p -consumeperm p -topicperm p
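You can verify the stream's attributes, including the permissions you just set, with the maprcli stream info command (the -json option prints the attributes as a JSON document):
$ maprcli stream info -path /sample-stream -json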
Step 2: Create the topics
We need two topics for the example program, which can be created using maprcli:
$ maprcli stream topic create -path /sample-stream -topic fast-messages
$ maprcli stream topic create -path /sample-stream -topic summary-markers
These topics can be listed using the following command:
$ maprcli stream topic list -path /sample-stream
topic            partitions  logicalsize  consumers  maxlag  physicalsize
fast-messages    1           0            0          0       0
summary-markers  1           0            0          0       0
Note that the program will automatically create the topic if it does not already exist. For your applications, you should decide whether it is better to allow programs to automatically create topics simply by virtue of having mentioned them or whether it is better to strictly control which topics exist.
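If you prefer strict control, streams expose an autocreate attribute that governs whether topics are created on first use; assuming that attribute name (check the maprcli documentation for your release), you could disable it like this:
$ maprcli stream edit -path /sample-stream -autocreate false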
Step 3: Compile and package the example programs
Go back to the directory where you have the example programs and build an example program.
$ cd ..
$ mvn package
...
The project creates a JAR with all the external dependencies (./target/mapr-streams-examples-1.0-SNAPSHOT-jar-with-dependencies.jar).
Note that you can build the project with the Apache Kafka dependencies as long as you do not package them into your application when you run and deploy it. This example instead depends on the MapR Event Store client, which can be found in the mapr.com Maven repository:
<repositories>
  <repository>
    <id>mapr-maven</id>
    <url>http://repository.mapr.com/maven</url>
    <releases><enabled>true</enabled></releases>
    <snapshots><enabled>false</enabled></snapshots>
  </repository>
</repositories>
...
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.9.0.0-mapr-1602</version>
  <scope>provided</scope>
</dependency>
...
Step 4: Run the example producer
You can install the MapR Client and run the application locally, or copy the JAR file onto any node of your cluster. If you install the MapR Client, be sure to also install the MapR Kafka package, using the following command on CentOS/RHEL:
$ yum install mapr-kafka
To copy the JAR onto your cluster instead, use scp:
$ scp ./target/mapr-streams-examples-1.0-SNAPSHOT-jar-with-dependencies.jar mapr@<YOUR_MAPR_CLUSTER>:/home/mapr
The producer will send a large number of messages to /sample-stream:fast-messages, along with occasional messages to /sample-stream:summary-markers. Since there isn't any consumer running yet, nobody will receive the messages.
If you compare this with the Kafka example used to build this application, the topic name is the only change to the code.
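For reference, here is a minimal sketch of what such a producer looks like with the Kafka 0.9 producer API. It is not the sample project's code (the class name and message format are invented for illustration), but it shows that only the topic string is MapR-specific:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // No broker list is needed here: the MapR client libraries connect
        // the producer to the cluster, and unused Kafka settings are ignored.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        for (int i = 0; i < 1000000; i++) {
            // With Apache Kafka the topic would simply be "fast-messages".
            producer.send(new ProducerRecord<String, String>("/sample-stream:fast-messages", "msg " + i));
        }
        producer.flush();
        producer.close();
    }
}
```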
Any MapR Event Store application will need the MapR Client libraries. One way to make these libraries available is to add them to the application classpath using the /opt/mapr/bin/mapr classpath command. For example:
$ java -cp $(mapr classpath):./mapr-streams-examples-1.0-SNAPSHOT-jar-with-dependencies.jar com.mapr.examples.Run producer
Sent msg number 0
Sent msg number 1000
...
Sent msg number 998000
Sent msg number 999000
The only important difference here between an Apache Kafka application and a MapR Event Store application is that the client libraries are different. As a result, the MapR producer connects to the MapR cluster to post the messages, not to a Kafka broker.
Step 5: Start the example consumer
In another window, you can run the consumer using the following command:
$ java -cp $(mapr classpath):./mapr-streams-examples-1.0-SNAPSHOT-jar-with-dependencies.jar com.mapr.examples.Run consumer
1 messages received in period, latency(min, max, avg, 99%) = 20352, 20479, 20416.0, 20479 (ms)
1 messages received overall, latency(min, max, avg, 99%) = 20352, 20479, 20416.0, 20479 (ms)
1000 messages received in period, latency(min, max, avg, 99%) = 19840, 20095, 19968.3, 20095 (ms)
1001 messages received overall, latency(min, max, avg, 99%) = 19840, 20479, 19968.7, 20095 (ms)
...
1000 messages received in period, latency(min, max, avg, 99%) = 12032, 12159, 12119.4, 12159 (ms)
...
Note that there is a latency listed in the summaries for the message batches. This is because the consumer wasn't running when the messages were sent to MapR Event Store, and thus it is only getting them much later, long after they were sent.
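For completeness, here is a hedged sketch of the consumer side using the Kafka 0.9 consumer API. Again, this is not the sample project's code (the group id and output format are invented), but it shows the same "/stream:topic" naming:

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("group.id", "sample-group");       // hypothetical consumer group
        props.put("auto.offset.reset", "earliest");  // read messages produced before the consumer started
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("/sample-stream:fast-messages", "/sample-stream:summary-markers"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(200);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.topic() + " -> " + record.value());
            }
        }
    }
}
```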
Monitoring your topics
At any time, you can use the maprcli tool to get information about a topic. For example:
$ maprcli stream topic info -path /sample-stream -topic fast-messages -json
The -json option is used to get the topic information as a JSON document.
Cleaning up
When you are done playing, you can delete the stream and all associated topics using the following command:
$ maprcli stream delete -path /sample-stream
Conclusion
Using this example built from an Apache Kafka application, you have learned how to write, deploy, and run your first MapR Event Store application. As you can see, the application code is very similar, and only a few changes need to be made (such as changing the topic names). This means you can easily deploy your Kafka applications on MapR and reap the benefits of MapR Event Store features such as advanced security, geographically distributed deployment, support for a very large number of topics, and much more. It also means that you can immediately use all of your Apache Kafka skills on a MapR deployment.