Sunday, 7 May 2017

Zeppelin + Scala: Consuming an HTTP endpoint


In our last post, we discussed how to execute Spark jobs in Zeppelin and then build SQL queries and graphs using the embedded SQLContext (provided alongside the usual Spark context).

Today, we will see a way to populate your Spark RDDs/DataFrames with data retrieved from an HTTP endpoint/REST service, focusing on parsing the resulting JSON response. We will also learn how to import additional libraries into Zeppelin.

Friday, 13 January 2017

Fast prototyping with Zeppelin, Spark & Scala


It's been a while since I wrote anything here (again), but I don't have much free time nowadays. I'm currently taking some training on Scala and Spark, which, by the way, is what brings us here today.

A recipe for quick prototypes for Data Analysis: Scala + Spark + Zeppelin

As you may remember, some time ago I wrote about some personal learning projects I was working on, which basically fetched stock price information from the Web (using Spring Integration) and ran a couple of Spark analyses whose results were then displayed in an AngularJS interface.

Nothing complicated at all, but rather verbose and time-consuming to set up, especially if you just want to learn the subject.

Sunday, 23 October 2016

How to set up e-mail notifications for Cron


Last week, I found myself setting up some jobs in a Linux environment (CentOS 7) for which I wanted to receive email notifications on errors. I ran into some trouble getting the configuration right, so I would like to share the steps I took in case anyone finds themselves in the same situation.

Please note that we are going to use Postfix as the mail service here!

Initial Crontab setup

This was the setup I was facing: some scheduled jobs that might occasionally fail, and one that will always fail because it is badly written:
$ crontab -l

30 15 * * * java -jar /path_to_your_jar_app/job.jar
45 13 * * * /path_to_some_script/
45 19 * * * obviously_wrong_command
I wanted to be notified by email of any failure (as I was using Cron, I expected to get an email notification whenever any of these processes returned a non-zero exit code).
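Strictly speaking, Cron (cronie on CentOS 7) mails whatever a job writes to stdout or stderr to the crontab owner, or to the address in the MAILTO variable, regardless of the exit code. A failing command usually prints an error, so it does trigger a mail, but a chatty job that succeeds does too. A minimal sketch of one way to get mails only on failure, assuming a hypothetical wrapper script called run_quiet.sh that hides the output of successful runs:

```bash
#!/usr/bin/env bash
# run_quiet (hypothetical helper): Cron mails any output a job produces, so
# this wrapper hides the output of successful runs and only prints it, with
# the exit code, when the wrapped command fails.
#
# Example crontab entries (script path is a placeholder):
#   MAILTO=admin_user_alias
#   45 19 * * * /usr/local/bin/run_quiet.sh obviously_wrong_command
run_quiet() {
  local out status
  out=$("$@" 2>&1)   # capture stdout and stderr together
  status=$?
  if [ "$status" -ne 0 ]; then
    printf 'Command failed (exit %d): %s\n' "$status" "$*"
    printf '%s\n' "$out"
  fi
  return "$status"
}

# When executed as a script with arguments (e.g. from a crontab entry),
# wrap the given command.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  run_quiet "$@"
fi
```

Successful runs then stay silent and produce no mail, while a failure mails the command line, its exit code and whatever it printed.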

This is an example of an email that should be generated for the above setup:
Message  1:
From cron_user@homepc  Sun Oct 23 19:45:01 2016
From: "(Cron Daemon)" 
Subject: Cron  obviously_wrong_command
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env: XDG_SESSION_ID=3289
X-Cron-Env: XDG_RUNTIME_DIR=/run/user/3001
X-Cron-Env: LANG=en_US.UTF-8
X-Cron-Env: MAIL_TO=admin_user_alias
X-Cron-Env: SHELL=/bin/sh
X-Cron-Env: HOME=/home/cron_user
X-Cron-Env: PATH=/usr/bin:/bin
X-Cron-Env: LOGNAME=cron_user
X-Cron-Env: USER=cron_user
Date: Sun, 23 Oct 2016 19:45:01 +0200 (CEST)
Status: RO

/bin/sh: obviously_wrong_command: command not found
Keep reading if you want to get the same setup! 

Sunday, 16 October 2016

Introduction to Spring Cloud Dataflow (I)


This is the first of a series of posts on how to develop data-driven microservices with Spring Cloud Data Flow (SCDF from now on). For now, we will look at the approach this framework proposes and how to build its basic components locally: Source, Sink and Processor.

Also, if you are familiar with Spring, we will take a look at the ready-made components that are available for you to use, so you don't have to reinvent the wheel.

  • What is Spring Cloud Data Flow?
  • Introduction to the API: a simple Producer (Source) / Consumer (Sink) setup.
  • Prerequisites: Kafka and Zookeeper.
  • Coding a simple Source: an HTTP poller that retrieves live stock prices.
  • Coding a simple Sink that stores the results in a file.
  • Writing an additional Batch Source.
  • Summary & Resources.
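For the Kafka and Zookeeper prerequisite, a minimal sketch, assuming an unpacked Kafka distribution (its location is a placeholder) and the sample configuration files it ships with. The helper name start_local_kafka is hypothetical; the zookeeper-server-start.sh and kafka-server-start.sh scripts and their -daemon flag come with Kafka itself.

```bash
#!/usr/bin/env bash
# start_local_kafka (hypothetical helper): starts a single-node Zookeeper and
# a Kafka broker from an unpacked Kafka distribution, using the sample
# configuration files that ship in its config/ directory.
start_local_kafka() {
  local kafka_home=$1
  if [ -z "$kafka_home" ] || [ ! -x "$kafka_home/bin/kafka-server-start.sh" ]; then
    echo "usage: start_local_kafka KAFKA_HOME (no Kafka distribution found)" >&2
    return 1
  fi
  # Zookeeper goes first: the broker registers itself there when it starts.
  "$kafka_home/bin/zookeeper-server-start.sh" -daemon "$kafka_home/config/zookeeper.properties" || return 1
  # Crude wait so Zookeeper is listening before the broker tries to connect.
  sleep 5
  "$kafka_home/bin/kafka-server-start.sh" -daemon "$kafka_home/config/server.properties"
}

# Example (path is a placeholder):
#   start_local_kafka /opt/kafka
```

The fixed sleep is the simplest possible ordering guarantee; for anything beyond a local playground, polling Zookeeper's port would be more robust.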

Sunday, 28 August 2016

How to clean up ElasticSearch with Curator 4.0.6


Today, I would like to share a quick introduction to a tool that keeps your ElasticSearch cluster clean and free of old data: Curator (thanks to flopezlasanta for the tip!)

This tool, like ElasticSearch itself, evolves very quickly: when I was looking for information on the subject and found this blog entry from 2014, I noticed how much the way of working with it had changed. Anyway, kudos to @ragingcomputer!

Installation instructions

Before you install Curator, you need the Python package installer (pip), so Python itself is another requirement. Note that pip already comes bundled with Python 3.4 and later (on the 3.x line) and with Python 2.7.9 and later (on the 2.x line). More info here.

Note: you need superuser privileges to perform the installation.
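A hedged sketch of the installation steps, assuming a "python" interpreter on the PATH and sudo rights. Curator is published on PyPI as elasticsearch-curator; the helper names pip_is_bundled and install_curator are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a Curator 4.0.6 install. Assumptions: a "python" interpreter on
# the PATH, pip exposed as "pip", and sudo rights.

# pip_is_bundled VERSION: succeeds if that Python release ships pip through
# ensurepip (2.7.9+ on the 2.x line, 3.4+ on the 3.x line).
pip_is_bundled() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  minor=${minor:-0}
  patch=${patch%%[!0-9]*}   # drop suffixes such as "rc1"
  patch=${patch:-0}
  if (( major > 3 )); then return 0; fi
  if (( major == 3 )); then (( minor >= 4 )); return; fi
  if (( major == 2 )); then (( minor > 7 || (minor == 7 && patch >= 9) )); return; fi
  return 1
}

# install_curator: makes sure pip is available, then installs the pinned release.
install_curator() {
  local version
  if ! command -v pip >/dev/null 2>&1; then
    version=$(python -c 'import platform; print(platform.python_version())') || return 1
    if pip_is_bundled "$version"; then
      echo "pip missing; try: sudo python -m ensurepip" >&2
    else
      echo "Python $version does not bundle pip; install it first" >&2
    fi
    return 1
  fi
  sudo pip install elasticsearch-curator==4.0.6 || return 1
  curator --version   # prints the installed Curator version
}

# Usage:
#   install_curator
```

Some Linux distributions package Python without its bundled pip, which is why the sketch checks for the pip command first and only falls back to the version check to suggest a fix.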

And you are ready to go! Now let's check the other two files needed to make it work.
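Assuming the two files are Curator 4's client configuration and an action file, here is a minimal sketch that writes both. The helper write_curator_files, the host, the "logstash-" index prefix and the 30-day retention are placeholders to adapt to your cluster, and the available options may vary between Curator versions.

```bash
#!/usr/bin/env bash
# write_curator_files (hypothetical helper): writes a minimal Curator 4 client
# configuration and an action file into the given directory.
write_curator_files() {
  local dir=$1
  mkdir -p "$dir" || return 1

  # Client configuration: where the cluster is and how Curator should log.
  cat > "$dir/curator.yml" <<'EOF'
client:
  hosts:
    - 127.0.0.1
  port: 9200
  timeout: 30

logging:
  loglevel: INFO
EOF

  # Action file: delete "logstash-" indices whose name date is older than 30 days.
  cat > "$dir/delete_old_indices.yml" <<'EOF'
actions:
  1:
    action: delete_indices
    description: Delete logstash- indices older than 30 days
    options:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
EOF
}

# Usage: write the files, then try the action without deleting anything.
#   write_curator_files ~/.curator
#   curator --config ~/.curator/curator.yml --dry-run ~/.curator/delete_old_indices.yml
```

Running with --dry-run first only logs which indices would be deleted, which is a cheap way to validate the filters before scheduling the real run.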

Tuesday, 5 July 2016

Docker images creation in a development environment


Sorry, it's been a while since my last post, due to a mixture of not having anything worth writing about, holidays and some online training that takes up most of my time.
Here is something I would like to share that might reduce the pain for those who want to dockerize their Spring Boot applications (or any Java application) and then test and tune them while running inside a Docker container.

Former setup

  • A commit was pushed, and Jenkins compiled and unit-tested it.
  • After some code analysis, a Docker image was created and pushed to a repository.
  • To test it on a server, I had to pull the image and run it. Since I was focused on the performance and memory consumption of my application, tuning the JVM parameters was essential... Unfortunately, they were set at build time (through a Maven plugin) and could not be changed (or I did not know how).
With this setup, testing was highly inefficient. The drawbacks I see in this approach:
  • It was slow to test new changes, as the whole CI flow needed to be executed.
  • Unnecessary Docker-related commits to the branch that generated the Docker images.
The solution I found? Building the image myself before testing it.
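One way to do this, sketched under assumptions: the fat jar path, the image tag and the openjdk:8-jre base image are placeholders, and prepare_build_context is a hypothetical helper. The key idea is that the JVM flags come from a JAVA_OPTS environment variable read when the container starts, so they can be changed on every run without rebuilding the image.

```bash
#!/usr/bin/env bash
# prepare_build_context (hypothetical helper): copies a Spring Boot fat jar
# into a build directory and writes a Dockerfile whose JVM flags are read
# from JAVA_OPTS when the container starts, not when the image is built.
prepare_build_context() {
  local jar=$1 dir=$2
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1
  cp "$jar" "$dir/app.jar" || return 1
  # Quoted 'EOF' keeps $JAVA_OPTS literal, so the container's shell expands it.
  cat > "$dir/Dockerfile" <<'EOF'
FROM openjdk:8-jre
COPY app.jar /app/app.jar
ENV JAVA_OPTS=""
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar /app/app.jar"]
EOF
}

# Typical local loop (jar path and image tag are placeholders):
#   prepare_build_context target/my-app.jar build/docker
#   docker build -t my-app:dev build/docker
#   docker run --rm -e JAVA_OPTS="-Xms128m -Xmx256m" my-app:dev
```

Trying different heap or GC settings then only takes a new docker run, with no commit and no CI run involved; exec makes java replace the shell so it receives the container's stop signals directly.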

Saturday, 2 April 2016

Testing & error handling with Spring + Reactor


As we saw in the previous post, the whole idea of reactive programming revolves around two concepts: data comes in asynchronous streams, and those streams are immutable, so any modification results in a new stream being created.
These facts make the code less intuitive, and things get worse when it comes to testing (especially if we are interacting with external resources, such as a Web Service).
To help with this, Spring and Reactor come with some handy tools that simplify development and testing.

First, let's take a look at the scenario being tested:

Sequence diagram of the "production" code