Apache Spark Big Data Bigdata Hadoop Java

SIX sparkling features of Apache Spark!

What is Apache Spark? Why there is a serious buzz going-on about this? If you are into BigData analytics business then, should you really care about Spark? Hope this post will help to answer some of these questions which might have coming to your mind these days.

Apache Spark is a powerful open source processing engine for Hadoop data built around speed, easy to use, and sophisticated analytics. It was originally developed in UC Berkeley’s AMPLab and later-on it moved to Apache. Apache Spark is basically a parallel data processing framework that can work with Apache Hadoop to make it extremely easy to develop fast, Big Data applications combining batch, streaming, and interactive analytics on all your data.

Lets go through some of its features which are really highlighting it in the Bigdata world!

  1. Lighting Fast Processing

When comes to BigData processing speed always matters. We always look for processing our huge data as fast as possible. Spark enables applications in Hadoop clusters to run up to 100x faster in memory, and 10x faster even when running on disk. Spark makes it possible by reducing number of read/write to disc. It stores this intermediate processing data in-memory. It uses the concept of an Resilient Distributed Dataset (RDD), which allows it to transparently store data on memory and persist it to disc only it’s needed. This helps to reduce most of the disc read and write –  the main time consuming factors – of data processing.

(Spark Performance over Hadoop. Image Courtesy: Cloudera. Visit this link to see how Jai & Matei explains the delightful experience giving by Spark to its developers.)

  1. Ease of Use as it supports multiple languages

Spark lets you quickly write applications in JavaScala, or Python. This helps developers to create and run their applications on their familiar programming languages. It comes with a built-in set of over 80 high-level operators.We can use it interactively to query data within the shell too.

  1. Support for Sophisticated Analytics

In addition to simple “map” and “reduce” operations, Spark supports SQL queries, streaming data, and complex analytics such as machine learning and graph algorithms out-of-the-box. Not only that, users can combine all these capabilities seamlessly in a single workflow.

  1. Real time stream processing

Spark can handle real time streaming. Map-reduce mainly handles and process the data stored already. However Spark can also manipulate data in real time using Spark Streaming. Not ignoring that there are other frameworks with their integration we can handle streaming in Hadoop.

Here is what Cloudera says about Sparks Streaming abilities:

  • Easy: Built on Spark’s lightweight yet powerful APIs, Spark Streaming lets you rapidly develop streaming applications
  • Fault tolerant: Unlike other streaming solutions (e.g. Storm), Spark Streaming recovers lost work and delivers exactly-once semantics out of the box with no extra code or configuration
  • Integrated: Reuse the same code for batch and stream processing, even joining streaming data to historical data

(Streaming Performance over Storm. Image

  1. Ability to integrate with Hadoop and existing HadoopData

Spark can run independently. Apart from that it can run on Hadoop 2’s YARN cluster manager, and can read any existing Hadoop data. That’s a BIG advantage! It can read from any Hadoop data sources for example HBase, HDFS etc. This feature of Spark makes it suitable for migration of existing pure Hadoop applications, if that application use-case is really suiting Spark. As Spark is using immutability more all scenarios might not be suitable for migration.

  1. Active and expanding Community

Apache Spark is built by a wide set of developers from over 50 companies. The project started in 2009 and as of now more than 250 developers have contributed to Spark already! It has active mailing lists and JIRA for issue tracking.

Below are some useful links to start with:

If you want to learn basics of Apache Spark then my previous post will help you. It has a training video link which explains Spark simple way.

easy mock Java mock objects mock testing

Creating Mock Tests: Using Easy mock

Unit testing is now a "best practice" for software development. In this unit testing we have to face so many situations where we need to interact with Database or any other resources. But at the same time we need to make our Tests isolated too. Here comes the importance of Mock objects.

Mock objects are a useful way to write unit tests for objects that act as mediators. “Instead of calling the real domain objects, the tested object calls a mock domain object that merely asserts that the correct methods were called, with the expected parameters, in the correct order.”

Using EasyMock Framework

EasyMock is a framework for creating mock objects using the java.lang.reflect.Proxy object. When a mock object is created, a proxy object takes the place of the real object. The proxy object gets its definition from the interface or class you pass when creating the mock.
EasyMock is providing two APIs for creating mock objects that are based on interfaces, the other on classes (org.easymock.EasyMock and org.easymock. classextensions.EasyMock respectively).

We can separate the EasyMock implementation into FOUR steps 

1. Creating a Mock Object using “EasyMock.createMock”.

Create Mock : Using this static method we are creating a mock object. And this is the first step which we need to do in mock testing.

When we creates this mock objects then we can follow three levels.

Regular: If we are expecting some methods to be executed then it was not executed then the test will fail.  And if any unexpected tests executed then also test will fail. Here the order of the method execution is not important.

Ex: EmpDAO empDAO = EasyMock.createMock(EmpDAO.class);

Nice: If we are expecting some methods to be executed then it was not executed then the test will fail. And if any unexpected tests executed then it will return a default value. Here also order is not important.

Ex: EmpDAO empDAO = EasyMock.createNiceMock(EmpDAO.class);

Strict: Same as regular but here the Order of the expected methods also important.

Ex: EmpDAO empDAO = EasyMock.createStrictMock(EmpDAO.class);

2. Expecting mock object method calls using “EasyMock.expect”.

This is used to expect some method calls from our mock object. Lets go through one example.

Lets assume we have the following methods which gets employee information from DB.

List<Employee> employee = empDao.getEmpDetails();

List<Employee> employee = empDao.getEmpDetailsByName(“bond”);

In the unit test we need to follow as follows…

EmpDao mockDao = EaskMock.createMock(EmpDao.class);

Employee mockEmp = new Employee();

List<Employee> empList= new ArrayList<Employee>(1);


3. Registering/replaying expected methods using “EasyMock.replay”.

Once the behavior of the mock objects has been recorded with expectations, the mock objects must be prepared to replay those expectations. We are using replay() method for this purpose. EaskMock will stop the expecting behavior once this method calls.


4. Verifying the expected methods using “EasyMock.verify”.

Veirfying the mock expectations is the final step which we need to follow. This includes validating that all methods that were expected to be called were called and that any calls that were not expected are also noted.


Easymock is providing more functionalities like “Matchers” etc for more unit testing flexibility. EasyMock has been the first dynamic Mock Object generator, relieving users of hand-writing Mock Objects, or generating code for them. It helps us to increase our testing coverage a lot.


Which framework you will choose as the best one?

Java javafx jruby ruby Sun

Sun Tech Days : In Hyderabad


Yesterday I had attended Sun Developers Conference held here in Hitex Convensional center, Hyderabad, India. It was day ONE of three days conference.  The day ONE was really interesting and informative for me. Got an overview about the new Sun techs and got chance to interact with a lot of developers working in Java.

We reached there aroung 9.30 in the morning and done with our registration formalities. The first seesion as Sun keynote bye Rich Green, Executive Vice President, Software, Sun microsystems. After that there was a Demo showcase in which SIX SUN java professionals presented some software demos. Those were in jMaki, Sun SPOTS, J2ME, Swing, JavaFX etc. There are 30 sessions total in the First day and those are from 5 different categories. Five sessions are going on the same time and the whole day is divided in to Six layers. So a delegate can select a session as per his/her taste. If a person is attentding full sessions then they can attent maximum 6 sessions in a day.

The Sessions which I had attened are

1. JEE , Glassfish and their future

2. Testing with Junit and other Testing tools

3. Rapid development with Ruby, JRuby and rails

4. Java Persistence API : Further simplifying persistence.

5. Java troubleshooting tips.

6. JEE with Spring ad Seam.

You can check the other sessions here

The first day ended with a Welcome reception – delicious Dinner and a Music Mela. 🙂 You can read more about each sessions in my next posts…