Nanyang Campus Grid

Grid Application Deployment Kit

1. Introduction

The Grid Application Deployment (GAD) Kit is a Graphical User Interface (GUI) developed to assist users in developing their GridRPC functions, monitoring Grid resources, and deploying their applications. It is based on a GridRPC API implementation that hides the complexity of the Grid environment from users. In addition, two mechanisms are explored: composite services, which reduce communication cost, and metascheduling, which helps applications achieve high performance.

The rest of this report is organized as follows. Section 2 briefly describes the overall design of the GAD Kit and its functions. Section 3 presents the mechanisms that provide composite services and the metascheduler, which help applications achieve high performance. Experimental results on an aerodynamic wing design problem using the developed GridRPC API are discussed in Section 4. Finally, Section 5 draws conclusions and outlines future work.

2. Grid Application Deployment Kit

2.1. Functions

The GAD Kit targets three functions:

  1. Providing services: this function focuses on deploying computational services on Grid resources. It provides mechanisms to describe computational services, distribute them across Grid resources, and make these services executable.
  2. Consuming services: by providing the GridRPC API, the GAD Kit enables users to focus on their applications instead of dealing with the complexity of the underlying system. The GAD Kit also provides application templates, e.g. the Genetic Application Template, so that users can easily "gridify" their code.
  3. Monitoring Grid resources: monitoring helps users select the most suitable resources for their applications. In addition, using the monitor's information, an application can select resources according to their dynamic workloads and thereby achieve high performance.

figure1

Figure 1. Grid Application Deployment (GAD) Kit

2.2. Architecture

To achieve the above goals, the GAD Kit architecture consists of three parts:

  1. Backend system: Presently we support NetSolve as a backend execution system. However, since NetSolve does not have a precise scheduler, we replace its scheduler by a new one that has the ability to balance workload across NetSolve servers.
  2. Programming interface: The GridRPC API, a standard and high-level interface, is provided so that users' applications can access the Grid resources. The GridRPC API includes the metascheduler, which is responsible for scheduling clients' requests to Grid resources with regard to their software and hardware capabilities and their workloads.
  3. User interface: The system comes with a user-friendly GUI. From this interface, users can deploy their code at the backend and then execute their program, which has been "gridified" using the GridRPC API. The GUI also assists in monitoring the execution of applications and the status of resources. The architecture of the GAD Kit is presented in Figure 2.

figure2

Figure 2. Grid Application Deployment (GAD) Kit

2.3. Technologies

To fulfil these requirements, we use the following tools and technologies:

  1. NetSolve: NetSolve is employed as the local resource manager for each cluster in the Grid environment. It is responsible for managing local computational services, scheduling computation tasks across computing nodes, and performing load balancing across these computing nodes.
  2. Globus: Globus provides the infrastructure for Grid applications. Its services, namely the Monitoring and Discovery Service (MDS), Grid Resource Allocation Manager (GRAM), Grid Resource Information Service (GRIS), Grid Security Infrastructure (GSI), and Global Access to Secondary Storage (GASS), are employed in the GAD Kit and the GridRPC API implementation.
  3. GridFTP: GridFTP is used as the data communication protocol for applications. It also helps the GAD Kit distribute source code across Grid resources.
  4. LDAP: The OpenLDAP client API is used to retrieve resource information from MDS over the LDAP protocol.
  5. Ganglia: Ganglia is used to monitor the workload of Grid resources. It registers this information in Globus MDS. Based on the workload information, the metascheduler can select suitable resources for the applications.
  6. Tcl/Tk: ActiveTcl is used to develop the GAD Kit's GUI. It is portable and can integrate with other powerful visualization libraries that may be useful for future requirements.

3. Underlying System and Performance Improvement

3.1. Underlying System

Figure 3 presents the execution mechanism of the GridRPC system. It is assumed that applications are developed using our extended GridRPC API. At the front end, the GridRPC client issues several remote procedure calls. The metascheduler embedded in the GridRPC API implementation then farms the requests out to the available resources, based on the workload and resource information recorded in Globus MDS by Ganglia. At the back end, once the GRAM gatekeeper of a cluster receives a composite service request, an instance of the composite service is initialized on the master node of the cluster. Subsequently, the set of basic service requests is farmed out across multiple computing nodes in the cluster. The NetSolve agent is responsible for local scheduling and resource discovery across the computing nodes in the cluster for basic services.

figure3

Figure 3. GridRPC execution system

To improve application performance, we introduce two mechanisms: composite services and metascheduling.

3.2. Composite Services

To reduce communication cost across clusters, we extend the GridRPC API with composite services. An additional parameter is introduced in the GridRPC API to indicate the number of basic services to be carried out inside a composite service. When this number is equal to one, the composite service becomes a basic service. Using composite services, a set of basic GridRPC requests can be bundled into one composite GridRPC request, so that the amount of communication between clients and resources can be reduced.
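The bundling idea can be illustrated with a minimal Python sketch. It is not the extended GridRPC API itself: the `CompositeRequest` class, the `bundle` function, and the service name `analyze_airfoil` are hypothetical names chosen for illustration. The sketch only shows how grouping basic requests reduces the number of client-to-cluster calls.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CompositeRequest:
    """Hypothetical container: one remote call carrying several basic inputs."""
    service: str
    inputs: List[object]

    @property
    def count(self) -> int:
        # Number of basic services inside the composite; 1 means a basic service.
        return len(self.inputs)


def bundle(service: str, inputs: Sequence[object], per_composite: int) -> List[CompositeRequest]:
    """Group basic requests into composite requests of at most `per_composite` items."""
    if per_composite < 1:
        raise ValueError("per_composite must be at least 1")
    return [
        CompositeRequest(service, list(inputs[i:i + per_composite]))
        for i in range(0, len(inputs), per_composite)
    ]


designs = list(range(200))                           # 200 placeholder design inputs
basic = bundle("analyze_airfoil", designs, 1)        # one call per design
composite = bundle("analyze_airfoil", designs, 10)   # ten designs per call
print(len(basic), len(composite))                    # 200 20
```

With a bundle size of 10, the client issues 20 calls instead of 200; with a bundle size of 1, each composite request degenerates to a basic request.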

3.3. Metascheduling

Resource selection and scheduling play a critical role in the performance of Grid applications. The metascheduler not only performs match-making between a request's requirements and the available resources, but also predicts how the workload of the resources will change once a request is submitted. Prediction is required because retrieving resource status from the resource monitor is time-consuming, and the information may be out of date due to network latency. In particular, when asynchronous requests are issued in a loop, resource status cannot be updated as fast as the requests are generated. We use a heuristic algorithm with O(nm log m) complexity to schedule GridRPC requests. The metascheduler schedules application requests using the raw performance of the Grid resources provided by Globus MDS, the complexity of the GridRPC request to be executed, and the current workload of the Grid resources provided by Ganglia. The details of the metascheduling algorithm can be found in [1].
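The role of workload prediction can be sketched with a simple greedy scheduler in Python. This is not the algorithm of [1]: the cost model (existing load slows a resource proportionally), the `Resource` fields, and the example numbers are assumptions made for illustration. The point is that the scheduler updates its own estimate of each resource's queue after every assignment, instead of re-querying the monitor between requests.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Resource:
    name: str
    mflops: float     # raw performance, as a directory service might report it
    workload: float   # current load, as a monitor might report it


def predicted_time(res: Resource, queued: float, task: float) -> float:
    """Estimated finish time for `task` (in MFLOP) on `res`.

    Assumed cost model: the existing load divides the raw speed by (1 + load).
    """
    effective = res.mflops / (1.0 + res.workload)
    return (queued + task) / effective


def schedule(resources: List[Resource], tasks: List[float]) -> List[str]:
    """Assign each task to the resource with the earliest predicted finish time."""
    queued: Dict[str, float] = {r.name: 0.0 for r in resources}
    plan: List[str] = []
    for task in tasks:
        best = min(resources, key=lambda r: predicted_time(r, queued[r.name], task))
        queued[best.name] += task   # predict the load change locally, no monitor query
        plan.append(best.name)
    return plan


# Hypothetical resources and three equal tasks of 100 MFLOP each.
pool = [Resource("fast", 100.0, 0.0), Resource("slow", 40.0, 0.0)]
print(schedule(pool, [100.0, 100.0, 100.0]))
```

Without the local update of `queued`, every request in an asynchronous loop would see the same stale status and be sent to the same resource.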

4. Experimental Results

We used the GAD Kit to "gridify" the airfoil design and analysis problem [2, 1]. Using the GAD Kit, users distributed the airfoil analysis code across Grid resources. Then, with the aid of the service-consuming function, the application was built using these services and the GridRPC API.

                    HPC Cluster      HPC2 Cluster     Sun Cluster
Raw performance     480.943 MFLOPS   90.147 MFLOPS    85.833 MFLOPS
Initial workload    3.188            0.020            0.021
Computing nodes     10               10               10

Table 1. Properties of three clusters

In this experiment, the design of an airfoil for minimum drag was deployed on three clusters within the Nanyang Technological University Campus Grid. Their configuration and initial load are presented in Table 1. Here, 200 potential airfoil designs are analyzed in parallel. Each analysis task requires approximately 14,062,632,000 floating point operations. The 200 analysis tasks are submitted in a loop of 20 asynchronous calls, each of which invokes a composite service that analyzes 10 designs on a cluster. Upon receiving the analysis jobs submitted by the GridRPC client, the metascheduler distributed 13, 4, and 3 requests to the HPC Cluster, HPC2 Cluster, and Sun Cluster, respectively, based on the capabilities and workloads of these clusters.
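The figures in this paragraph can be cross-checked with a few lines of Python. The constants are taken from the text; the variable and function names are chosen for illustration only.

```python
FLOP_PER_DESIGN = 14_062_632_000   # approximate cost of one airfoil analysis
DESIGNS_PER_REQUEST = 10           # basic services per composite request

# Requests assigned by the metascheduler, as reported in the text.
requests = {"HPC Cluster": 13, "HPC2 Cluster": 4, "Sun Cluster": 3}


def designs_on(cluster: str) -> int:
    """Number of designs analyzed on a cluster."""
    return requests[cluster] * DESIGNS_PER_REQUEST


total_requests = sum(requests.values())
total_designs = total_requests * DESIGNS_PER_REQUEST
total_flop = total_designs * FLOP_PER_DESIGN

print(total_requests, total_designs, total_flop)
for name in requests:
    print(name, designs_on(name))
```

The 13, 4, and 3 requests add up to the 20 asynchronous calls and 200 designs, for a total of about 2.8 x 10^12 floating point operations. The HPC2 Cluster receives 4 x 10 = 40 designs.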

Figure 4 presents the workload of the three clusters, and of the nodes within one of them, while the application was running. In the HPC2 Cluster, 40 designs were analyzed from 18:40 to 19:00. The workload of the master node (the ec-pdccm machine) differed from that of the computing nodes because it only scheduled the computing tasks of the composite services. The results show that balancing workload inside a cluster is easier to achieve than balancing workload across heterogeneous clusters.

figure4

Figure 4. Workload of the System

5. Conclusions and Future Work

"Gridifying" an application over the Grid is still hindered by a lack of suitable middleware. In this report, we presented the GAD Kit, a user-friendly interface, together with the design and implementation of the GridRPC API, to provide seamless and transparent access to Grid resources. Using the GAD Kit, users can easily deploy their applications in a Grid environment. The airfoil design and analysis application used in the experiment showed that composite services and the metascheduling mechanism can help users access Grid resources effectively.

References

  1. Q. T. Ho, Y. S. Ong, and W. Cai. "Gridifying" Aerodynamic Design Problem Using GridRPC. In The 2nd International Workshop on Grid and Cooperative Computing (GCC2003), December 2003.
  2. Y. S. Ong, P. B. Nair, and A. J. Keane. Evolutionary Optimization of Computationally Expensive Problems via Surrogate Modeling. In American Institute of Aeronautics and Astronautics Journal, 40(4), pp. 687-696, 2003.