Wednesday, August 22, 2012

WSO2 API Manager: Designed for Scalability

Scalability is a tough nut to crack. When developing enterprise software and deploying it in mission-critical environments, you need to think about scalability from day one. If you don’t, you can rest assured that a whole bunch of unpleasant surprises are heading your way: systems crashing inexplicably under heavy load, customers constantly complaining about poor performance, and system administrators having to play watchdog over the deployed applications day in and day out. On top of these mishaps, experience tells us that making a live production system scalable after the fact is a hell of a lot more difficult and expensive. So it’s always wise to think about scalability before your solutions go live.
The crew at WSO2 has a firm grip on this reality. Therefore, when designing and developing the WSO2 API Manager, we made scalability of the end product a top priority. We thought about how the overall solution is going to scale and how its individual components are going to scale. In general, we thought about how the API Manager can scale under the following circumstances:
  • Growing number of API subscribers (growth of the user base)
  • Growing number of APIs (growth of metadata and configurations)
  • Growing number of API calls (growth of traffic)
Now let’s take a look at the architecture of WSO2 API Manager and how it can scale against the factors listed above. The following schematic provides a high-level view of the major components of the product and their interactions.
When you download the WSO2 API Manager binary distribution, you get all the above components packaged as a single artifact. You can also run the entire thing in a single JVM. We call this the standalone or out-of-the-box setup. If you only have a few hundred users and a handful of APIs, then the standalone setup is probably sufficient for you. But if you have thousands and thousands of users and hundreds of APIs, then you should start thinking about deploying the API Manager components in a distributed and scalable manner. Let’s go through each of the components in the above diagram and try to understand how we can make them scalable.
Databases
WSO2 API Manager uses two main databases: the registry database and the API management database. The registry database is used by the underlying registry and governance components to store system and API related metadata. The API management database is primarily used to store API subscriptions. In the standalone setup, both databases are created in the embedded H2 server.
In a scalable setup, it will be necessary to create these databases elsewhere, ideally in a clustered and highly available database engine. One may use a MySQL cluster, SQL Server cluster or an Oracle cluster for this purpose. As you will see in the next few sections of this post, in a scalable deployment we might cluster some of the internal components of the WSO2 API Manager, so there will be more than one JVM involved. All these JVMs can share the same databases created in the same clustered database engine.
Settings for the registry database are configured in a file named registry.xml, which resides in the repository/conf directory of the API Manager. API management database settings are configured in a file named api-manager.xml, which resides in the same directory. There’s also a master-datasources.xml file where all the different data sources can be defined, and you have the option of reusing these data sources in registry.xml and api-manager.xml.
API Publisher and API Store
These two components are implemented as two web applications using Jaggery.js. However, they require some of the underlying Carbon components to function, most notably the API management components, governance components and registry components. If your deployment has a large user base, then chances are both API Publisher and API Store will receive a large volume of web traffic. Therefore it’s advisable to scale these two web applications up.
One of the simplest ways to scale them up is by clustering the WSO2 API Manager. You can run multiple instances of the API Manager pointed at the same database. An external load balancer (a hardware load balancer, WSO2 Load Balancer or any HTTP load balancer) can distribute the incoming web traffic among the different API Manager nodes. Tomcat session replication can be enabled among the API Manager nodes so that the HTTP sessions established by the users are replicated across the entire cluster.
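To make the load balancing idea concrete, here is a minimal Python sketch of round-robin distribution across a few API Manager nodes. It is only an illustration of the concept, not how the WSO2 Load Balancer or any hardware balancer is implemented; the class name and the host names are made up for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend nodes in a fixed rotating order."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = cycle(list(nodes))

    def next_node(self):
        # Each call returns the next node in the rotation.
        return next(self._nodes)


# Hypothetical host names for three API Manager nodes sharing one database.
balancer = RoundRobinBalancer(["apim-node-1", "apim-node-2", "apim-node-3"])
assignments = [balancer.next_node() for _ in range(6)]
print(assignments)
```

Because any node may end up serving any request, the nodes either need replicated HTTP sessions (as described above) or the balancer has to keep each user pinned to one node.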
The default distribution of WSO2 API Manager has both API Publisher and API Store loaded into the same container. Therefore an out-of-the-box API Manager node plays a dual role. But you have the option of removing one of these components and making a node play a single role. That is, a single node can act either as an API Publisher instance or as an API Store instance. Using this capability you can add a bit of traffic shaping into your clustered API Manager deployment. In a typical scenario there will be only a handful of people (fewer than 50) who create APIs but a large number of subscribers (thousands) who consume the published APIs. Therefore you can have a large cluster with many API Store nodes and a small cluster of API Publisher nodes (even a single API Publisher node would do). The two clusters can be set up separately with their own load balancers.
Key Management
The key management component is responsible for generating and keeping track of API keys. It’s also in charge of validating API keys when APIs are invoked by subscribers. All the core functions of this component are exposed as web services. The other components, such as the API Store and API Gateway, communicate with the key manager via web service calls. Therefore if your system has many consumers and receives a large number of API calls, it’s definitely advisable to scale this component up.
Again, the easiest way to scale this component is by clustering the API Manager deployment. That way we get multiple key management service endpoints which can be put behind a load balancer. It’s also not a bad idea to have a separate dedicated cluster of Carbon servers that run as key management servers. An API Manager node can be stripped of its API Publisher, API Store and other unnecessary components to turn it into a dedicated key management server.
User Management
This is the component against which all user authentication and permission checks are carried out. API Publisher and API Store frequently communicate with this component over a web service interface. In the standalone setup, a database in the embedded H2 server is used to store user profiles and roles. But in a real-world deployment, this can be hooked up with a corporate LDAP or an Active Directory instance. To scale this component, we can again make use of simple clustering techniques. All the endpoints of the exposed user management services can be put behind a load balancer and exposed to the API Publisher and API Store.
API Gateway
This is the powerhouse where all the validating, throttling and routing of API calls takes place. It mainly consists of WSO2 ESB components and hence can be easily clustered, just as you would set up an ESB cluster. One of the gateway nodes will function as the primary node through which all API configuration changes are applied. API Publisher will communicate with the primary node via web service calls to deploy, update and undeploy APIs. Carbon’s deployment synchronizer can take care of propagating all the configuration changes from the primary node to the rest of the nodes in the gateway cluster.
API Gateway also caches a lot of information related to API key validation in order to prevent having to query the key manager frequently. This information is stored in the built-in distributed cache of Carbon (based on Infinispan). Therefore in a clustered setup, information cached by a single gateway node becomes visible to other gateway nodes in the cluster. This further helps to reduce the load on the key manager and improves the response time of API invocations.
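The effect of caching validation results can be sketched in a few lines of Python. This is a simplified, single-process illustration under stated assumptions: the real gateway uses Carbon’s distributed cache, and the class, the 60 second expiry and the fake key manager below are all invented for the example.

```python
import time


class KeyValidationCache:
    """Caches key validation results for a limited time (TTL, in seconds)."""

    def __init__(self, validate_remote, ttl_seconds=60.0, clock=time.monotonic):
        self._validate_remote = validate_remote  # stands in for the key manager call
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # api_key -> (is_valid, expiry_time)

    def is_valid(self, api_key):
        now = self._clock()
        cached = self._entries.get(api_key)
        if cached is not None and cached[1] > now:
            return cached[0]  # cache hit: no call to the key manager
        result = self._validate_remote(api_key)  # cache miss: ask the key manager
        self._entries[api_key] = (result, now + self._ttl)
        return result


# Hypothetical stand-in for the key manager web service; it records every call.
remote_calls = []


def fake_key_manager(api_key):
    remote_calls.append(api_key)
    return api_key.startswith("valid-")


cache = KeyValidationCache(fake_key_manager, ttl_seconds=60.0)
results = [cache.is_valid("valid-abc") for _ in range(5)]
rejected = cache.is_valid("bogus")
print(results, rejected, len(remote_calls))
```

Five invocations with the same key reach the key manager only once. In a distributed cache the same saving applies across gateway nodes, since an entry written by one node can be read by the others.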
Usage Tracking
We use WSO2 BAM components to publish, analyze and display API statistics. BAM has its own scalability model. Thrift is used to publish statistics from API Gateway to a remote Cassandra cluster. Use of Thrift ensures that statistics can be published from API Gateway to the Cassandra store at a rapid rate. The BAM data publisher also employs its own queuing mechanism and thread pool so that data can be published asynchronously without having any impact on the messages routed through the API Gateway. Use of Cassandra enables fast read-write operations on enormous data sets. 
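The queue-plus-thread-pool pattern mentioned above can be sketched as follows. This is not the BAM data publisher’s actual code or API; it is a minimal Python illustration, and the class name, the worker count, the queue capacity and the drop-when-full policy are assumptions made for the example.

```python
import queue
import threading


class AsyncStatsPublisher:
    """Queues events and ships them from background worker threads."""

    def __init__(self, send, workers=2, capacity=1000):
        self._send = send  # function that actually ships one event
        self._queue = queue.Queue(maxsize=capacity)
        for _ in range(workers):
            threading.Thread(target=self._drain, daemon=True).start()

    def publish(self, event):
        """Called on the request path; never blocks. Returns False if dropped."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False  # drop the statistic rather than delay the API call

    def _drain(self):
        while True:
            event = self._queue.get()
            try:
                self._send(event)
            except Exception:
                pass  # a real publisher would log or retry here
            finally:
                self._queue.task_done()

    def flush(self):
        """Blocks until every queued event has been handled."""
        self._queue.join()


received = []
lock = threading.Lock()


def fake_send(event):
    # Hypothetical stand-in for the remote statistics store.
    with lock:
        received.append(event)


publisher = AsyncStatsPublisher(fake_send, workers=2, capacity=100)
accepted = [publisher.publish({"api": "demo", "request": i}) for i in range(10)]
publisher.flush()
print(len(received), all(accepted))
```

The request path only pays for an in-memory enqueue; the slow network write happens on the worker threads, which is why publishing does not hold up the messages being routed.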
Once the data has been written to the Cassandra cluster, Hadoop and Hive are used to process the collected information. The analyzed data is then stored in a separate database from which API Manager (or any other monitoring application) can pull out the numbers and display them in various tables and charts.
Putting It All Together
As you can see, WSO2 API Manager provides many options to scale up its individual components. However, that doesn’t mean you should scale up each and every piece of it for the overall solution to be scalable. You should decide which components to scale up by looking at your requirements and the expected usage patterns of the solution. For instance, if you only have a handful of subscribers, you don’t have to worry about scaling up API Store and API Publisher, regardless of how much traffic those subscribers are going to send. If you have thousands of subscribers but only a handful of them are actually sending any traffic, then the scalability of API Store will be more important than scaling up the Gateway and statistics collection components.
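As a rough way to think through that decision, the reasoning can be written down as a small Python function. The function name and both thresholds are purely illustrative assumptions, not WSO2 recommendations; plug in numbers that match your own deployment.

```python
def components_to_scale(subscribers, calls_per_second,
                        many_subscribers=1000, heavy_traffic=500):
    """Returns the components worth scaling first for a usage pattern.

    Both thresholds are illustrative defaults, not measured limits.
    """
    components = []
    if subscribers >= many_subscribers:
        # Many people browsing and subscribing: web traffic to the store grows.
        components.append("API Store")
    if calls_per_second >= heavy_traffic:
        # Many API invocations: routing, key validation and statistics grow.
        components += ["API Gateway", "Key Management", "Usage Tracking"]
    return components


few_heavy_users = components_to_scale(subscribers=20, calls_per_second=5000)
many_quiet_users = components_to_scale(subscribers=5000, calls_per_second=10)
small_deployment = components_to_scale(subscribers=50, calls_per_second=5)
print(few_heavy_users, many_quiet_users, small_deployment)
```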

Friday, August 17, 2012

Introducing WSO2 Carbon 4.0

Samisa Abeysinghe speaking about the latest release of WSO2 Carbon.

Monday, August 6, 2012

WSO2 API Manager 1.0.0 Goes GA

Last Friday we released WSO2 API Manager 1.0. It is the result of months of hard work. We started brainstorming about a WSO2 branded API management solution back in mid-2011. A few months later, in October, I implemented API support for Apache Synapse, which was a huge step in improving the REST support in our integration platform (especially in WSO2 ESB). This addition also brought us several steps closer to implementing a fully fledged API management solution based on WSO2 Carbon and related components. Then somewhere around February 2012, a team of WSO2 engineers officially started working on the WSO2 API Manager product. The idea was simple: combine our existing components to offer a smooth, end-to-end API management experience while addressing a number of challenges such as API provisioning, API governance, API security and API monitoring. The idea of combining our mediation, governance, identity and activity monitoring components to build the ultimate API management solution was a fascinating one to think about even for us.
I officially joined the WSO2 API Manager team in late April. It's been 15 hectic weeks since then but at the same time it's been 15 enjoyable weeks. Nothing is more fulfilling than seeing a project evolving from a set of isolated components into a feature complete product with its own UIs, samples and documentation. The development team was also one of the best a guy could ask for with each member delivering his/her part to the fullest, consistently going beyond expectations. 
This release of WSO2 API Manager supports creating APIs, versioning them and then publishing them into an 'API Store' after a review process. API documentation, technical metadata and ownership information can also be collected and tracked through the solution. The built-in API Store allows API consumers to browse the published APIs, provide feedback on them, and ultimately obtain the API keys required to access them. API security is based on the OAuth bearer token profile, and OAuth resource owner grant types are supported to allow end-user authentication for the APIs. The API gateway (runtime) publishes events and statistics to a remote BAM server, which then runs a series of analyzers to extract useful usage information and display it on a dashboard.
We are currently working with a group of customers and independent analysts to evaluate the product and further improve it. The objective is to go into 'release early - release often' mode and do a series of patch releases, thereby driving the product into maturity quickly. You can also join the effort by downloading the product, trying out a few scenarios and giving us some feedback on our mailing lists. You can report any issues or feature requests on our JIRA. Please refer to the online documentation if you need clarification on any features. Have fun!