Head in the Clouds

Discussion on the state of cloud computing and open source software that helps build, manage, and deliver everything-as-a-service.

  • Home
    Home This is where you can find all the blog posts throughout the site.
  • Categories
    Categories Displays a list of categories from this blog.
  • Tags
    Tags Displays a list of tags that has been used in the blog.
  • Bloggers
    Bloggers Search for your favorite blogger from this site.
  • Login

CloudStack University

Posted by on in Cloud Strategy

At Apache CloudStack we recently started an initiative to organize our content into learning modules. We call this initiative CloudStack University. Everyone is invited to participate by contributing content (slides and screencasts), suggesting new learning modules that are needed and even creating exercises and assignments. School fun ! As we were discussing the initiative on the mailing list we started by looking at our existing content: slideshares, youtube videos and thought about organizing them into a CloudStack 101 course. This is still a work in progress that requires everyones participation to make it a great resource.

In the meantime I have been putting all my CloudStack content on slideshare and I wanted to provide a narrated version of these slides together with hands-on demo to show folks how to do a few things with CloudStack specifically but also related Cloud and OSS tips and tricks. Here comes the CloudStack university screencasts. I will add more of them as I go along and receive requests from the community (reach out on twitter @sebgoa and tell me what you want to see). I wanted to give you a preview of what this looks like. To create a self-paced learning module, I decided to create slide decks that people can download from slideshare and cross-post the corresponding screencasts (for most of them at least) on youtube. People can choose a particular topic, or take the entire series. The idea is that at the end of watching all the screencasts and reading the material people graduate from CloudStack University.

Certainly one can imagine how this could evolve into a full fledge training and certification program. I do plan to create a final exam once I am done with a consistent set of modules :) In this post I wanted to introduce you to some of the first modules I created. I welcome all feedback and suggestions to improve them. Reach out to me on twitter (@sebgoa) or contribute your own modules via the wiki and the mailing lists.

...
Hits: 615
Rate this blog entry:
Continue reading Comments

Since my last post on the analysis of the CloudStack community, we have graduated and became a top-level project. It's about time to give an update on what can be seen as a metric of the health of our community.

All the data presented is based on the analysis of the mailing lists, the data is publicly accessible, I have used it previously, just when we graduated March 22nd, in January and back in November when I did some social network analysis. This study was inspired by John Jiang, now working at Eucalyptus, you can read his analysis, note that he moved it to the Eucalyptus website.

Methodology: As explained in previous posts, a Contributor is considered as someone who sent an email to one of the CloudStack mailing lists. This is not to be confused with a Committer which at the ASF is meant to represent someone with write access to the code. Not all code contributors have write access. I identify Companies as the email domain used by the Contributors. This is because Contributors are none-affiliated in the ASF. Obviously it has some limitations as email domains such as gmail.com can represent different companies. All emails are loaded in a mongodb database and queries are performed to extract the plots that you will see below. We currently have seven mailing lists of varying traffic: announce, users, users-cn, dev, marketing, commits, issues. Note that all JIRA emails are now sent to the issues list. Subscription to these lists and number of messages last month is as follows:

* dev@ 609 subs / ~2600 msgs in Apr
* users@ 782 subs / ~800 msgs in Apr
* issues@ 109 subs / ~2400 msgs in Apr
* commits@ 166 subs / ~3300 msgs in Apr
* marketing@ 85 subs / ~260 msg in Apr
* users-cn@ ~300 subs / ~260 msgs in Apr

Contributors: The plots below show the number of contributors per month since we became an ASF project as well as an accumulation to date. Comparison with traffic prior to joining ASF can be seen in the previous posts. The number of monthly contributors in dev is reaching 225 , while the number of monthly contributors in users is reaching 175. Most notable is that the number of contributors in the users list seem to be closing on the number of contributors in dev. It may indicate a stabilization of the number of developers and an increase in the user base. The accumulation on both lists is now over 500. A comparison of both contributor sets gives us an estimate of 806 for the entire CloudStack community. Of course this does not include people who may only participate in the marketing or announce list, but they are much lower traffic lists. It also does not include participants in the Chinese user lists. This will be in the next post hopefully. From the subscription data listed above you can also see that we have roughly a 30% activity ratio, meaning that 1/3 of the subscribers actually send emails to the lists. Difficult to know if this is a good or bad number, one would need to compare with other ASF projects.

...
Hits: 1156
Rate this blog entry:
Continue reading Comments

Security in the Cloud and the CCSK

Posted by on in Cloud Strategy

Search for cloud computing and you will get approximately 190 million results, search for cloud computing security and you will get 120 million results. This is very rough data of course but it gives us an idea that when talking about Cloud, security is a big concern. Go to a conference and talk about Cloud, and you can be certain that one of the big questions you will get asked is "But what about Security ?"

Disclaimer and bias: This question always leaves me pondering, mostly because my personal background and bias always makes me wonder what people are afraid off in the Cloud and what do they see that Cloud brings to bear that is different from any existing distributed systems running over the internet. I am not an enterprise security expert, I used to teach an introductory course on network security, but I have spent my fair share thinking about Clouds especially at the IaaS layer. There, the new technology that could represent a new attack vector is virtualization and I only read about two non-traditional efforts that really challenged the security of virtualization: the controversial bluepill project in 2006 and the cross-VM side channel attack reported by a research group at MIT in 2009 (there are of course more...). Most problems publicly described with IaaS have been with spam and DDOS. Where on one hand cloud providers are being used to send spam and on the other hand cloud providers are victim of DDOS threatening the availability of services.

However, in the fall I had the chance to participate in the DELL in the Clouds Think Tank in London. It is there that I started to understand that what most people where worried about with the Cloud had more to do with legal issues, governance, compliance and contracts than hardcore attacks. Indeed when dealing with a cloud provider you are exposing your data to new risks for the simple fact that it is not under your total control and you need to manage those risks. Moving your data out of your secured premises and putting them in the hands of another party exposes you to new threats. This is the core of information assurance and risk management. Cloud security is therefore more about updating your security guidelines, making sure that you are compliant with the law and being confident that you can respond appropriately to any attack or business continuity issues. Cloud security is less about the fear of a new technology that exposes new attack vectors. The risks may be new to your enterprise but the attacks and vulnerabilities used are not new to the internet.

To learn more and come up with a plan I now point people to the Cloud Security Alliance (CSA) and their guidelines. It is a 176 pages document which coupled with the ENISA cloud security assessment (125 pages :)) forms the basis of the CSA Certificate of Cloud Security Knowledge (CCSK). I have finished reading the CSA guidelines and once I read the ENISA report I will take the CCSK exam.

The CSA guidelines are a set of reports covering fourteen domains of interest to Cloud security. From Governance and Legal Issues to Incident Response and Virtualization (to name a few). One sentence truly resonated with me due to my personal bias explained earlier. It is in the Application Security domain chapter which states: "Cloud-based software applications require a design rigor similar to an application connecting to the raw internet - the security must be provided by the application without any assumptions being made about the external environment" indeed doing the opposite would be one of the fallacies of distributed systems design enunciated by Peter Deutsch from SUN. There lies in my view the biggest risk, thinking that you can take an application that has been designed in-house assuming a secure local network and wanting to move it to the cloud as-is not managing the risks due to the fact that a) the network is not secure b) bandwidth is not infinite c) latency is not zero d) transport has a cost.

...
Hits: 1760
Rate this blog entry:
0
Continue reading Comments

SDN in CloudStack

Posted by on in Cloud Strategy

Software Defined Networking (SDN) has seen a lot of uptick in momentum since VMware acquired Nicira last summer, three months after Google announced that they were using OpenFlow to optimize their internal backbone. In this post we look at "SDN" support in CloudStack.

First, let's try to define SDN in a short paragraph. It is, was it spells out to be (like the french adage: "c'est comme le Port-Salut c'est écrit dessus" :)) a way to configure the network using software. This means that the network definition (routing, switching), optimization (load-balancing, firewall, etc) becomes a software problem with SDN. If you have seen the wikipedia page and read other articles, you most likely have read that SDN decouples the control plan and the data plane. This is short for saying that the forwarding tables used in switches/routers will be controlled by a software/applications that can be remote. Part of the SDN landscape is OpenFlow. OpenFlow is a standard defined by the ONF that allows you to implement a SDN solution. It defines the protocol used by the control plane to send forwarding information (a.k.a flow rules) to network devices. However while some early SDN companies embraced and led the development of openflow (e.g BigSwitch), an SDN solution may not use the OpenFlow protocol. This leads to a key point of todays SDN solutions: SDN != OpenFlow.

The academic in me can't help but point out the GENI initiative, which aims to re-design the internet and start from a clean slate. SDN research happens on GENI but it is also seen as a way to instrument the network, isolate experiments and dynamically reconfigure the network. Something impossible (especially over wide are networks) before SDN. With FutureGrid and Grid5000 being testbeds for IaaS solutions, GENI is a real-life testbed for future networking solutions. I had a chance to work a little bit on GENI while at Clemson. We modified the NOX OpenFlow controller to integrate it with OpenNebula and provide Security Groups as well as Elastic IP functionality.

While virtualization was a key enabler of IaaS, virtual switches are key enablers of SDN based networks. Open Virtual Switch (OVS) is the leading virtual switch. OVS is now used in most private and public clouds, it replaces the standard linux bridge and is used to connect virtual machine network interfaces to the physical network. An OVS can be connected with an OpenFlow controller and receive flow rules from the controller. However it does not need to, an SDN solution could talk directly to OVS. The main issue being that a single openflow controller may not be fast enough to process all "control" decisions on a large networks. We would have to see a distributed OpenFlow controller to be able to reach extremely large scale (It might exist, I just have not found references for it). OVS can do many things among them: VLAN tagging, QoS, Generic Routing Encapsulation(GRE) and Stateless Transport Tunneling (STT) tunnels. To know more about the difference between GRE and STT see this blog by Bruce Davie and Andrew Lambeth.

So where does SDN help IaaS ? Anyone who has worked on networking of virtual machines (VM) knows how complex this can get. VMs from multiple tenants need to be isolated from each other, they have private IP addresses but may need to be accessed form the public internet, VMs can be migrated within a single broadcast domain but one may want to migrate across domains. VMs from multiple data centers may need to be in the same subnet and broadcast domain (Layer2) etc. Networking is really complex issue in IaaS. Even more so that it is hard to understand conceptually. One really needs to think in terms of logical networks and forget about the physical network (at least at a high level). To enable all these things and reach large scale we need to be able to control the network devices from the application layer. This is where a significant shift is happening. The application developers are now going to describe the network they need and provision it on-demand. Yet again SDN is Cloud. On-demand and Elasticity in the network thanks to SDN.

...
Hits: 6091
Rate this blog entry:
Continue reading Comments

Portability: No snowflakes for you!

Posted by on in Cloud Strategy

An author, whose opinion I respect, wrote recently about how cloud portability 'is a bit of science fiction' at this point. While he is technically right - the idea of moving running machines from a CloudStack cloud to a vCD cloud, CloudStack, or AWS is still at best 'a bit of black magic'[1], and in other cases simply isn't feasible at all. And doing it at scale??? that's just crazy talk. That said, I'd argue (yes, this is largely just my opinion) that in most cases if you are trying for portability at the Infrastructure-as-a-Service layer, you are doing it wrong.

You see, this 'problem' of portability isn't a new one, and really isn't even cloud related, I think it's far more fundamental than that. I remember, in my days as a Pimply-Faced-Youth, of keeping exact copies of machines as cold spares - the thought being that we might need to migrate the services living on one piece of hardware, and particularly for a certain non-Unix operating system, restoring a copy of a disk, or even the disks themselves to different hardware often meant things just wouldn't work. (sound familiar?) Sometimes you got lucky and it would, and sometimes you could coerce things into a working state by installing new drivers, or updating initrd, or even adding kernel modules. So when buying hardware, we'd buy two of everything (or at least N+1) for those services that were truly business critical. Yes it was incredibly expensive, and wasteful, but it mitigated risk.

A bit later in my ops lifetime, virtualization became mainstream. Finally - we had this psuedo-hardware that would look the same regardless of the underlying physical hardware - hypervisors like VMware and Xen did wonderful things for us. Admitedly we couldn't easily migrate between hypervisor types, but life was still much better off. Not only were we able to stop buying so many 'spares', but we were a bit less concerned about machine lifecycle and had much better utilization to boot.

Folks came a bit further along with tools to convert these virt machines from VMware to KVM or Xen. As a matter of fact, on a previous blog I used to maintain, one of my most-viewed articles was one that detailed the use of qemu-img and other tools to convert VMDK files to raw disk images. I did more of this than I care to admit - and honestly when I did it, I was striving for portability, even if in a kludgy, barely automted way. Portability was the wrong thing to seek after - and I should have already known it at that point in my career.

Portability is the equivalent of trying to preserve a snowflake. It's difficult, usually messy, and often ineffective. What I didn't realize is that I really shouldn't have any snowflakes - that what I was really after was a way to consistently reproduce, not a system to save the only copy the world would ever see. Of course configuration management already existed, why I didn't understand the problem and the solution that was already out there I don't know. Perhaps I was a latent server-hugger, or feared obsolence by a few lines of ruby - but configuration management (coupled with automated provisioning) meant I didn't need portability - didn't even want portability.

...
Hits: 5513
Rate this blog entry:
Continue reading Comments
About BuildaCloud.org Resources Site Info

Build a Cloud.org is a resource for those users who want to build cloud computing software with both open source and proprietary software.