Vendor and Cloud lock-in; Good? Bad? Indifferent?

Vendor lock-in, also known as proprietary lock-in or customer lock-in, is when a customer becomes dependent on a vendor for products and services. Thus, the customer is unable to use another vendor without substantial switching costs.

The evolving complexity of data center architectures makes migrating from one product to another difficult and painful regardless of the level of “lock-in.” As with applications, the more tightly an infrastructure solution is integrated into the architecture and business processes, the less likely it is to be replaced.

The expression “If it ain’t broke, don’t fix it” is commonplace in IT.

I have always touted the anti-vendor lock-in motto. Everything should be Open Source, and the End User should have the ability to participate, contribute, consume and modify solutions to fit their specific needs. However, is this always the right solution?

Some companies are more limited when it comes to resources. Others are incredibly large and complex, making the adoption of Open Source (without support) complicated. Perhaps a customer requires a stable and validated platform to satisfy legal or compliance requirements. If the Vendor they select has a roadmap that matches the company’s, there might be synergy between the two, and thus Vendor lock-in might be avoided. However, what happens when a Company or Vendor suddenly changes their roadmap?

Most organizations cannot move rapidly between architectures, and platform investments (CAPEX) typically only occur every 3-5 years. If the roadmap deviates, there could be problems.

For instance, again let’s assume the customer needs a stable and validated platform to satisfy legal, government or compliance requirements. Would Open Source be a good fit for them, or are they better off using a Closed Source solution? Do they have the necessary staff to support a truly Open Source solution internally without relying on a Vendor? Would it make sense for them to do this when CAPEX vs OPEX is compared?

The recent trend is for Vendors to develop Open Source solutions, using this as a means to market their Company as “Open,” which has become a buzzword. Terms like Distributed, Cloud, Scale Out, and Pets vs Cattle have also become commonplace in the IT industry.

If a Company or individual makes something Open Source but there is no community adoption or involvement, is it really an Open Source project? In my opinion, simply posting source code to GitHub doesn’t truthfully translate into a community project. There must be adoption and contribution to add features, fix bugs, and evolve the solution.

In my experience, the Open Source model works for some and not for others. It all depends on what you are building, who the End User is, regulatory compliance requirements and setting expectations in what you are hoping to achieve. Without setting expectations, milestones and goals it is difficult to guarantee success.

Then comes the other major discussion surrounding Public Cloud and how some also consider it to be the next evolution of Vendor lock-in.

For example, if I deploy my infrastructure in Amazon and then choose to move to Google, Microsoft or Rackspace, is the incompatibility between different Public Clouds then considered lock-in? What about Hybrid Cloud? Where does that fit into this mix?

While there have been some standards put in place, such as the OVF format, the fact is that getting locked into a Public Cloud provider can be just as bad or even worse than being locked into an on-premise or Hybrid Cloud architecture; it all depends on how the implementation is designed. Moving forward, as Public Cloud grows in adoption, I think we will see more companies distribute their applications across multiple Public Cloud endpoints and use common software to manage the various environments, providing a “single pane of glass” view into their infrastructure. Solutions like CloudForms are trying to help solve these current and frustrating limitations.

Recently, I spoke with someone who mentioned their Company selected OpenStack to prevent Vendor lock-in as it’s truly an Open Source solution. While this is somewhat true, the reality is moving from one OpenStack distribution to another is far from simple. While the API level components and architecture are mostly the same across different distributions the underlying infrastructure can be substantially different. Is that not a type of Vendor lock-in? I believe the term could qualify as “Open Source solution lock-in.”

The next time someone mentions lock-in, ask them what they truly mean and what they are honestly afraid of. Is it that they want to participate in the evolution of a solution or product or that they are terrified to admit they have been locked-in to a single Vendor for the foreseeable future?

The future is definitely headed towards Open Source solutions, and I think companies such as Red Hat and others will guide the way, providing support and validating these Open Source solutions, helping to make them effortless to implement, maintain, and scale.

All one needs to do is look at the largest Software Company in the world, Microsoft, and see how they are aggressively adopting Open Source and Linux. This is a far cry from Microsoft v1.0, which invested solely in its own Operating System and neglected others such as Linux and Unix.

So, what do you think? Is Vendor lock-in, whether software related, hardware related, Private or Public Cloud, truly a bad thing for companies and End Users, or is it a case-by-case decision?

Cloud Wars – Starring Amazon, Microsoft, Google, Rackspace, Red Hat and OpenStack: The fate of the OS!?

Below is my opinion. Feel free to agree or disagree in the comments or shares but please be respectful to others.

There have been some discussions regarding the Cloud Wars and the current state of the Cloud. I recently participated in one such discussion regarding Microsoft, Red Hat, and Linux distribution adoption.

Since Microsoft announced the release of their software on Linux platforms and adopted Linux distributions and Linux-based software, many people are wondering what this brave new world will look like as we move into the future.

First, we should discuss the elephant in the room. Apple has grown considerably in the desktop market while other companies’ shares have shrunk. We cannot discount the fact that iOS/OS X are in fact Operating Systems. There are also other desktop/server Operating Systems such as Windows, Chrome OS, Fedora, CentOS, Ubuntu and other Linux distributions. My apologies for not calling out others, as there are far too many to mention. Please feel welcome to mention any overlooked in the comments that you feel I should have included.

The recent partnership between Microsoft and Red Hat has been mutually beneficial, and we are seeing more companies that historically ignored Linux now forming alliances with distributions as Linux has been widely adopted in the Enterprise. The “battlefield” is now more complex than ever.

Vendors must contend with customers moving to the Public Cloud and adopting “Cloud Centric” application design as they move to a Software as a Service model. In the Cloud, some Operating Systems will erode while others will flourish.

Let’s not forget there is a cost for using specific Operating Systems in the Cloud and other options can be less costly. There are ways to offset this by offering users the ability to bring their own licensing or selecting the de facto Operating System of choice for a Public Cloud. These can be viable options for some and deal breakers for others.

Public Clouds like Azure and Google are still young, but they will both mature quickly. Many feel Google may mature faster than others and become a formidable opponent to the current Public Cloud leader, Amazon.

Some have forgotten that Google was previously in a “Cloud War” of their own when they were competing with Yahoo, Microsoft, Ask, Cuil, Snap, Live Search, Excite and many others. The most recent statistics show Google holding at 67.7% of the search market, which is a considerable lead over everyone else. Google after all was born in the Cloud, lives in the Cloud and understands it better than anyone else. Many things they touch turn to gold, like Chrome, Gmail, Android and other web based applications.

https://www.netmarketshare.com/search-engine-market-share.aspx?qprid=4&qpcustomd=0

In the Private Cloud, Microsoft, VMware, Red Hat, Canonical, and Oracle are in contention with one another. Some are forming strategic alliances and partnerships for the greater good and pushing the evolution of software. Others are ignoring that evolution, preferring to move forward with business as usual.

When market shares erode, companies sometimes rush and make poor, miscalculated decisions. One only needs to look at the fate of Blackberry to see how a company can fall rapidly from the top to the bottom of the market. Last I checked, Blackberry didn’t even own 1% of the Mobile market.

As we move into the future of Cloud, whether Public or Private, we will see more strategic partnerships form and barriers collapse. With so many emerging technologies on the horizon and the Operating System becoming more of a platform for containerized applications, it is also becoming less relevant than before.

I have heard individuals predict that in the future we will write code directly to the Cloud, and I agree that this will eventually happen. Everything will be abstracted away from the developer or programmer and there will be “one ring to rule them all,” but the question to be answered is: what ring will that be and who will be wearing it?

It’s doubtful we will ever only have a single Cloud, platform or programming language but I think we will see the birth of code and platform translators. I look at computers, technology and programming the same as spoken language. Some people learn a native language only, others learn a native tongue and branch out to other languages and may even one day become a philologist for example.

I am anxious to see how things evolve and am looking forward to seeing the development of the Cloud and internet applications. I hope I am able to witness things such as self-driving cars, self-piloting airplanes, and real-time data analysis.

Perhaps instead of Cloud we should use the term Galaxy.

Performing under pressure – OpenStack – Good, Bad and the Ugly

Having to perform under pressure.

I am sure we have all been there in one way or another, and some of us handle it better than others. It doesn’t matter if the pressure is work-related, personal or a mix of both; it’s still difficult and sometimes insurmountable to perform while under a great deal of pressure.

Recently I was asked to field a copious amount of questions regarding OpenStack: how we built our infrastructure, what worked, what didn’t, how we implemented Ceph, how we dealt with security, and how we overcame the trials and tribulations of a data center transformation project. Not to mention that while all of this was commencing, we were still required to provide support, capacity planning, security, compliance, and automation, and to continue standing up new infrastructure fault domains and provisioning workloads.

Now, I am sure some people look at this and say “Meh, I can do that. No big deal,” but for me, my team, and those we interface with, this was not such an easy task. You have a “changing of the guard” when it comes to deploying new technology. The internet of things, as people are calling it, and the software-defined data center, both terms which I absolutely dislike, are not a promise of the future; they are here, now, today, all around you.

There are some that respond well to change; they accept it, adopt it and love it. Those are the ones that excel. They are the first people to run into a burning building, defuse the situation logically and then take action. There are others who call the fire department to come and put out the blaze and wait to see what happens. Lastly, there are those that sit around filming the building burn down without care and with disregard to those inside risking their lives by choice. I think most of us forget that firefighters choose to put their lives on the line every single day. I am by no means comparing information technology to firefighting, so before you flame me, I am just using it as an analogy.

I have always been a person who runs first into the fire. Sometimes this is a good thing and sometimes it’s a bad thing. Eventually, one of the times you run into a burning building you are bound to get hurt, trapped and/or needing assistance from the others around you to overcome overwhelming odds.

So, back to pressure, OpenStack and having to describe the good, bad and ugly pieces of it.

We run a fairly unique implementation of OpenStack compared to many others and do not segment storage from compute (we use cgroups to limit, account for, and isolate resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes for Nova and Ceph). Cisco UCS converged infrastructure was implemented as opposed to white box hardware, and our Ceph implementation uses a combination of SSD and HDD for performance optimization (that’s actually pretty typical).
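
To make the colocation concrete, here is a minimal sketch of applying CPU and memory limits through the cgroup v1 filesystem. The group names, paths, and limit values are illustrative assumptions, not our production settings; in practice tooling such as libvirt manages its own cgroup hierarchy.

import os

# Hypothetical cgroup names and limits -- illustrative values only.
CGROUPS = {
    "ceph-osd": {"cpu.shares": "512", "memory.limit_in_bytes": str(8 * 1024**3)},
    "nova-compute": {"cpu.shares": "2048", "memory.limit_in_bytes": str(48 * 1024**3)},
}

def create_cgroup(controller, name, settings):
    """Create a cgroup under a v1 controller and apply the matching settings."""
    path = os.path.join("/sys/fs/cgroup", controller, name)
    os.makedirs(path, exist_ok=True)
    for key, value in settings.items():
        if key.startswith(controller):  # only write keys owned by this controller
            with open(os.path.join(path, key), "w") as f:
                f.write(value)

def add_pid(controller, name, pid):
    """Attach a running process (an OSD or nova-compute, say) to the cgroup."""
    with open(os.path.join("/sys/fs/cgroup", controller, name, "tasks"), "w") as f:
        f.write(str(pid))

for group, settings in CGROUPS.items():
    create_cgroup("cpu", group, settings)
    create_cgroup("memory", group, settings)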

SolidFire is used on the backend for high-performance, latency-sensitive applications since it offers in-line deduplication, compression and out-of-the-box replication. The diagram below is a good representation of our current implementation.

Data center transformations are an incredibly sensitive topic, much more than most people would think, especially when you are moving to a new architecture that is still in its teenage years and growing faster than a weed. Getting people to comprehend OpenStack vs VMware, Xen or traditional KVM is already a tough task in some companies, and once you start telling everyone that the DevOps culture is the wave of the future and “GUIs are for sissies,” some people get pissed off.

Once again, here is where the line gets drawn in the sand. Some people LOVE GUIs and refuse to use the command line unless it’s absolutely necessary. I can tell you from my past experience doing consulting that I have seen more network engineers use Cisco Fabric Manager than I have seen use the command line. Have you ever used Fabric Manager?! It’s a nightmare!

You begin to introduce your VMware-centric people to the simple Horizon dashboard and suddenly people begin to sweat and start asking questions such as: what storage is being used? How do I know where the VM (instance! it’s called an instance dang it!) lives? How would I troubleshoot a problem? Is there going to be a better GUI for me to use? What about something like the vCenter client, does OpenStack have that? Does OpenStack have HA? Does OpenStack have DRS (load balancing between hypervisors)? That’s when the situation gets ugly…

Individuals start to say, “Who is going to support this? I’m not supporting this! There is nooo wayyyyyy I can support this. Are we sure this is the right move? I mean, VMware works well, right? Why would we change?” This is when people either perform under pressure or fall apart. I mean, imagine a room packed to the gills with IT staff, all with this look on their face like you just called their Ferrari ugly and now you are going to trade it in for what they view as a Pinto. My apologies to those that drive a Pinto. It’s a great car, no, really, it is…

That’s when you start talking about continuous integration, Containers, Jenkins, Git, repos, version control, security, deploying bare metal as a service, deploying Hadoop as a service, deploying Containers as a service… pretty much deploying ANYTHING as a service, up to and including disaster recovery.

Shouldn’t it be as simple as a product group logging into a portal or catalog like ServiceNow, checking off a list of boxes, magically getting an estimated operational price and then getting an email a few hours later saying their project is ready to go? The fact is, yes! Amazon has been doing it. Google has been doing it. Microsoft has been doing it. Actually, if you think about it, all major hosting providers are doing just that. They write intelligent code and automation to provision software as a service.
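
For a sense of what that looks like against OpenStack itself, here is a minimal python-novaclient sketch of self-service provisioning. The credentials, endpoint, image, and flavor names are placeholders I made up for the example.

from novaclient import client

# Placeholder credentials and Keystone endpoint -- substitute your own.
nova = client.Client("2", "demo_user", "demo_password", "demo_project",
                     auth_url="http://controller:5000/v2.0")

# Pick a base image and a flavor (size) from the catalog.
image = nova.images.find(name="rhel7-base")
flavor = nova.flavors.find(name="m1.medium")

# One API call later the tenant has an instance building.
server = nova.servers.create(name="app-server-01", image=image, flavor=flavor)
print("Requested instance %s, status: %s" % (server.id, server.status))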

I almost always lean towards software as a service over any other term. If you are providing a virtual instance, it’s running containers, it uses Ceph for the back end and Gluster for the shared file system, aren’t all of those software defined? What about deploying Hadoop on top of OpenStack via Sahara? Is that also not software defined?!

Companies of all shapes and sizes want the features of Amazon without the price tag and with the ability to manage resources within their own private data center. It’s about knowing where your data is, how data is being backed up and saved, clear and concise monitoring of workloads, monitoring data center specifications and statistics, validating compliance of customer data and securing it… it’s all about CONTROL.

This is what OpenStack provides. A set of tools for companies of all sizes to deploy a series of services and a standard set of APIs that allow developers, DevOps, and administrators to provision their own elastic scalable infrastructure in the same manner that Amazon does and in some ways BETTER than Amazon.

If Amazon were the latest and greatest thing since sliced bread, everyone would be on the bandwagon; Facebook, Google, Yahoo, Microsoft and all the others would just say “Screw this. Let’s just deploy into Amazon, fire 70% of our staff and we are good to go!” The fact is, Amazon has its own set of difficulties. Ever tried running containers in ECS? What about wanting to deploy in a region that’s not supported? Maybe governance requires data to not leave a country’s borders and Amazon doesn’t have anything in that area? What about risk and compliance?

The fact is, we have been through all of this before. Remember mainframes? I do. In fact, I would say many companies still use them, and they are a huge part of their technology stack. When x86 came strolling along and promised the ability to replace mainframes with cheaper and smaller hardware that required less overall investment, companies became hooked! Fast forward to 2015 and we are seeing a similar change, some at the physical layer but more at the logical layer.

Back to pressure. I almost forgot that was part of this post since I have been ranting.

With so much new technology being deployed at such an aggressive rate it’s really hard to be an SME at all things. I can’t say I know everything about Ceph, there are a ton of moving parts. I can’t say I am an expert when it comes to OpenStack because there are multiple distributions and multiple projects within OpenStack itself. I cannot say I know every single specific detail on how software-defined networking works and what is best to implement as it depends on the infrastructure, use case, and hardware. At some point, you have to sit back and trust members of your team to be the subject matter experts.

These people are the ones you trust the most. They take the pressure off when you need to make difficult informed decisions. They are the ones that suggest solutions and how to implement such with the lowest amount of risk and the greatest return on investment. They are the experts and they should have your full trust and the best interests of the company in mind.

I admit I am a very technical individual, especially at my current level, but I am by no means an expert at all things related to software as a service. If I were, I wouldn’t need a team and I could do it all myself. Need to program some stuff in Ruby? I got it. Need me to write some stuff in Python? No worries! Need me to write some bash stuff? Simple as pie. Need me to develop Puppet modules to deploy your code with intelligent rollback capability? A piece of cake. I hope you understand I am being somewhat sarcastic, but if you can excel at all of those things then you are amazing! Want a job? No, seriously… do you?

A healthy job is likely to be one where the pressures on management and employees are appropriate in relation to their abilities and resources, to the amount of control they have over their work and to the support they receive. I do not believe health is the absence of disease or infirmity but a positive state of complete physical, mental and social well-being. In a healthy working environment, there is not only an absence of harmful conditions but an abundance of health-promoting ones.

Work-related stress is usually caused by poor organization (the way jobs and work systems are designed and how we manage them). For example, lack of control over work processes, poor management, and lack of support from colleagues and supervisors can all be contributors.

I for one am a workaholic. Yes, I admit it. I am in a 12 step program to try and get back to a normal life and detach myself from The Borg collective. It’s tough though with so many emerging and new technologies coming that I am excited about, but I am trying my best for my health, both physically and mentally. The technology isn’t going to disappear overnight so it’s better to learn to pace ourselves instead of trying to run a marathon as a sprint.

Find healthy ways to relieve the pressure. Have an open door policy with your staff, team, manager and other employees. I think one thing we overlook in our current era is talking. We instead hide behind chat, email, and other electronic forms of communication and are slowly forgetting to be human. I blame Google for all of this. Joking! As we enter the age of software as a service, let’s try and be more human. After all, I am sure some/most of us have seen Blade Runner so we know how the story goes.

The day the systems administrator was eliminated from the Earth… fact or fiction?

As software becomes more complex and demands scalability of the cloud, IT’s mechanics of today, the systems administrator, will disappear. Tomorrow’s systems administrator will be entirely unlike anything we have today.

For as long as there have been computer systems, there has always been a group of individuals managing and monitoring them: system administrators. These individuals were the glue of data centers, responsible for provisioning and managing systems, from the monolithic platforms of the old ages to today’s mixed-bag approach of hardware, storage, operating systems, middleware, and software.

The typical System Administrator usually possessed superhuman diagnostic skills and repair capabilities to keep a complex mix of disparate systems humming along happily. The best system administrators have always been the “Full Stack” individuals who were armed with all the skills needed to keep systems up and running, but these individuals were few and far between.

Data centers have become more complex over the past decade as systems have been broken down, deconstructed into functional components and segregated into groupings. Storage has been migrated to centralized blocks such as SAN and NAS, inevitably forcing personnel to become specialized in specific tasks and skills.

Over the years, this same trend has happened with Systems Infrastructure Engineers/Administrators, Network Engineers/Administrators and Application Engineers/Administrators.

Everywhere you look, intelligence is being built directly into products. I was browsing the aisles at Lowe’s this past weekend and noted that clothes washers, dryers, and refrigerators are now being shipped equipped with WiFi and NFC to assist with troubleshooting problems, collecting error logs and opening problem service tickets. No longer do we need to pore over those thousand-page manuals looking for error code EC2F to tell us that the water filter has failed; the software can do it for us! Thus it has become immediately apparent that if tech such as this has made its way into low-level basic consumer items, things must be changing even more rapidly at the top.

I obviously work in the tech industry and would like to think of myself as a technologist and someone who is very intrigued by emerging technologies. Electric cars, drones, remotely operated vehicles, smartphones, laptops that can last 12+ hours daily while fitting in your jeans pocket and the amazing ability to order items from around the globe and have them shipped to your door. These things astound me.

The modern car was invented in 1886 and in 1903, we invented the airplane. The first commercial air flight was not until 1914 but to see how far we have come in such a short time is astounding. It almost makes you think we were asleep for the last Century prior.

As technology has evolved, there has been a need for software to also evolve at a similarly rapid pace. In many ways, hardware engineering has outpaced software over the last score of years, and now software is slowly catching up to and surpassing hardware engineering.

Calm down, I know I am rambling again. I will digress and get to the point.

The fact is, the Systems Administrator as we know it is a dying breed, like the dinosaur, the caveman and the woolly mammoth. All of these were great at some things but never enough to stay alive, and thus they were wiped out.

So what happens next? Do we all lose our jobs? Does the stock market go into free fall while we all start drinking Brawndo, the Thirst Mutilator (if you haven’t seen Idiocracy, I feel for you)? The fact is, it’s going to be a long, slow and painful death.

Companies are going to embrace cloud at a rapid rate and as this happens people will either adapt or cling to their current ways. Not every company is going to be “cloudy”.

Stop. Let me state something. I absolutely HATE the word Cloud. It sounds so stupid. Cloud. Cloud. Cloud. Just say it. How about we all instead embrace the term shared-nothing scalable distributed computing? That sounds better.

So, is this the end of the world? No, but it does mean “The Times They Are a Changin” to quote Mr. Dylan.

The fact is, change is inevitable. If things didn’t change we would still be living in huts, hunting with our bare hands and using horses as our primary method of transportation. We wouldn’t have indoor toilets, governments, rules, regulations, or protection from others, as there would be no law system.

Sometimes change is good and sometimes it’s bad. In this case, I see many good things coming down the road, but I think we all need to see the signs posted along the highway.

Burying one’s head in the dirt like an ostrich is not going to protect you.

How to build a large scale multi-tenant cloud solution

It’s not terribly difficult to design and build a turnkey, integrated, pre-configured, ready-to-use SDDC solution. However, building one that completely abstracts the physical compute, storage and network resources, provides multiple tenants a pool of logical resources along with all the necessary management, operational and application-level services, and allows resources to scale with the seamless addition of new rack units is another matter entirely.

The architecture should be a vendor-agnostic solution with limited software tie-in to hardware vendor specifics, but expandable to support various vendors’ hardware through a plug-and-play architecture.

Decisions should be made early on whether the solution will come in various form factors, from appliances to quarter, half and full racks, providing different levels of capacity, performance, redundancy/HA and SLAs, and whether the architecture can grow from the ground up to mega rack-scale with distributed infrastructure resources without impacting the customer experience and usage.

The design should contain more than one physical rack, with each rack unit composed of:

· Compute servers with direct-attached storage (software defined)
· Top-of-rack and management switches (hardware)
· Data plane, control plane and management plane software
· Platform-level operations, management and monitoring software
· Application-centric workload services

Most companies have a solution based on a number of existing technologies, architectures, products, and processes that have been part of the legacy application hosting and IT operations environments. These environments can usually be repurposed for some of the scalable cloud components, which saves time and cost; the result is a stable environment that operations can still manage and operate with existing processes and solutions.

In order to evolve the platform to provide not only stability and supportability but also additional features such as elasticity and improved time to market, companies should immediately initiate a project to investigate and redesign the underlying platform.

In scope for this effort are assessments of the network physical and logical architecture, the server physical and logical architecture, the storage physical and logical architecture, the server virtualization technology, and the platform-as-a-service technology.

The approach to this effort will include building a mini proof of concept based on a hypothesized preferred architecture and benchmarking it against alternative designs. This proof of concept then should be able to scale to a production sized system.

The goal is to implement a scalable, elastic IaaS/PaaS leveraging self-service automation and orchestration that enables end users to self-provision applications within the cloud itself.

Suggested phases of the project would be as follows:

Phase Description:

  • Phase I Implementation of POC platforms
  • Phase II Implementation of logical resources
  • Phase III Validation of physical and logical resources
  • Phase IV Implementation of platform as a service components
  • Phase V Validation of platform as a service components
  • Phase VI Platform as a service testing begins
  • Phase VII Review, document and complete knowledge transfer
  • Phase VIII Present findings to executive management

Typically there are four fundamental components to cloud design: infrastructure, platform, applications, and business process.

The infrastructure and platform as a service components are typically the ideal starting place to drive new revenue opportunities, whether by reselling or enabling greater agility within the business.

With industries embracing cloud design at a record pace and technology corporations focusing on automation, the move towards a cloud data infrastructure design becomes both practical and beneficial.

Cloud data infrastructure provides services, servers, storage, and networking on demand, at any time, with minimal limits, helping to create new opportunities and drive new revenue.

The “Elastic” pay-as-you-go data center infrastructure should provide a managed services platform allowing application owner groups the ability to operate individually while sharing a stable common platform.

Having a common platform and infrastructure model will allow applications to mature while minimizing code changes and revisions due to hardware, drivers, software dependencies and infrastructure lifecycle changes.

This will provide a stable scalable solution that can be deployed at any location regardless of geography.

Today’s data centers are migrating away from the client-server distributed model of the past towards the more virtualized model of the future.

· Storage: As business applications grow in complexity, the need for larger, more reliable storage becomes a data center imperative.
· Disaster Recovery / Business Continuity: Data centers must maintain business processes for the overall business to remain competitive.
· Cooling: Dense server racks make it very difficult to keep data centers cool and keep costs down.
· Cabling: Many of today’s data centers have evolved into a complex mass of interconnected cables that further increase rack density and further reduce data center ventilation.

These virtualization strategies introduce their own unique set of problems, such as security vulnerabilities, limited management capabilities, and many of the same proprietary limitations encountered with the previous generation of data center components.

When taken together, these limitations serve as barriers against the promise of application agility that the virtualized data center was intended to provide.

The fundamental building block of an elastic infrastructure is the workload. Workloads should be thought of as the amount of work that a single server or ‘application gear/container/instance’ can provide given the amount of resources allocated to it.

Those resources encompass compute (CPU & RAM), data (disk latency & throughput), and networking (latency & throughput). A workload is an application, part of an application, or a group of applications that work together. There are two general types of workload that most customers need to address: those running within a Platform-as-a-Service construct and those running on a hypervisor construct. Bare metal should also be considered where applicable, but only in rare circumstances.
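
A workload’s resource envelope can be modeled very simply. The sketch below, with made-up numbers, just illustrates the idea of checking whether a workload fits the remaining capacity of a host.

from collections import namedtuple

# A workload's resource envelope across the compute, data, and network dimensions.
Workload = namedtuple("Workload", ["vcpus", "ram_gb", "disk_iops", "net_mbps"])
HostCapacity = namedtuple("HostCapacity", ["vcpus", "ram_gb", "disk_iops", "net_mbps"])

def fits(workload, free):
    """Return True if the workload fits within the host's remaining capacity."""
    return all(getattr(workload, dim) <= getattr(free, dim) for dim in workload._fields)

# Illustrative numbers only.
web_tier = Workload(vcpus=4, ram_gb=8, disk_iops=500, net_mbps=200)
remaining = HostCapacity(vcpus=10, ram_gb=64, disk_iops=20000, net_mbps=8000)
print(fits(web_tier, remaining))  # True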

Much like database sharding, the design should be bounded by fundamental sizing limits that allow a subset of resources to be configured at a maximum size, hosting multiple copies of virtual machines and application groups distributed and load balanced across a cluster of hypervisors that share a common persistent storage back end.

This is similar to load balancing but not exactly the same, as a customer or specific application will only be placed in particular ‘Cradles’. A distribution system will be developed to determine where tenants are placed upon login and to direct them to the Cradle they were assigned.

In order to aggregate as many workloads as possible in each availability zone or region, a specific reference architecture design should be created to determine the ratio of virtual servers per physical server.

The size will be driven by a variety of factors including oversubscription models, technology partners, and network limitations. The initial offering will result in a prototype and help determine scalability and capacity, and this design should scale in a linear, predictable fashion.
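
As a back-of-the-envelope illustration of how oversubscription drives that ratio, the sketch below uses assumed host specifications and oversubscription factors (4:1 vCPU and 1.5:1 RAM are common starting points, not recommendations) to estimate virtual servers per physical server.

def vms_per_host(host_cores, host_ram_gb, vm_vcpus, vm_ram_gb,
                 cpu_oversub=4.0, ram_oversub=1.5):
    """Estimate VM density per host under simple oversubscription factors."""
    by_cpu = (host_cores * cpu_oversub) // vm_vcpus
    by_ram = (host_ram_gb * ram_oversub) // vm_ram_gb
    # The tighter of the two dimensions is the real limit.
    return int(min(by_cpu, by_ram))

# Assumed hardware: dual 10-core blade with 256 GB RAM, 2 vCPU / 8 GB flavor.
print(vms_per_host(host_cores=20, host_ram_gb=256, vm_vcpus=2, vm_ram_gb=8))  # 40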

The cloud control system and its associated implementations will be composed of Regions and Availability Zones, similar in many ways to what Amazon AWS does today.

The availability zone model isolates one fault domain from another. Each availability zone has isolation and redundancy in management, hardware, network, power, and facilities. If power is lost in a given availability zone, tenants in another availability zone are not impacted. Each availability zone resides in a single datacenter facility and is relatively independent. Availability zones are then aggregated into regions, and regions into the global resource pool.

The basic components would be as follows:

· Hypervisor and container management control plane
· Cloud orchestration
· Cloud blueprints/templates
· Automation
· Operating system and application provisioning
· Continuous application delivery
· Utilization monitoring, capacity planning, and reporting

Hardware considerations should be as follows:

· Compute scalability
· Compute performance
· Storage scalability
· Storage performance
· Network scalability
· Network performance
· Network architecture limitations
· Oversubscription rates & capacity planning
· Solid-state flash leveraged to increase performance and decrease deployment times

Business concerns would be:

· Cost-basis requirements
· Margins
· Calculating cost VS profits to show ROI (chargeback/show back)
· Licensing costs

The extensibility of the solution dictates the ability to use third-party tools for authentication, monitoring, and legacy applications. The best cloud control system should allow legacy systems and software to be integrated with relative ease. It’s my own personal preference to lead with Open Source software, but that decision is left to the user.

Monitoring,  capacity planning, and resource optimization should consider the following:

· Reactive – Break-Fix monitoring where systems and nodes are monitored for availability and service is manually restored
· Proactive – Collect metrics data to maintain availability, performance, and meet SLA requirements
· Forecasting – Use proactive metric data to perform capacity planning and optimize capital usage (a simple example follows below)
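
As a hedged sketch of the forecasting piece: given utilization samples collected by the proactive monitoring, a simple linear fit projects when a resource pool will cross a capacity threshold. Real tooling would pull from the actual metrics store; the samples below are invented.

def weeks_until_full(samples, threshold=0.85):
    """Fit a line through weekly utilization samples (0.0-1.0) and estimate
    how many weeks remain before the threshold is crossed."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / float(n)
    mean_y = sum(samples) / float(n)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) / \
            sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None  # utilization is flat or shrinking; nothing to forecast
    return (threshold - samples[-1]) / slope

# Invented weekly utilization for one compute availability zone.
history = [0.52, 0.55, 0.57, 0.61, 0.64, 0.66]
print(weeks_until_full(history))  # roughly 6-7 weeks at the current growth rate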

Because cloud computing is a fundamental paradigm shift in how Information Technology services are delivered, it will cause significant disruption inside most organizations. Helping each of these organizations embrace the change will be key.

While the final impacts are currently impossible to measure, it’s clear that a self-service model is the future and is integral to delivering customer satisfaction, from both an internal and an external user perspective.

Some proof of concept initiatives would be as follows:

· Determine a go-forward architecture for the IaaS and PaaS offering inclusive of a software defined network
· Benchmark competing architecture options against one another from a price, performance, and manageability perspective
· Establish a “mini-cradle” that can be maintained and used for future infrastructure design initiatives and tests
· Determine how application deployment can be fully or partially automated
· Determine a cloud control system to facilitate provisioning of Operating Systems and multi-tiered applications
· Complete the delivery of FAC to generate metrics and provide statistics
· Show the value of self-service to internal organizations
· Measure the ROI based on cost of the cloud service delivery combined with the business value
· Don’t build complexity into the initial offering
· Avoid spending large amounts of capital expenses on the initial design

After implementing a proof of concept, testing encompassing the following (and more) should be done:

Proof of Functionality

  • The solution system runs in our datacenter; on our hardware
  • The solution system can be implemented with multi-network configuration
  • The solution system can be implemented with as few manual steps as possible (automated installation)
  • The solution systems have the ability to drive implementation via API
  • The solution system provides a single point of management for all components
  • The solution system enables dynamic application mobility by decoupling the definition of an application from the underlying hardware and software
  • The solution system can support FAC production operating systems
  • The solution system Hypervisor and guest OS are installed and fully functional
  • The solution systems support internal and external authentication against existing authentication infrastructure.
  • The solution system functions as designed and tested

Proof of Resiliency

  • The solution system components are designed for high availability
  • The solution system provides multi-zone (inter-DC, inter-region, etc.) management
  • The solution system provides multi Data Center management

Integration Testing

  • The solution system is compatible with legacy, current, and future systems integration

Complexity Testing

  • The solution system has the ability to manage both simple and complex configurations

Metric Creation

  • The solution systems have metrics that can be monitored

How to configure MDS Port-Channels for Cisco UCS

First, enable the F-port channel trunking feature:

conf t
feature fport-channel-trunk

Next, configure the SAN port channel before adding ports to it:

interface san-port-channel 100
channel mode active
switchport trunk mode off

We set the channel mode to active; a SAN port channel only supports on or active. Active negotiates an FC port channel, while on forces it on. We then set the trunking mode to off; this might be different for you if you’re using NPIV and trunking multiple VSANs.

Next, configure the actual ports to be members of the port channel:

interface fc1/29
switchport trunk mode off
channel-group 100 force

interface fc1/30
switchport trunk mode off
channel-group 100 force

Once you have done this, if your VSAN is not VSAN 1, you need to bind the interface to it:

vsan database
vsan XYZ interface san-port-channel 100

You can now bring the interfaces up with no shut:

interface fc1/29
no shut
interface fc1/30
no shut

interface san-port-channel 100
no shut

Once this is done, you need to configure the UCS side to support this connectivity.

Why and what is Ceph?

This series of posts is not only focused on Ceph itself, but above all on what you can do with it.

Why Ceph?

Because it’s free and open source, it can be used in every lab, even at home. You only need 3 servers to start; they can be 3 spare servers you have around, 3 computers, or even 3 virtual machines all running on your laptop. Ceph is a great platform for improving knowledge about Object Storage and Scale-Out systems.

Ceph is one of the few large-scale storage solutions based on open source software, so it’s easy to study it even in your home lab.

Ceph is an open source software solution. It requires administrative Linux skills, and if you need commercial support there are a few options to select from. If you don’t feel at ease with a “build your own” solution, look around to buy a commercial solution.

What is Ceph storage?

A quick introduction to Ceph. Ceph is an open source distributed storage system, built on top of commodity components, delegating reliability to the software layer. A simple description would be scale-out, software-defined object storage built on commodity hardware.

Ceph was originally designed by Sage Weil during his PhD, and was afterwards managed and distributed by Inktank, a company created specifically to offer commercial services for Ceph, where Sage held the CTO role. In April 2014, Inktank was acquired by Red Hat.

Ceph scales out. It is designed to have no single point of failure, it can scale to a virtually unlimited number of nodes, and nodes are not coupled with each other (it’s a shared-nothing architecture), while traditional storage systems instead have some components shared between controllers (cache, disks…).

Ceph is built to organize data automatically using CRUSH, the algorithm responsible for the intelligent distribution of objects inside the cluster, and then uses the nodes of the cluster as the managers of that data. CRUSH can also manage fault domains, even entire data centers, and thus makes it possible to create a geo-cluster that can protect itself even from major disasters.
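
To give a feel for what algorithmic placement means, here is a toy sketch: it is deliberately not CRUSH, just a deterministic hash-based placement across an invented cluster map, to show how every client can compute an object’s location without a central lookup table.

import hashlib

# Invented cluster map: fault domains (racks) and the hosts inside them.
CLUSTER = {
    "rack-a": ["node1", "node2"],
    "rack-b": ["node3", "node4"],
    "rack-c": ["node5", "node6"],
}

def place(obj_name, replicas=3):
    """Pick one host per rack for each replica, deterministically.
    Toy illustration only -- real CRUSH uses weighted pseudo-random draws
    over a hierarchical cluster map with configurable failure domains."""
    placement = []
    for i, (rack, hosts) in enumerate(sorted(CLUSTER.items())[:replicas]):
        digest = hashlib.md5(("%s-%d" % (obj_name, i)).encode()).hexdigest()
        placement.append((rack, hosts[int(digest, 16) % len(hosts)]))
    return placement

print(place("volume-1234.chunk.0007"))
# Every client computes the same answer, so no lookup table is required.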

The other components are the nodes. Ceph is built using simple servers, each with some amount of local storage, replicating to each other via network connections. There is no shared component between servers, even if some roles like Mons (monitors) are created only on some servers and accessed by all the nodes.

Ceph does not use technologies like RAID or parity; redundancy is guaranteed using replication of the objects, that is, any object in the cluster is replicated at least twice, in two different places in the cluster. You can adjust this setting to maintain fewer or more replicas as you choose. If a node fails, the cluster identifies the objects that are left with only one copy and creates a second copy somewhere else in the cluster. The latest versions of Ceph can also use erasure coding, saving even more space at the expense of performance.

Ceph can be dynamically expanded or shrunk by adding or removing nodes from the cluster and letting the CRUSH algorithm rebalance objects. Ceph also provides object storage: data is not stored as files in a file system hierarchy, nor as blocks within sectors and tracks. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Each file entering the cluster is saved as one or more objects depending on its size, some metadata referring to the objects is created, a unique identifier is assigned, and the object is saved multiple times in the cluster. The process is reversed when data needs to be accessed.
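
The object interface itself is straightforward. Below is a minimal python-rados sketch, assuming a reachable cluster, the default /etc/ceph/ceph.conf, and a pool I am calling “demo” purely for the example.

import rados

# Connect using the cluster configuration and keyring referenced in ceph.conf.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

# "demo" is an assumed pool name -- create it beforehand or substitute your own.
ioctx = cluster.open_ioctx("demo")

# Write an object: the cluster handles placement and replication.
ioctx.write_full("greeting", b"hello ceph")

# Read it back and attach a piece of metadata (an xattr) to the same object.
print(ioctx.read("greeting"))
ioctx.set_xattr("greeting", "owner", b"keith")

ioctx.close()
cluster.shutdown()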

The advantage over file or block storage is mainly in scale: the architecture of an object store can easily grow to massive sizes; in fact, it’s used in those solutions that need to deal with incredible numbers of objects. To name a few, Dropbox and Facebook are built on top of object storage systems, since it’s the best way to manage millions of files.