It’s not terribly difficult to design and build a turnkey, integrated, pre-configured, ready-to-use SDDC solution. It is considerably harder, however, to build one that completely abstracts the physical compute, storage and network resources; provides multiple tenants with a pool of logical resources along with all the necessary management, operational and application-level services; and scales seamlessly as new rack units are added.
The architecture should be vendor agnostic, with limited software tie-in to hardware vendor specifics, but expandable through a plug-n-play design to support hardware from various vendors.
Decide early whether the solution will come in various form factors, from appliances to quarter, half and full racks, each providing different levels of capacity, performance, redundancy, HA and SLAs. The architecture should be built from the ground up so that it can later expand to mega-rack scale with distributed infrastructure resources, without impacting the customer experience or usage.
The design should contain more than one physical rack, with each rack unit composed of:
- Compute servers with direct-attached (software-defined) storage
- Top-of-rack and management switch hardware
- Data plane, control plane and management plane software
- Platform-level operations, management and monitoring software
- Application-centric workload services
Most companies have a solution based on a number of existing technologies, architectures, products, and processes that have been part of the legacy application hosting and IT operations environments. These environments can usually be repurposed for some of the scalable cloud components, which saves time and cost and results in a stable environment that operations can still manage with existing processes and solutions.
To evolve the platform so that it provides not only stability and supportability but also additional features such as elasticity and improved time to market, companies should immediately initiate a project to investigate and redesign the underlying platform.
In scope for this effort are assessments of the network physical and logical architecture, the server physical and logical architecture, the storage physical and logical architecture, the server virtualization technology, and the platform-as-a-service technology.
The approach to this effort will include building a mini proof of concept based on a hypothesized preferred architecture and benchmarking it against alternative designs. The proof of concept should then be able to scale to a production-sized system.
Implement a scalable, elastic IaaS – PaaS that leverages automation and orchestration so that end users can self-provision applications within the cloud itself.
Suggested phases of the project would be as follows:
- Phase I Implementation of POC platforms
- Phase II Implementation of logical resources
- Phase III Validation of physical and logical resources
- Phase IV Implementation of platform as a service components
- Phase V Validation of platform as a service components
- Phase VI Platform as a service testing begins
- Phase VII Review, document, and complete knowledge transfer
- Phase VIII Present findings to executive management
Typically there are four fundamental components to cloud design: infrastructure, platform, applications, and business process.
The infrastructure and platform as a service components are typically the ideal starting place to drive new revenue opportunities, whether by reselling or enabling greater agility within the business.
With industries embracing cloud design at a record pace and technology corporations focusing on automation, there is a clear benefit in moving towards a cloud data infrastructure design.
Cloud data infrastructure provides services, servers, storage, and networking on demand, at any time and with minimal limits, helping to create new opportunities and drive new revenue.
The “Elastic” pay-as-you-go data center infrastructure should provide a managed services platform that allows application owner groups to operate individually while sharing a stable common platform.
Having a common platform and infrastructure model will allow applications to mature while minimizing code changes and revisions due to hardware, drivers, software dependencies and infrastructure lifecycle changes.
This will provide a stable scalable solution that can be deployed at any location regardless of geography.
Today’s data centers are migrating away from the client-server distributed model of the past towards the more virtualized model of the future.
- Storage: As business applications grow in complexity, the need for larger, more reliable storage becomes a data center imperative.
- Disaster Recovery / Business Continuity: Data centers must maintain business processes for the overall business to remain competitive.
- Cooling: Dense server racks make it very difficult to keep data centers cool and keep costs down.
- Cabling: Many of today’s data centers have evolved into a complex mass of interconnected cables that further increase rack density and further reduce data center ventilation.
These virtualization strategies introduce their own unique set of problems, such as security vulnerabilities, limited management capabilities, and many of the same proprietary limitations encountered with the previous generation of data center components.
When taken together, these limitations serve as barriers against the promise of application agility that the virtualized data center was intended to provide.
The fundamental building block of an elastic infrastructure is the workload. Workloads should be thought of as the amount of work that a single server or ‘application gear/container/instance’ can provide given the amount of resources allocated to it.
Those resources encompass compute (CPU & RAM), data (disk latency & throughput), and networking (latency & throughput). A workload is an application, part of an application, or a group of applications that work together. There are two general types of workload that most customers need to address: those running within a Platform-as-a-Service construct and those running on a hypervisor construct. Bare metal should also be considered where applicable, but only in rare circumstances.
Much like database sharding, the design should be bounded by fundamental sizing limits: a subset of resources is configured at a maximum size, hosting multiple copies of virtual machines and application groups that are load balanced across a cluster of hypervisors sharing a common persistent storage back end.
This is similar to load balancing but not exactly the same, as a customer or specific application will only be placed in particular ‘Cradles’. A distribution system will be developed to determine where tenants are placed and, upon login, to direct them to the Cradle they were assigned.
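The placement behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it treats a Cradle as the fixed-size resource subset from the previous paragraph, and the names `Cradle`, `PlacementDirectory`, `max_tenants` and the "least-filled Cradle" rule are hypothetical choices, not part of any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Cradle:
    """A fixed-size pool of hypervisors sharing one storage back end (assumed)."""
    name: str
    max_tenants: int  # hypothetical sizing limit for this sketch
    tenants: set = field(default_factory=set)

    def has_room(self) -> bool:
        return len(self.tenants) < self.max_tenants


class PlacementDirectory:
    """Assigns each tenant to exactly one Cradle and remembers the assignment."""

    def __init__(self, cradles):
        self.cradles = list(cradles)
        self.assignments = {}  # tenant id -> Cradle name

    def cradle_for(self, tenant_id: str) -> str:
        # Returning tenants are always routed to the Cradle they already own.
        if tenant_id in self.assignments:
            return self.assignments[tenant_id]
        # New tenants go to the least-filled Cradle that still has room.
        candidates = [c for c in self.cradles if c.has_room()]
        if not candidates:
            raise RuntimeError("all Cradles are full; add a new Cradle")
        target = min(candidates, key=lambda c: len(c.tenants) / c.max_tenants)
        target.tenants.add(tenant_id)
        self.assignments[tenant_id] = target.name
        return target.name
```

Unlike a load balancer, which may send each request to a different node, the directory gives the same answer on every login for a given tenant. When every Cradle is full, the sketch raises an error rather than overfilling one, which mirrors the idea of scaling by adding another fixed-size unit.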
In order to aggregate as many workloads as possible in each availability zone or region, a specific reference architecture design should be made to determine the ratio of virtual servers per physical server.
The size will be driven by a variety of factors including oversubscription models, technology partners, and network limitations. The initial offering will result in a prototype that helps determine scalability and capacity, and this design should scale in a linear, predictable fashion.
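As a rough illustration of how such a ratio might be estimated, the sketch below derives virtual servers per physical server from CPU and RAM oversubscription. All figures, the default oversubscription values, the `reserved_fraction` held back for the hypervisor, and the function names are assumptions made for the example; real ratios would come from the benchmarking described in this document.

```python
import math


def vms_per_host(host_cores, host_ram_gb, vm_vcpus, vm_ram_gb,
                 cpu_oversubscription=4.0, ram_oversubscription=1.0,
                 reserved_fraction=0.1):
    """Estimate how many identical VMs fit on one host.

    The result is the smaller of the CPU-bound and RAM-bound counts,
    after reserving a fraction of the host for the hypervisor itself.
    """
    usable_vcpus = host_cores * (1 - reserved_fraction) * cpu_oversubscription
    usable_ram_gb = host_ram_gb * (1 - reserved_fraction) * ram_oversubscription
    by_cpu = math.floor(usable_vcpus / vm_vcpus)
    by_ram = math.floor(usable_ram_gb / vm_ram_gb)
    return min(by_cpu, by_ram)


def hosts_needed(vm_count, ratio, spare_hosts=1):
    """Hosts required for vm_count VMs at the given ratio, plus spares for HA."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.ceil(vm_count / ratio) + spare_hosts


# Example: 32-core, 256 GB hosts running 2 vCPU / 8 GB virtual servers.
ratio = vms_per_host(host_cores=32, host_ram_gb=256, vm_vcpus=2, vm_ram_gb=8)
print(ratio, hosts_needed(280, ratio))
```

In this example RAM, not CPU, is the binding constraint, which is why the ratio is worth computing per resource rather than from a single headline number. Because the host count grows in proportion to the VM count, the model also reflects the linear scaling the design aims for.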
The cloud control system and its associated implementations will be composed of Regions or Availability Zones, similar in many ways to what Amazon AWS does currently.
The availability zone model isolates one fault domain from another. Each availability zone has isolation and redundancy in management, hardware, network, power, and facilities. If power is lost in a given availability zone, tenants in another availability zone are not impacted. Each availability zone resides in a single datacenter facility and is relatively independent. Availability zones are then aggregated into regions, and regions into the global resource pool.
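The aggregation of zones into regions and regions into a global pool can be modelled in a few lines. This is a minimal sketch, not a real cloud API: the class names, the `capacity_vms` field and the `fail_zone` helper are hypothetical and exist only to show that losing one zone leaves the others untouched.

```python
from dataclasses import dataclass, field


@dataclass
class AvailabilityZone:
    """One fault domain, housed in a single datacenter facility."""
    name: str
    datacenter: str
    capacity_vms: int  # hypothetical capacity figure
    healthy: bool = True


@dataclass
class Region:
    name: str
    zones: list = field(default_factory=list)

    def available_capacity(self) -> int:
        # Only healthy zones contribute capacity to the region.
        return sum(z.capacity_vms for z in self.zones if z.healthy)


@dataclass
class GlobalPool:
    regions: list = field(default_factory=list)

    def available_capacity(self) -> int:
        return sum(r.available_capacity() for r in self.regions)

    def fail_zone(self, zone_name: str) -> None:
        """Mark a single zone as failed, e.g. after a power loss."""
        for region in self.regions:
            for zone in region.zones:
                if zone.name == zone_name:
                    zone.healthy = False
                    return
        raise KeyError(zone_name)
```

Marking one zone as failed reduces only that zone's contribution; the sibling zone in the same region and every zone in other regions keep reporting their full capacity.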
The basic components would be as follows:
· Hypervisor and container management control plane
· Cloud orchestration
· Cloud blueprints/templates
· Operating system and application provisioning
· Continuous application delivery
· Utilization monitoring, capacity planning, and reporting
Hardware considerations should be as follows:
· Compute scalability
· Compute performance
· Storage scalability
· Storage performance
· Network scalability
· Network performance
· Network architecture limitations
· Oversubscription rates & capacity planning
· Solid-state flash leveraged to increase performance and decrease deployment times
Business concerns would be:
· Cost-basis requirements
· Calculating cost vs. profit to show ROI (chargeback/showback)
· Licensing costs
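To make the chargeback/showback and ROI items concrete, here is a small sketch of the arithmetic. The resource names, the rates (expressed in cents) and the function names are hypothetical; actual rates depend on the cost-basis and licensing figures listed above.

```python
def showback(usage, rates):
    """Return the cost per tenant.

    usage: {tenant: {resource: quantity consumed}}
    rates: {resource: unit cost}, in the same currency unit throughout
    """
    return {
        tenant: sum(quantity * rates[resource]
                    for resource, quantity in consumed.items())
        for tenant, consumed in usage.items()
    }


def roi(business_value, total_cost):
    """Return on investment as a fraction: (value - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (business_value - total_cost) / total_cost


# Hypothetical rates in cents per unit and one month of usage.
rates = {"vcpu_hours": 2, "storage_gb": 1}
usage = {
    "team-a": {"vcpu_hours": 100, "storage_gb": 50},
    "team-b": {"vcpu_hours": 400, "storage_gb": 200},
}
print(showback(usage, rates))
```

With showback the resulting figures are only reported to each group; with chargeback the same figures are billed. Either way, the per-tenant totals feed the cost side of the ROI calculation.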
The extensibility of the solution dictates the ability to use third-party tools for authentication, monitoring, and legacy applications. The best cloud control system should integrate legacy systems and software with relative ease. It’s my own personal preference to lead with Open Source software, but that decision is left to the user.
Monitoring, capacity planning, and resource optimization should consider the following:
· Reactive – Break-Fix monitoring where systems and nodes are monitored for availability and service is manually restored
· Proactive – Collect metrics data to maintain availability, performance, and meet SLA requirements
· Forecasting – Use proactive metric data to perform capacity planning and optimize capital usage
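The forecasting tier can be illustrated with a simple trend projection over collected utilization samples. This is a sketch under stated assumptions: it uses an ordinary least-squares straight line, assumes evenly spaced samples, and the function names are hypothetical. Real capacity planning would likely account for seasonality and bursts.

```python
def fit_trend(samples):
    """Least-squares line through evenly spaced samples.

    Returns (slope, intercept), where x is the sample index 0..n-1.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_full(samples, capacity):
    """Sampling periods until the fitted trend reaches capacity.

    Returns None when usage is flat or falling.
    """
    slope, intercept = fit_trend(samples)
    if slope <= 0:
        return None
    current = intercept + slope * (len(samples) - 1)
    return max(0.0, (capacity - current) / slope)


# Example: weekly storage usage in TB against a 100 TB pool.
print(periods_until_full([40, 45, 50, 55, 60], capacity=100))
```

The same metrics gathered for proactive monitoring serve as the input here, so forecasting adds no new collection burden; it only changes what is done with the data.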
Because cloud computing is a fundamental paradigm shift in how Information Technology services are usually delivered, it will cause significant disruption inside most current organizations. Helping each of these organizations embrace the change will be key.
While the final impact is currently impossible to measure, a self-service model is clearly the future and is integral to delivering customer satisfaction, from both an internal and an external user perspective.
Some proof of concept initiatives would be as follows:
· Determine a go-forward architecture for the IaaS and PaaS offering inclusive of a software defined network
· Benchmark competing architecture options against one another from a price, performance, and manageability perspective
· Establish a “mini-cradle” that can be maintained and used for future infrastructure design initiatives and tests
· Determine how application deployment can be fully or partially automated
· Determine a cloud control system to facilitate provisioning of Operating Systems and multi-tiered applications
· Complete the delivery of FAC to generate metrics and provide statistics
· Show the value of self-service to internal organizations
· Measure the ROI based on cost of the cloud service delivery combined with the business value
· Don’t build a complex solution for the initial offering
· Avoid spending large amounts of capital expenses on the initial design
After implementing a proof of concept, testing that encompasses the following (and more) should be done:
Proof of Functionality
- The solution system runs in our datacenter; on our hardware
- The solution system can be implemented with multi-network configuration
- The solution system can be implemented with as few manual steps as possible (automated installation)
- The solution systems have the ability to drive implementation via API
- The solution system provides a single point of management for all components
- The solution system enables dynamic application mobility by decoupling the definition of an application from the underlying hardware and software
- The solution system can support FAC production operating systems
- The solution system Hypervisor and guest OS are installed and fully functional
- The solution systems support internal and external authentication against existing authentication infrastructure.
- The solution system functions as designed and tested
Proof of Resiliency
- The solution system components are designed for high availability
- The solution system provides multi-zone (inter-DC, inter-region, etc.) management
- The solution system provides multi Data Center management
- The solution system is compatible with legacy, current, and future systems integration
- The solution system has the ability to manage both simple and complex configurations
- The solution systems have metrics that can be monitored