Istio Blog

Istio Project Announces 2026 Steering Committee

Fri, 27 Mar 2026 00:00:00 +0000

The Istio Steering Committee oversees the administrative aspects of the project, including governance, branding, marketing, and working with the CNCF.

Every year, we estimate each company's share of contributions to Istio over the past year, across the hundreds of companies that contributed, and use that metric to proportionally allocate the nine Contribution Seats on our Steering Committee.

After that, our project members vote for four Community Seats, with candidates drawn from companies that did not receive Contribution Seats.

In February, we announced the Contribution Seat allocation, and invited candidates to stand for the Community Seat elections. As the election officer, I am pleased to announce the results of that election, as well as the individuals who will represent the top contributors.

Community Seats

Five candidates stood for our four open seats. Using Condorcet-method ranked voting, the following four candidates have been elected:

Faseela K, Ericsson Software Technology
Craig Box, Independent
Tyler Schade, GEICO Tech
Zhonghu Xu, Huawei

We’re excited to welcome this fantastic group of community leaders. Faseela is returning for her fourth term on Steering and also serves on the CNCF Technical Oversight Committee. Craig has been on the Steering Committee since 2020 and recently ended his term as the maintainer representative on the CNCF Governing Board. Tyler is a new face on Steering, bringing an end-user perspective from GEICO Tech, where he leads a team building a custom service mesh on Istio. Zhonghu has been an Istio contributor since the project’s early days, is a core maintainer in multiple working groups, and is transitioning this year from a Contribution Seat to a Community Seat.

Contribution Seats

Our supporting companies have made their choices for the nine Contribution Seats. They will be held by:

Zack Butcher (Tetrate)
Rob Cernich (Red Hat)
John Howard (Solo.io)
Steven Jin Xuan (Microsoft)
Jack Ma (Microsoft)
Keith Mattix (Microsoft)
Louis Ryan (Solo.io)
Lin Sun (Solo.io)
Ram Vennam (Solo.io)

Seating the new committee

On behalf of the Steering Committee, I wish to congratulate our new and returning members. This group will serve for one year, starting this week.

We would also like to extend our heartfelt thanks to Idit Levine, Rob Scott, Pratima Nambiar, and Wilson Wu, whose terms have now ended.

The new team will continue to grow and improve Istio as a successful and sustainable open source project. We encourage everyone to get involved in the Istio community, and help us shape the future of the world’s most popular service mesh.

Istio is Migrating Container Registries

Mon, 23 Mar 2026 00:00:00 +0000

Due to changes in Istio’s funding model, Istio images will no longer be available at gcr.io/istio-release starting January 1st, 2027. As a result, clusters that reference images hosted on gcr.io/istio-release might fail to create new pods in 2027.

In fact, we are fully migrating all Istio artifacts out of Google Cloud, including Helm charts. Future communications will cover the migration of Helm charts and other artifacts. This post will focus on what you can do today in response to the 2027 container registry migration.

Am I affected?

By default, Istio installations use Docker Hub (docker.io/istio) as their container registry, but many users choose to use the gcr.io/istio-release mirror. You can check whether you are using the mirror with the following command:

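$ kubectl get pods --all-namespaces -o json \
    | jq -r '.items[]
        | select([.spec.containers[]?, .spec.initContainers[]?] | any(.image | startswith("gcr.io/istio-release")))
        | "\(.metadata.namespace)/\(.metadata.name)"'

The above command lists each pod that has at least one container or init container image hosted on gcr.io/istio-release, printing every pod only once. Init containers are included because the istio-init container and, when Kubernetes native sidecars are enabled, the istio-proxy sidecar run as init containers. If there are any such pods, you will likely need to migrate.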

What to do today

Although we plan to keep images available on gcr.io/istio-release until late 2026, we have set up registry.istio.io as the new home for Istio images. Please migrate to using registry.istio.io as soon as possible.

Using `istioctl`

If you install Istio using istioctl, you can update your IstioOperator configuration as follows:

# istiooperator.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  # ...
  hub: registry.istio.io/release
  # Everything else can stay the same unless you reference `gcr.io/istio-release` images elsewhere

Then, install Istio using this configuration:

$ istioctl install -f istiooperator.yaml

Alternatively, you can pass in the registry as a command-line argument:

$ istioctl install --set hub=registry.istio.io/release # the rest of your arguments

Using Helm

If you use Helm to install Istio, update your values file to have the following:

# ...
hub: registry.istio.io/release
global:
  hub: registry.istio.io/release
# Everything else can stay the same unless you reference `gcr.io/istio-release` images elsewhere

Then, update your Helm installation with your new values file.

Private mirrors

Your organization might pull images from gcr.io/istio-release, push them to a private registry, and reference the private registry in your Istio installation. This process will still work, but you will have to pull from registry.istio.io/release instead of gcr.io/istio-release.
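For example, once the images are copied into your mirror, a Helm values file might look like the sketch below. The registry path registry.example.internal/istio and the Secret name mirror-pull-secret are hypothetical placeholders, and the mirror is assumed to keep the upstream image names and tags (such as pilot and proxyv2, plus install-cni and ztunnel for ambient).

```yaml
# Helm values sketch for pulling Istio images from a private mirror.
# registry.example.internal/istio is a hypothetical placeholder: use your mirror's path.
hub: registry.example.internal/istio
global:
  hub: registry.example.internal/istio
  # Only needed if the mirror requires authentication. mirror-pull-secret is
  # hypothetical and must exist in the namespaces of the pods that pull from the mirror.
  imagePullSecrets:
    - mirror-pull-secret
```

Only the mirror's source changes: pull from registry.istio.io/release when refreshing the mirror, and keep pointing your installation at the private registry.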

Security Considerations on Istio's CRDs with Namespace-based Multi-Tenancy

Sat, 21 Mar 2026 00:00:00 +0000

The Istio project wants to address a possible Man-in-the-Middle (MITM) attack scenario in which a VirtualService can redirect or intercept traffic within the service mesh. This affects clusters using namespace-based multi-tenancy in which tenants have permission to deploy Istio resources (networking.istio.io/v1).

This blog post highlights the risks of using Istio in multi-tenant clusters and explains how users can mitigate these risks and safely operate Istio in their deployments.

Note that the issue extends beyond a single cluster in a “single mesh with multiple clusters” deployment.

The behavior described in this post applies to Istio version 1.29.0 and to all versions since the introduction of the mesh gateway option in the VirtualService resource.

Background

Namespace-based Multi-Tenancy

Namespaces in Kubernetes provide a mechanism for organizing groups of resources within a cluster. Namespaces provide a logical abstraction that allows teams, applications, or environments to share a single cluster while isolating their resources via controls such as Network Policies, RBAC, and so on.

In this blog post, we focus on running Istio in clusters where multiple tenants share the same cluster and service mesh, and can deploy Istio resources (networking.istio.io/v1) in their namespaces while relying on namespace boundaries for isolation.

Traffic Routing in Istio

Istio provides traffic management capabilities by separating application logic from network routing behavior. It introduces additional configuration resources through CRDs that allow operators to define how traffic should be routed between services in the mesh.

One of the central resources for this purpose is the VirtualService. A VirtualService defines a set of routing rules that determine how requests to the hosts listed in spec.hosts[] are handled. These rules can match requests based on properties such as HTTP headers, paths, or ports, and can then direct the traffic to one or more destination services.
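For reference, a minimal VirtualService of this kind is sketched below. The namespace, host, header, and subset names are illustrative, and the v1 and v2 subsets are assumed to be defined in a matching DestinationRule that is not shown.

```yaml
# Requests for the reviews host that carry the header end-user: jason go to
# subset v2; all other requests to that host go to subset v1.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: reviews
  namespace: bookinfo
spec:
  hosts:
  - reviews.bookinfo.svc.cluster.local
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews.bookinfo.svc.cluster.local
        subset: v2
  - route:
    - destination:
        host: reviews.bookinfo.svc.cluster.local
        subset: v1
```

Because spec.gateways is omitted, these rules apply to sidecars across the mesh, not only to workloads in the bookinfo namespace.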

Routing decisions defined in a VirtualService are not limited to a single workload or namespace. Depending on how the resource is configured, these rules can affect traffic routing across the entire mesh.

In contrast to the newer Kubernetes Gateway API, these CRDs were created and effectively stabilized before namespace-based RBAC even made its way to Kubernetes. Thus, namespace-based multi-tenancy that shares the same service mesh was not part of the threat model at the time. With the introduction of RBAC, such multi-tenant environments emerged. It is therefore important to highlight and address the security risks associated with those architectures.

In the following section, we demonstrate those risks and show that this mechanism can be abused to intercept traffic in a namespace-based multi-tenant cluster. Later, we introduce ways to mitigate those risks.

Man-in-the-Middle Attacks through VirtualService

In a namespace-based multi-tenant environment, it is often assumed that namespaces provide sufficient trust boundaries between resources across different namespaces. However, Istio’s traffic routing configuration operates at the mesh level, meaning that routing rules defined in one namespace can influence traffic originating from workloads in other namespaces.

An attacker who has permission to create or modify VirtualService resources can abuse this behavior by defining routing rules for arbitrary hosts. When the reserved gateway name mesh is listed in the gateways field of the spec (which is also the default when gateways is omitted), the routing rules are applied to all sidecar proxies in the mesh, independent of their namespace.

This allows an attacker to create a malicious VirtualService that matches requests for specific hostnames and redirects them to an attacker-controlled service. As a result, traffic from other workloads in the mesh can be transparently routed through the attacker’s service before reaching its intended destination.
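To make the scenario concrete when reviewing your own clusters, a VirtualService of the following shape is all that is required. Every name is hypothetical: tenant-b is the namespace the attacker controls, payments.tenant-a.svc.cluster.local belongs to another tenant, and proxy.tenant-b.svc.cluster.local is the attacker-controlled service.

```yaml
# Illustrative only: a VirtualService in tenant-b that claims a host owned by tenant-a.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: intercept-payments
  namespace: tenant-b                       # namespace the attacker controls
spec:
  hosts:
  - payments.tenant-a.svc.cluster.local     # host owned by another tenant
  gateways:
  - mesh                                    # all sidecars; also the default when omitted
  http:
  - route:
    - destination:
        host: proxy.tenant-b.svc.cluster.local   # attacker-controlled service
```

Nothing in this resource references tenant-a's namespace other than the hostname, which is why RBAC on tenant-a alone does not prevent it.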

This behavior enables MITM attacks within the service mesh. The attacker-controlled service can intercept traffic from services in the mesh, including traffic to other services in the mesh as well as traffic to external services. This allows the attacker to:

act as the destination service.
redirect traffic to alternative destinations.
drop requests to disrupt the communication (denial-of-service).

The source service sends the request to the attacker-controlled service instead of the destination service, because the VirtualService overrides the default routing behavior. Istio’s mutual TLS authentication does not help here, because the proxy treats the attacker-controlled service as the legitimate destination for the overridden hostname.

However, forwarding the intercepted traffic on to the real destination service, in order to read or modify the communication between the two services, is more difficult for the attacker, because they cannot bypass Istio’s Layer 4 and Layer 7 security features. Once the attacker intercepts the communication, end-to-end encryption and authentication between the source and destination services are broken: the request forwarded from the attacker-controlled service is authenticated as a request from the attacker-controlled service, not from the original source. As a result, Authorization Policies configured on the destination service may deny the request. In addition, the destination service sees the attacker-controlled service’s identity in the X-Forwarded-Client-Cert header, and the authentication of the original source service is lost.
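As a defensive complement, an ALLOW policy on the destination that names only the expected caller makes such a relayed request fail, because it arrives carrying the attacker-controlled service’s identity. The namespace, label, and service account below are hypothetical, and the principal assumes the default cluster.local trust domain.

```yaml
# Only the frontend service account in tenant-a may call the payments workload.
# Requests relayed through any other workload identity are denied.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-frontend
  namespace: tenant-a
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/tenant-a/sa/frontend
```

This limits the relay only; it does not stop the attacker from answering as the destination or dropping requests.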

Why does this behavior occur?

This behavior results from how Istio distributes and evaluates traffic routing configuration within the service mesh.

An Istio service mesh is logically split into a data plane and a control plane. Istio’s control plane aggregates routing configuration from all VirtualService resources and distributes the resulting configuration to the Envoy sidecar proxies that make up the data plane. These proxies then enforce routing rules locally for the traffic they handle (see also Istio Architecture).

When a VirtualService is bound to the mesh gateway, its routing rules apply to all sidecars in the mesh, including internal service-to-service traffic. Since the effects of this configuration are not limited to the namespace in which the VirtualService resides, a configuration created in one namespace can match requests originating from workloads in other namespaces.

Mitigation and Best Practices

Operators running Istio in namespace-based multi-tenancy setups or operating a single mesh across multiple clusters should apply additional safeguards to maintain strong isolation. Without these controls, unintended cross-namespace traffic manipulation can occur at the data plane level.

Recommended Mitigation: Migrate to the Newer Gateway API

Ideally, permissions to create or modify Istio networking resources (networking.istio.io/v1 as well as security.istio.io/v1) should be limited to platform operators responsible for global routing.
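A minimal sketch of a tenant Role that leaves those API groups out is shown below. The namespace, Role, and group names are hypothetical. Because Kubernetes RBAC is additive, it only helps if no other RoleBinding or ClusterRoleBinding grants the tenant access to networking.istio.io or security.istio.io.

```yaml
# Tenant developers manage workloads and Gateway API routes in their own namespace.
# Intentionally no rules for networking.istio.io or security.istio.io.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-developer
  namespace: tenant-a
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "services", "configmaps", "deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["gateway.networking.k8s.io"]
  resources: ["httproutes", "grpcroutes"]   # routes only; shared Gateways stay with operators
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-developer
  namespace: tenant-a
subjects:
- kind: Group
  name: tenant-a-developers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-developer
  apiGroup: rbac.authorization.k8s.io
```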

As an alternative, operators can offer tenants access to the newer Gateway API, which was designed with safe cross-namespace support in mind. However, the platform operators still need to control access to shared resources such as gateways.
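For example, with Gateway API mesh support (GAMMA), a route attached to a Service in another namespace is a consumer route: it only affects requests from workloads in the route’s own namespace, so tenant-b cannot redirect tenant-a’s traffic with it. A cross-namespace backendRef additionally needs a ReferenceGrant created by the Service owner. All names, the header, and the port below are hypothetical.

```yaml
# In tenant-b: a consumer route for tenant-a's payments Service.
# It only applies to requests sent by workloads in tenant-b.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payments-from-tenant-b
  namespace: tenant-b
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: payments
    namespace: tenant-a
  rules:
  - filters:
    - type: RequestHeaderModifier
      requestHeaderModifier:
        add:
        - name: x-calling-tenant
          value: tenant-b
    backendRefs:
    - name: payments
      namespace: tenant-a
      port: 8080
---
# In tenant-a: the Service owner explicitly allows HTTPRoutes from tenant-b
# to reference the payments Service as a backend.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-tenant-b-routes
  namespace: tenant-a
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: tenant-b
  to:
  - group: ""
    kind: Service
    name: payments
```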

Configuration Scoping can be implemented as an additional control.

Mitigation in Legacy Setups

When such changes and restrictions aren’t feasible due to business or organizational requirements, routing configurations should be scoped to specific services or namespaces. Broad rules that affect the entire mesh should be avoided unless they are explicitly intended and their implications are well understood.

One way to mitigate this kind of attack is configuration scoping, for instance by using a Sidecar resource to restrict the egress listener in every namespace to trusted namespaces. However, this only mitigates the issue in sidecar mode and in ambient mode with waypoints. It does not help in L4-only ambient mode, nor for hosts routed through an Istio Gateway.

Another way to mitigate this kind of attack is to implement an admission policy that limits which hosts each tenant can use in the hosts field. This also mitigates the issue in ambient mode.
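A minimal sketch of such scoping is shown below, assuming istio-system is the mesh root namespace (the default) and that shared-services is a hypothetical namespace every tenant trusts. A Sidecar without a workloadSelector in the root namespace acts as the default for every namespace that does not define its own.

```yaml
# Mesh-wide default Sidecar: each workload only imports services and routing
# configuration (including VirtualServices) from its own namespace ("./*")
# and from the trusted namespaces listed here.
apiVersion: networking.istio.io/v1
kind: Sidecar
metadata:
  name: default
  namespace: istio-system
spec:
  egress:
  - hosts:
    - "./*"
    - "istio-system/*"
    - "shared-services/*"   # hypothetical trusted namespace
```

With this default, a VirtualService created in an untrusted tenant namespace is not imported by sidecars in other namespaces. Tenants that legitimately call services elsewhere need those namespaces added, either here or in their own namespace’s Sidecar.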
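A minimal sketch using the built-in ValidatingAdmissionPolicy (admissionregistration.k8s.io/v1, generally available since Kubernetes 1.30) follows. It assumes each tenant may only claim fully qualified hosts inside its own namespace; the policy name and the tenant: "true" namespace label are hypothetical, and a policy engine such as Kyverno or OPA Gatekeeper could express the same rule.

```yaml
# Reject VirtualServices in tenant namespaces whose hosts are outside that namespace.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: restrict-virtualservice-hosts
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["networking.istio.io"]
      apiVersions: ["*"]
      operations: ["CREATE", "UPDATE"]
      resources: ["virtualservices"]
  validations:
  # Delegate VirtualServices have no hosts, so they pass. Short names, wildcards
  # such as "*", and external hostnames fail because they lack the namespace suffix.
  - expression: >-
      !has(object.spec.hosts) ||
      object.spec.hosts.all(h, h.endsWith('.' + object.metadata.namespace + '.svc.cluster.local'))
    message: "VirtualService hosts must be FQDNs in the VirtualService's own namespace."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: restrict-virtualservice-hosts
spec:
  policyName: restrict-virtualservice-hosts
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchLabels:
        tenant: "true"    # hypothetical label on tenant namespaces
```

This sketch only covers VirtualService; other resources with a hosts field, such as ServiceEntry or Gateway, may need similar rules, and tenants that need external hostnames would require an operator-approved exception.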

Conclusion

As shown in this post, Istio’s mesh gateway option allows rules defined in one namespace to affect the traffic of other namespaces. In namespace-based multi-tenancy setups or when running a single mesh across multiple clusters, this behavior may expose the service mesh to malicious actors, e.g., enabling MITM attacks, as explained in this blog post.

Istio does not claim (nor seek to claim) hard namespace-based multi-tenancy, because the project chose a tradeoff that eases adoption. Operators who rely on this kind of multi-tenancy should therefore assess the risks in their architecture and address the weaknesses, e.g., by removing unnecessary RBAC permissions and enforcing strict admission controls.


Istio at KubeCon Europe 2026: Let’s Connect in Amsterdam!

Mon, 23 Feb 2026 00:00:00 +0000

Get ready for a packed agenda of Istio activities at KubeCon + CloudNativeCon Europe 2026, including Cloud Native Theater Istio sessions featuring amazing speakers, hands-on experiences, and chances to meet maintainers and fellow community members in person.

Istio highlights at KubeCon EU 2026

Join us at Cloud Native Theater on Tuesday, March 24, 2026, starting at 14:30 CET for four amazing Istio Day sessions:
- Tales From the Mesh: Horrors and Successes of Running Istio in Production — A panel where real users of Istio share real world experiences, challenges, and wins with Istio service mesh.
- Zero-Downtime Migration from ingress-nginx to Istio in a Multi-Cluster Kubernetes Platform at Bloomberg — Learn how Bloomberg migrated to Istio across multiple clusters without downtime.
- The Good, The Ugly, and The Bad: Leaving Sidecars Behind with Istio Ambient Mesh — An honest look at migrating to ambient mesh architecture.
- Running State of the Art Inference with Istio and LLM-D — Discover how Istio enables efficient AI inference serving.
Join the Maintainer Track: Evolution or Revolution: Istio as the Network Platform for Cloud Native — two maintainers explore how Istio’s vision of a universal dataplane has guided the project’s evolution, from powering global multicluster connectivity to enabling AI inference. Learn how you can contribute as an “Istio power user contributor” and share feedback on future improvements.
Participate in the Cloud Native Novice Track: From First Contribution to Leadership: Lessons on Becoming a CNCF Leader — a perfect session for newcomers interested in growing within the ecosystem. Learn about overcoming early challenges, maintaining momentum in large communities, and how authenticity, curiosity, and community support can accelerate your journey to leadership.

Recommended Sessions at KubeCon

Here are recommended sessions from the main conference with strong Istio relevance:

Meet Us in Person

Stop by the Istio kiosk in the Project Pavilion throughout the event to chat with maintainers, contributors, and users!

Don’t Miss Out

KubeCon + CloudNativeCon is always the perfect time to connect, learn, and celebrate the amazing work happening across the Istio community. Stay tuned for more updates as the event approaches.

Ambient multi-network multicluster support is now Beta

Wed, 18 Feb 2026 00:00:00 +0000

Our team of contributors has been busy throughout the transition to 2026. A lot of work went into bringing ambient multi-network multicluster closer to a production-ready state. Improvements range from our internal tests to the most requested multi-network multicluster features in ambient, with a big focus on telemetry.

Gaps in Telemetry

The benefits of a multicluster distributed system come with tradeoffs. Some complexity is inevitable at larger scale, which makes good telemetry even more important. The Istio team was aware of some gaps that needed to be closed. As of release 1.29, telemetry is more robust and complete when our ambient data plane operates across distributed clusters and networks.

If you’ve deployed alpha multicluster capabilities before in multi-network scenarios, you might have noticed some source or destination labels would show as “unknown”.

For context, in a local cluster (or in clusters sharing the same network), waypoints and ztunnel are aware of all existing endpoints, and they acquire that information through xDS. Confusing metrics, by contrast, often occur in multi-network deployments, where xDS peer discovery is impractical given all the information that would need to be replicated across separate networks. Unfortunately, that results in missing peer information when requests cross network boundaries to reach a different Istio cluster.

Telemetry Enhancements

To overcome that problem, Istio 1.29 ships with augmented discovery mechanisms in its data plane for exchanging peer metadata between endpoints and gateways across different networks. The HBONE protocol is now enriched with baggage headers, allowing waypoints and ztunnel to exchange peer information transparently through east-west gateways.

Diagram showing peer metadata exchange across different networks

In the diagram above, focusing on L7 metrics, we show how the peer metadata flows through baggage headers across different clusters sitting in different networks.

The client in Cluster A initiates a request, and ztunnel starts to establish an HBONE connection through the waypoint. That is, ztunnel sends a CONNECT request with a baggage header containing the downstream peer metadata, which is then stored by the waypoint.
The baggage header containing the metadata is removed, and the request is routed normally; in this case, it goes to a different cluster.
On the receiving side, ztunnel in Cluster B receives the HBONE request and replies with a successful status, appending a baggage header that now contains the upstream peer metadata.
The upstream peer metadata is invisible to the east-west gateway. When the response reaches the waypoint, the waypoint has all the information it needs to emit metrics about both parties involved.

Note that this functionality is behind a feature flag at the moment. If you want to try these telemetry enhancements, they need to be explicitly activated with the AMBIENT_ENABLE_BAGGAGE feature option.
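The flag can be set at install time. The sketch below assumes, as with many Istio feature flags, that AMBIENT_ENABLE_BAGGAGE is read by istiod as an environment variable set through values.pilot.env; the post does not say which component consumes it, so confirm the exact placement in the Istio 1.29 documentation before relying on it. In a multicluster setup, each cluster’s control plane would presumably need the same setting.

```yaml
# Assumption: AMBIENT_ENABLE_BAGGAGE is an istiod (pilot) environment variable.
# Verify against the Istio 1.29 release notes or multicluster guide.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: ambient
  values:
    pilot:
      env:
        AMBIENT_ENABLE_BAGGAGE: "true"
```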

Other Improvements and Fixes

Some welcome improvements were also made to connectivity. Ingress gateways and waypoint proxies can now route requests directly to remote clusters. This lays the groundwork for easier resiliency and enables more flexible design patterns, providing the benefits that Istio users expect in multicluster and multi-network deployments.

And of course, we’ve also added a couple of smaller fixes making multi-network multicluster more stable and robust. We’ve updated the multicluster documentation to reflect some of these changes, including the addition of a guide on how to set up Kiali for an ambient multi-network deployment.

Limitations and Next Steps

All that said, we acknowledge that some gaps remain. Most of the work here targeted multi-network support; multicluster in single-network deployments is still considered alpha.

Also, the east-west gateway may favor a specific endpoint for a period of time. This can affect how requests arriving from a different network are distributed across endpoints. This behavior affects both the ambient and sidecar data plane modes, and we plan to address it for both.

We’re working with the fantastic Istio community to get these limitations addressed. For now, we’re excited to get this beta out there, and eager to get your feedback. The future is looking bright for Istio multi-network multicluster.

If you would like to try out ambient multi-network multicluster, please follow this guide. Remember, this feature is in beta status and not ready for production use. We welcome your bug reports, thoughts, comments, and use cases. You can reach us on GitHub or Slack.

Announcing Istio's 2026 Steering Committee Elections

Mon, 16 Feb 2026 00:00:00 +0000

The Istio Steering Committee oversees the administrative aspects of the project, including governance, branding, marketing, and working with the CNCF.

Every year, the leaders in the Istio project estimate each company's share of contributions to Istio over the past year, across the hundreds of companies that contributed, and use that metric to proportionally allocate nine Contribution Seats on our Steering Committee.

Then, our project members vote for four Community Seats, with candidates drawn from companies that did not receive Contribution Seats.

We are pleased to share the result of this year’s calculation, and to kick off our Community Seat election.

Contribution seats

The calculation for the 2026-2027 term reflects the deep investment of our vendors in the Istio open source project. We have four companies represented in our Contribution Seats:

Company	Seat allocation
Solo.io	4
Microsoft	3
Red Hat	1
Tetrate	1

The full allocation can be seen in our formula spreadsheet.

Community Seat election

As in previous years, the Community Seat elections immediately follow the allocation of the Contribution Seats. It is therefore now time to collect our nominations for candidates, and ensure our voter list is correct.

Candidates

Eligibility for candidacy is defined in the Steering Committee charter as a project member who does not work for a Company that will hold a Contribution Seat during the upcoming term.

We would now like to invite members from outside our Contribution Seat holders to stand for election. Nominations are due by March 6.

Voters

Eligibility to vote is defined in the charter as either:

a project member who has had at least one Pull Request merged in the past 12 months, or
someone who has submitted the voting exception form and has been accepted by the Steering Committee as having standing in the community through contribution of another kind.

The draft list of voters has been published. If you’re not on that list and you believe you have standing in the Istio community, please submit the exception form.

Exception requests are due by March 6. Voting will start on March 9 and last until March 20.

Announcement of the new committee

Upon completion of the election, the entire 2026-2027 committee (election winners and company-selected Contribution Seat holders) will be announced.

The Steering Committee wishes to thank its members, old and new, and looks forward to continuing to grow and improve Istio as a successful and sustainable open source project. We encourage everyone to get involved in the Istio community by contributing, standing for election, voting, and helping us shape the future of cloud native networking.

Istio at KubeCon + CloudNativeCon North America 2025: A Week of Momentum, Community, and Milestones

Tue, 25 Nov 2025 00:00:00 +0000

Istio at KubeCon NA 2025

KubeCon + CloudNativeCon North America 2025 lit up Atlanta from November 10–13, bringing together one of the largest gatherings of open-source practitioners, platform engineers, and maintainers across the cloud native ecosystem. For the Istio community, the week was defined by packed rooms, long hallway conversations, and a genuine sense of shared progress across service mesh, Gateway API, security, and AI-driven platforms.

Before the main conference began, the community kicked things off with Istio Day on November 10, a colocated event filled with deep technical sessions, migration stories, and future-looking discussions that set the tone for the rest of the week.

Istio Day at KubeCon NA

Istio Day brought together practitioners, contributors, and adopters for an afternoon of learning, sharing, and open conversations about where service mesh—and Istio—are headed next.

IstioDay: North America

Istio Day opened with Welcome + Opening Remarks from John Howard from Solo.io and Keith Mattix from Microsoft, setting the tone for an afternoon focused on real-world mesh evolution and the growing energy across the Istio community.

The day quickly moved into applied AI with Is Your Service Mesh AI Ready?, where John Howard explored how traffic management, security, and observability shape production-grade AI workloads.

IstioDay: Is Your Service Mesh AI Ready

Momentum continued with Istio Ambient Goes Multicluster as Jackie Maertens and Steven Jin Xuan from Microsoft demonstrated how Ambient Mesh behaves across distributed clusters—highlighting identity, connectivity, and operational simplifications in multi-cluster deployments.

A burst of energy came with the lightning talk Validating Your Istio Setups? The Tests Are Already Written, where Francisco Herrera Lira from Red Hat showed how built-in validation tooling can catch common configuration issues before they reach production.

In Optimizing Istio Autoscaling: From Resource-Centric to Connection-Aware, Punakshi Chaand and Pankaj Sikka shared how Intuit improved reliability by tuning autoscaling behaviors based on connection patterns rather than raw resource metrics.

Next, Running Databases in Istio’s Service Mesh with Tyler Schade and Michael Bolot from GEICO Tech challenged long-held assumptions, offering practical lessons on securing and operating stateful workloads inside a mesh.

Modernizing traffic entry points took the stage as Lin Sun from Solo.io and Ahmad Al-Masry from Harri walked through Is Zero-Downtime Migration Possible? Moving From Ingress & Sidecars to Gateway API, focusing on progressive migration strategies that avoid outages during architectural shifts.

The final session, Credit Karma’s Istio Migration: 50k+ Pods, Minimal Impact, Lessons Learned, saw Sumit Vij and Mark Gergely outline how they executed one of the largest Istio migrations to date with careful automation and rollout discipline.

The day closed with remarks from John Howard and Keith Mattix, celebrating the speakers, contributors, and a community that continues to push the boundaries of what Istio makes possible.

Istio at the Main KubeCon Conference

Outside of Istio Day, the project was highly visible across KubeCon, with maintainers, end users, and contributors sharing technical deep dives, production stories, and cutting-edge research.

This KubeCon was especially meaningful for the Istio community because Istio appeared not only across expo booths and breakout sessions, but also throughout several of the KubeCon keynotes, where companies showcased how Istio plays a critical role in powering their platforms at scale.

Istio at KubeCon Keynotes

The week’s momentum hit its stride when the Istio community reconvened for the Istio Project Update, where project leads shared the latest releases, roadmap advances, and how Istio is meeting emerging demands from AI workloads, multicluster mesh, and operational scale.

In Istio: Set Sailing With Istio Without Sidecars, attendees explored how the sidecar-less Ambient Mesh architecture is rapidly moving from experiment to adoption, opening new possibilities for simpler deployments and leaner data planes.

The session Lessons Applied Building a Next-Generation AI Proxy took the crowd behind the scenes of how mesh technologies adapt to AI-driven traffic patterns—applying the mesh not just to services, but to model-serving, inference, and data flow.

Over at Automated Rightsizing for Istio DaemonSet Workloads (Poster Session), practitioners gathered to compare strategies for optimizing control-plane resources, tuning for high scale, and reducing cost without sacrificing performance.

The narrative of traffic-management evolution featured prominently in Gateway API: Table Stakes and its faster sibling Know Before You Go! Speedrun Intro to Gateway API. These sessions brought forward foundational and introductory paths to modern ingress and mesh control.

Meanwhile, Return of the Mesh: Gateway API’s Epic Quest for Unity scaled that conversation: how traffic, API, mesh, and routing converge into one architecture that simplifies complexity rather than multiplies it.

For long-term reflection, 5 Key Lessons From 8 Years of Building Kgateway delivered hard-earned wisdom from years of system design, refactoring, and iterative improvements.

In GAMMA in Action: How Careem Migrated To Istio Without Downtime, the real-world migration story—a major production rollout that stayed up during transition—provided a roadmap for teams seeking safe mesh adoption at scale.

Safety and rollout risks took center stage in Taming Rollout Risks in Distributed Web Apps: A Location-Aware Gradual Deployment Approach, where strategies for regional rollouts, steering traffic, and minimizing user impact were laid out.

Finally, operations and day-two reality were tackled in End-to-End Security With gRPC in Kubernetes and On-Call the Easy Way With Agents, reminding everyone that mesh isn’t just about architecture, but about how teams run software safely, reliably, and confidently.

Community Spaces: ContribFest, Maintainer Track & the Project Pavilion

At the Project Pavilion, the Istio kiosk was constantly buzzing, drawing users with questions about Ambient Mesh, AI workloads, and deployment best practices.

Istio Project Pavilion

The Maintainer Track brought contributors together to collaborate on roadmap topics, triage issues, and discuss key areas of investment for the next year.

Istio Maintainers

At ContribFest, new contributors joined maintainers to work through good-first issues, discuss contribution pathways, and get their first PRs lined up.

Istio ContribFest Collaboration

Istio Maintainers Recognized at the CNCF Community Awards

This year’s CNCF Community Awards were a proud moment for the project. Two Istio maintainers received well-deserved recognition:

John Howard — Top Committer Award
Daniel Hawton — “Chop Wood, Carry Water” Award

Istio at CNCF Community Awards

Beyond these awards, Istio was also represented prominently in conference leadership. Faseela K, one of the KubeCon NA co-chairs and an Istio maintainer, participated in a keynote panel on Cloud Native for Good.

During closing remarks, it was also announced that Lin Sun, another long-time Istio maintainer, will serve as an upcoming KubeCon co-chair, highlighting the project’s strong leadership presence within CNCF.

Istio Leadership on Keynote Stage

What We Heard in Atlanta

Across sessions, kiosks, and hallways, a few themes emerged:

Ambient Mesh is shifting from exploration to real-world adoption.
AI workloads are driving innovation in mesh traffic patterns and operational practices.
Multicluster deployments are becoming commonplace, with attention to identity, control, and failover.
Gateway API is solidifying as a core tool for modern traffic management.
New contributors are joining in meaningful numbers, supported by ContribFest, hands-on guidance, and community engagement.

Looking Ahead

KubeCon NA 2025 showcased a community that is vibrant, growing, and tackling some of the hardest challenges in modern cloud infrastructure—from AI traffic management to zero-downtime migrations, from scaling planet-wide control planes to building the next generation of sidecar-less mesh.

As we look ahead to 2026, the energy from Atlanta gives us confidence: the future of service mesh is bright, and the Istio community is leading the way, together.

See you in Amsterdam!

Istio at KubeCon North America 2025: Let’s Connect in Atlanta!

Fri, 07 Nov 2025 00:00:00 +0000

Get ready for a packed agenda of Istio activities at KubeCon + CloudNativeCon North America 2025, including Istio Day sessions on AI-ready service meshes, multicluster Ambient Mesh, hands-on workshops, contributor opportunities, and chances to meet maintainers and fellow community members in person.

Istio Day and Key Sessions

Join us on Monday, November 10, 2025, for Istio Day, a full-day, community-focused event. Istio Day will feature sessions on AI readiness for service meshes, scaling Istio across multicluster environments with Ambient Mesh, validating and testing Istio setups, optimizing autoscaling, running stateful workloads like databases in Istio, and zero-downtime migrations to modern Gateway APIs. Attendees can also join hands-on workshops and meet maintainers and contributors.
Catch the TOC session: Istio Project Updates: AI Inference, Ambient Multicluster & Default Deny — hear about the latest features, what the community has been working on, and a preview of the 2026 roadmap.
Participate in Istio ContribFest: From Farm (Fork) To Table (Feature): Growing Your First (Free-range Organic) Istio PR — a perfect session for newcomers and aspiring contributors to jump into the codebase and community.

Recommended Sessions at KubeCon

We gathered recommended sessions from the main conference with strong Istio relevance:

Meet Us in Person

Stop by the Istio kiosk in the Project Pavilion throughout the event to chat with maintainers, contributors, and users!

Don’t Miss Out

KubeCon + CloudNativeCon is always the perfect time to connect, learn, and celebrate the amazing work happening across the Istio community. Stay tuned for more updates, demos, and announcements — including exciting conversations around AI inference, multicluster networking, and the evolution of Ambient Mesh.

Istio Project Announces 2025 Technical Oversight Committee Election Results

Tue, 19 Aug 2025 00:00:00 +0000

Last year we announced that Istio would transform from an indefinitely-appointed Technical Oversight Committee to a regularly elected body, with members serving two-year terms.

Each year, three of the six seats are elected. To bootstrap the process, we announced the 2025 election would cover the seats held by the three longest-serving members.

One of those three seats became vacant, prompting a by-election. Long-time maintainer Costin Manolache won that election. We thank Costin for his continuing contributions and for completing the remainder of the term, after which he chose not to stand again.

Five candidates stood for the three available seats, and the Steering Committee has now concluded the election.

Lin Sun and Louis Ryan were re-elected to their two seats. Both have been involved with Istio since before its public launch, and continue to serve as active leaders across the project.

The third seat was won by Rama Chavali, a long-time contributor and maintainer of Istio through his work at Salesforce.

In Rama’s own words:

I have worked with service mesh technologies for more than eight years, contributing to various projects that built the Managed Mesh platform for Salesforce. Istio and Envoy are the backbone of the Salesforce service mesh platform, powering critical workloads. The majority of Salesforce traffic flows through Istio and Envoy.

I have been an active and impactful contributor to the Istio project since April 2019. My contributions span both the core control plane components of Istio and Envoy, the high-performance proxy that serves as Istio’s data plane. Demonstrating a deep technical understanding and commitment to the project, I was recognized as a maintainer of Istio’s control plane in January 2020. Building on this expertise in service mesh networking, I became a Networking Working Group lead in July 2020.

Throughout my tenure, I have been instrumental in shaping the Istio project through the development and implementation of numerous significant features and architectural designs.

On behalf of the Istio community, I congratulate Rama on his election to the TOC and look forward to his continued leadership and impact.

Introducing multicluster support for ambient mode (alpha)

Mon, 04 Aug 2025 00:00:00 +0000

Multicluster has been one of the most requested features of ambient, and as of Istio 1.27, it is available in alpha status! We sought to capture the benefits and avoid the complications of multicluster architectures while using the same modular design that ambient users love. This release brings the core functionality of a multicluster mesh and lays the groundwork for a richer feature set in upcoming releases.

The Power & Complexity of Multicluster

Multicluster architectures increase outage resilience, shrink your blast radius, and scale across data centers. That said, integrating multiple clusters poses connectivity, security, and operational challenges.

In a single Kubernetes cluster, every pod can directly connect to another pod via a unique pod IP or service VIP. These guarantees break down in multicluster architectures; IP address spaces of different clusters might overlap, and even without overlap, the underlying infrastructure would need configuration to route cross-cluster traffic.

Cross-cluster connectivity also presents security challenges. Pod-to-pod traffic will leave cluster boundaries and pods will accept connections from outside their cluster. Without identity verification at the edge of the cluster and strong encryption, an outside attacker could exploit a vulnerable pod or intercept unencrypted traffic.

A multicluster solution must securely connect clusters and do so through simple, declarative APIs that keep pace with dynamic environments where clusters are frequently added and removed.

Key Components

Ambient multicluster extends ambient with new components and minimal APIs to securely connect clusters using ambient’s lightweight, modular architecture. It builds on the namespace sameness model so services keep their existing DNS names across clusters, allowing you to control cross-cluster communication without changing application code.

East-West Gateways

Each cluster has an east-west gateway with a globally routable IP acting as an entry point for cross-cluster communication. A ztunnel connects to the remote cluster’s east-west gateway, identifying the destination service by its namespaced name. The east-west gateway then load balances the connection to a local pod. Using the east-west gateway’s routable IP removes the need for inter-cluster routing configuration, and addressing pods by namespaced name rather than IP eliminates issues with overlapping IP spaces. Together, these design choices enable cross-cluster connectivity without changing cluster networking or restarting workloads, even as clusters are added or removed.
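As a rough illustration of why addressing by namespaced name sidesteps overlapping IP ranges, the Python sketch below models each cluster's east-west gateway as a table keyed by (namespace, service name). The class and function names, the addresses, and the round-robin choice of local pod are all hypothetical; this is a mental model, not how ztunnel or the gateway is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class EastWestGateway:
    """Hypothetical model of one cluster's east-west gateway."""
    cluster: str
    routable_ip: str                                # the only address other clusters must reach
    endpoints: dict = field(default_factory=dict)   # (namespace, name) -> local pod IPs
    next_index: dict = field(default_factory=dict)  # round-robin position per service

    def register(self, namespace: str, name: str, pod_ip: str) -> None:
        self.endpoints.setdefault((namespace, name), []).append(pod_ip)

    def accept(self, namespace: str, name: str) -> str:
        """Resolve a namespaced service name to a local pod IP, round robin."""
        pods = self.endpoints.get((namespace, name))
        if not pods:
            raise LookupError(f"{namespace}/{name} has no endpoints in {self.cluster}")
        i = self.next_index.get((namespace, name), 0)
        self.next_index[(namespace, name)] = i + 1
        return pods[i % len(pods)]


def ztunnel_connect(gateways: dict, cluster: str, namespace: str, name: str):
    """A source ztunnel dials the remote gateway's routable IP and names the service."""
    gw = gateways[cluster]
    return gw.routable_ip, gw.accept(namespace, name)


# Both clusters reuse the same pod CIDR; the overlap does not matter here,
# because the pod IP is only ever interpreted inside its own cluster.
gw_a = EastWestGateway("cluster-a", "203.0.113.10")
gw_b = EastWestGateway("cluster-b", "203.0.113.20")
gw_a.register("shop", "checkout", "10.0.0.5")
gw_b.register("shop", "checkout", "10.0.0.5")
print(ztunnel_connect({"cluster-a": gw_a, "cluster-b": gw_b}, "cluster-b", "shop", "checkout"))
# ('203.0.113.20', '10.0.0.5')
```

Only the gateway's routable IP crosses the cluster boundary, which is why no inter-cluster pod routing has to be configured in this model.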

Double HBONE

Ambient multicluster uses nested HBONE connections to efficiently secure traffic traversing cluster boundaries. An outer HBONE connection encrypts traffic to the east-west gateway and allows the source ztunnel and east-west gateway to verify each other’s identity. An inner HBONE connection encrypts traffic end-to-end, which allows the source ztunnel and destination ztunnel to verify each other’s identity. At the same time, the HBONE layers allow ztunnel to effectively reuse cross-cluster connections, minimizing TLS handshakes.
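The nesting and the connection reuse can be pictured with a toy model that simply counts TLS handshakes. In the sketch below, one outer connection per east-west gateway carries one inner connection per destination ztunnel, and each new request becomes a stream on an existing inner connection. The class names, identity strings, and pooling keys are illustrative assumptions; real ztunnel pooling is more involved.

```python
from dataclasses import dataclass, field


@dataclass
class InnerConnection:
    # Inner HBONE: end-to-end mTLS, source ztunnel <-> destination ztunnel.
    source_id: str
    dest_id: str
    streams: list = field(default_factory=list)


@dataclass
class OuterConnection:
    # Outer HBONE: mTLS, source ztunnel <-> remote east-west gateway.
    source_id: str
    gateway_id: str
    inner: dict = field(default_factory=dict)  # dest identity -> InnerConnection


class SourceZtunnel:
    """Hypothetical model that counts TLS handshakes under connection reuse."""

    def __init__(self, identity: str):
        self.identity = identity
        self.outer_pool = {}  # gateway address -> OuterConnection
        self.handshakes = 0

    def open_stream(self, gateway_addr, gateway_id, dest_id, service):
        outer = self.outer_pool.get(gateway_addr)
        if outer is None:
            outer = OuterConnection(self.identity, gateway_id)
            self.outer_pool[gateway_addr] = outer
            self.handshakes += 1  # outer handshake: ztunnel and gateway verify each other
        inner = outer.inner.get(dest_id)
        if inner is None:
            inner = InnerConnection(self.identity, dest_id)
            outer.inner[dest_id] = inner
            self.handshakes += 1  # inner handshake, tunneled through the outer one
        inner.streams.append(service)  # another request: new stream, no new handshake
        return inner


z = SourceZtunnel("ztunnel-a-node1")
for _ in range(3):
    z.open_stream("203.0.113.20", "ewgw-b", "ztunnel-b-node1", "shop/checkout")
print(z.handshakes)  # 2
```

Three requests cost two handshakes here; without reuse, each request would pay for both layers.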

Istio ambient multicluster traffic flow

Service Discovery and Scope

Marking a service as global enables cross-cluster communication. Istiod configures east-west gateways to accept and route global service traffic to local pods and programs ztunnels to load balance global service traffic to remote clusters.

Mesh administrators define the label-based criteria for global services via the ServiceScope API, and app developers label their services accordingly. The default ServiceScope is

serviceScopeConfigs:
  - servicesSelector:
      matchExpressions:
        - key: istio.io/global
          operator: In
          values: ["true"]
    scope: GLOBAL

meaning that any service with the istio.io/global=true label is global. Although the default value is straightforward, the ServiceScope API can express complex conditions using a mix of ANDs and ORs.
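To make the selector semantics concrete, here is a Python sketch that evaluates label-selector requirements the way Kubernetes label selectors do (In, NotIn, Exists, DoesNotExist, with every requirement ANDed) against the default ServiceScope shown above. Treating multiple serviceScopeConfigs entries as ORed alternatives is an assumption made for illustration, as are the function names; consult the ServiceScope reference for the exact evaluation rules.

```python
def requirement_matches(labels: dict, expr: dict) -> bool:
    """One matchExpressions entry, using Kubernetes label-selector semantics."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values  # absent label matches
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unknown operator {op}")


def selector_matches(labels: dict, selector: dict) -> bool:
    # matchLabels and matchExpressions are ANDed together.
    if any(labels.get(k) != v for k, v in selector.get("matchLabels", {}).items()):
        return False
    return all(requirement_matches(labels, e) for e in selector.get("matchExpressions", []))


def service_scope(labels: dict, configs: list) -> str:
    # Assumption: entries are ORed; any matching GLOBAL entry makes the service global.
    for cfg in configs:
        if cfg["scope"] == "GLOBAL" and selector_matches(labels, cfg["servicesSelector"]):
            return "GLOBAL"
    return "LOCAL"


# The default ServiceScope from the text.
DEFAULT_SCOPE = [{
    "servicesSelector": {"matchExpressions": [
        {"key": "istio.io/global", "operator": "In", "values": ["true"]}]},
    "scope": "GLOBAL",
}]

# A hypothetical custom scope: (tier=frontend AND env not dev) OR team-global exists.
CUSTOM_SCOPE = [
    {"servicesSelector": {"matchLabels": {"tier": "frontend"},
                          "matchExpressions": [{"key": "env", "operator": "NotIn", "values": ["dev"]}]},
     "scope": "GLOBAL"},
    {"servicesSelector": {"matchExpressions": [{"key": "team-global", "operator": "Exists"}]},
     "scope": "GLOBAL"},
]

print(service_scope({"istio.io/global": "true"}, DEFAULT_SCOPE))  # GLOBAL
print(service_scope({"tier": "frontend", "env": "dev"}, CUSTOM_SCOPE))  # LOCAL
```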

By default, ztunnel load balances traffic uniformly across all endpoints, including remote ones. This is configurable through the service’s trafficDistribution field, so that traffic crosses cluster boundaries only when there are no local endpoints. Thus, users have control over whether and when traffic crosses cluster boundaries with no changes to application code.
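The difference between the two behaviors can be sketched as a small endpoint picker. This is a hypothetical model, not ztunnel code: `prefer_local=True` stands in for configuring the service’s trafficDistribution field (for example with `PreferClose`), and uniform random choice stands in for ztunnel’s actual load-balancing algorithm.

```python
import random


def pick_endpoint(endpoints, local_cluster, prefer_local=False, rng=random):
    """endpoints: list of (cluster, address) pairs, local and remote."""
    if prefer_local:
        local = [ep for ep in endpoints if ep[0] == local_cluster]
        if local:
            return rng.choice(local)  # stay in-cluster while local endpoints exist
    # Default behavior, or the fallback once no local endpoints remain:
    # choose uniformly across every endpoint, remote ones included.
    return rng.choice(endpoints)


eps = [("cluster-a", "10.0.0.5"), ("cluster-b", "10.0.0.5"), ("cluster-b", "10.0.0.6")]
rng = random.Random(42)
print(pick_endpoint(eps, "cluster-a", prefer_local=True, rng=rng))  # ('cluster-a', '10.0.0.5')

# If every local endpoint disappears, the same setting fails over to the remote cluster.
remote_only = [ep for ep in eps if ep[0] != "cluster-a"]
print(pick_endpoint(remote_only, "cluster-a", prefer_local=True, rng=rng)[0])  # cluster-b
```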

Limitations and Roadmap

Although the current implementation of ambient multicluster has the foundational features for a multicluster solution, there is still a lot of work to be done. We are looking to address the following limitations:

Service and waypoint configuration must be uniform across all clusters.
No cross-cluster L7 failover (L7 policy is applied at the destination cluster).
No support for direct pod addressing or headless services.
Only the multi-primary deployment model is supported.
Only the one-network-per-cluster deployment model is supported.

We are also looking to improve our reference documentation, guides, testing, and performance.

If you would like to try out ambient multicluster, please follow this guide. Remember, this feature is in alpha status and not ready for production use. We welcome your bug reports, thoughts, comments, and use cases – you can reach us on GitHub or Slack.

Bringing AI-Aware Traffic Management to Istio: Gateway API Inference Extension Support

Mon, 28 Jul 2025 00:00:00 +0000

The world of AI inference on Kubernetes presents unique challenges that traditional traffic-routing architectures weren’t designed to handle. While Istio has long excelled at managing microservice traffic with sophisticated load balancing, security, and observability features, the demands of Large Language Model (LLM) workloads require specialized functionality.

That’s why we’re excited to announce Istio’s support for the Gateway API Inference Extension, bringing intelligent, model-aware and LoRA-aware routing to Istio.

Why AI Workloads Need Special Treatment

Traditional web services typically handle quick, stateless requests measured in milliseconds. AI inference workloads operate in a completely different paradigm that challenges conventional load balancing approaches in several fundamental ways.

The Scale and Duration Challenge

Unlike typical API responses that complete in milliseconds, AI inference requests often take significantly longer to process, sometimes several seconds or even minutes. This dramatic difference in processing time means that routing decisions have far more impact than in traditional web services. A single poorly routed request can tie up expensive GPU resources for extended periods, creating cascading effects across the entire system.

The payload characteristics are equally challenging. AI inference requests frequently involve substantially larger payloads, especially when dealing with Retrieval-Augmented Generation (RAG) systems, multi-turn conversations with extensive context, or multi-modal inputs including images, audio, or video. These large payloads require different buffering, streaming, and timeout strategies than traditional HTTP APIs.

Resource Consumption Patterns

Perhaps most critically, a single inference request can consume an entire GPU’s resources during processing. This is fundamentally different from traditional request serving where multiple requests can be processed concurrently on the same compute resources. When a GPU is fully engaged with one request, additional requests must queue, making scheduling and routing decisions far more impactful than those for standard API workloads.

This resource exclusivity means that simple round-robin or least-connection algorithms can create severe imbalances. Sending requests to a server that’s already processing a complex inference task doesn’t just add latency; it can cause resource contention that degrades performance for all queued requests.
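A toy calculation shows the imbalance. Suppose one of three servers already has five long-running requests queued; the sketch below hands out six new requests with round robin and with a simple shortest-queue rule. It ignores request completion and treats queue length as the only load signal, both simplifying assumptions.

```python
from itertools import cycle


def round_robin(queues, n):
    """Assign n requests in strict rotation, ignoring current load."""
    order = cycle(range(len(queues)))
    q = list(queues)
    for _ in range(n):
        q[next(order)] += 1
    return q


def shortest_queue(queues, n):
    """Assign each request to the server with the fewest queued requests."""
    q = list(queues)
    for _ in range(n):
        i = min(range(len(q)), key=q.__getitem__)  # ties go to the lowest index
        q[i] += 1
    return q


print(round_robin([5, 0, 0], 6))     # [7, 2, 2]
print(shortest_queue([5, 0, 0], 6))  # [5, 3, 3]
```

Round robin keeps feeding the busy server, so its queue grows from five to seven; the load-aware rule leaves it alone until the others catch up.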

Stateful Considerations and Memory Management

AI models often maintain in-memory caches that significantly impact performance. KV caches store intermediate attention calculations for previously processed tokens, serving as the primary consumer of GPU memory during generation and often becoming the most common bottleneck. When KV cache utilization approaches limits, performance degrades dramatically, making cache-aware routing essential.

Additionally, many modern AI deployments use fine-tuned adapters like LoRA (Low-Rank Adaptation) to customize model behavior for specific users, organizations, or use cases. These adapters consume GPU memory and take time to load when switched. A model server that already has the required LoRA adapter loaded can process requests immediately, while servers without the adapter face expensive loading overhead that can take seconds to complete.

Queue Dynamics and Criticality

AI inference workloads also introduce the concept of request criticality that’s less common in traditional services. Real-time interactive applications (like chatbots or live content generation) require low latency and should be prioritized, while batch processing jobs or experimental workloads can tolerate higher latency or even be dropped during system overload.

Traditional load balancers lack the context to make these criticality-based decisions. They can’t distinguish between a time-sensitive customer support query and a background batch job, leading to suboptimal resource allocation during peak demand periods.

This is where inference-aware routing becomes critical. Instead of treating all backends as equivalent black boxes, we need routing decisions that understand the current state and capabilities of each model server, including their queue depth, memory utilization, loaded adapters, and ability to handle requests of different criticality levels.

Gateway API Inference Extension: A Kubernetes-Native Solution

The Kubernetes Gateway API Inference Extension has introduced solutions to these challenges, building on the proven foundation of Kubernetes Gateway API while adding AI-specific intelligence. Rather than requiring organizations to patch together custom solutions or abandon their existing Kubernetes infrastructure, the extension provides a standardized, vendor-neutral approach to intelligent AI traffic management.

The extension introduces two key Custom Resource Definitions that work together to address the routing challenges we’ve outlined. The InferenceModel resource provides an abstraction for AI inference workload owners to define logical model endpoints, while the InferencePool resource gives platform operators the tools to manage backend infrastructure with AI workload awareness.

By extending the familiar Gateway API model rather than creating an entirely new paradigm, the inference extension enables organizations to leverage their existing Kubernetes expertise while gaining the specialized capabilities that AI workloads demand. This approach ensures that teams can adopt intelligent inference routing aligned with familiar networking knowledge and tooling.

Note: InferenceModel is likely to change in future Gateway API Inference Extension releases.

InferenceModel

The InferenceModel resource allows inference workload owners to define logical model endpoints that abstract the complexities of backend deployment.

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: customer-support-bot
  namespace: ai-workloads
spec:
  modelName: customer-support
  criticality: Critical
  poolRef:
    name: llama-pool
  targetModels:
    - name: llama-3-8b-customer-v1
      weight: 80
    - name: llama-3-8b-customer-v2
      weight: 20

This configuration exposes a customer-support model that splits traffic between two backend variants by weight (80% to v1, 20% to v2), enabling safe rollouts of new model versions while maintaining service availability.
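One way to picture the weighted split is a weighted random choice made per request, after which the request’s model name is rewritten to the chosen variant. The Python sketch below illustrates that idea; it is not the extension’s implementation, and per-request random selection, the function names, and rewriting a `model` field in the request body are assumptions.

```python
import random
from collections import Counter

# Mirrors the targetModels list in the InferenceModel above.
TARGET_MODELS = [("llama-3-8b-customer-v1", 80), ("llama-3-8b-customer-v2", 20)]


def pick_target_model(targets, rng=random):
    names = [name for name, _ in targets]
    weights = [weight for _, weight in targets]
    return rng.choices(names, weights=weights, k=1)[0]


def rewrite_request(body, model_name, targets, rng=random):
    """Map the logical model name a client asked for onto a concrete variant."""
    if body.get("model") != model_name:
        raise KeyError(f"no InferenceModel named {body.get('model')!r}")
    return {**body, "model": pick_target_model(targets, rng)}


rng = random.Random(7)
counts = Counter(
    rewrite_request({"model": "customer-support", "prompt": "hi"},
                    "customer-support", TARGET_MODELS, rng)["model"]
    for _ in range(10_000)
)
print(counts)  # close to an 80/20 split between v1 and v2
```

Shifting more traffic to v2 during a rollout would then only mean editing the weights.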

InferencePool

The InferencePool acts as a specialized backend service that understands AI workload characteristics:

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama-pool
  namespace: ai-workloads
spec:
  targetPortNumber: 8000
  selector:
    app: llama-server
    version: v1
  extensionRef:
    name: llama-endpoint-picker

When integrated with Istio, this pool automatically discovers model servers through Istio’s service discovery.

How Inference Routing Works in Istio

Istio’s implementation builds on the service mesh’s proven traffic management foundation. When a request enters the mesh through a Kubernetes Gateway, it follows the standard Gateway API HTTPRoute matching rules. However, instead of using traditional load balancing algorithms, the backend is picked by an Endpoint Picker (EPP) service.

The EPP evaluates multiple factors to select the optimal backend:

Request Criticality Assessment: Critical requests receive priority routing to available servers, while lower criticality requests (Standard or Sheddable) may be load-shed during high utilization periods.
Resource Utilization Analysis: The extension monitors GPU memory usage, particularly KV cache utilization, to avoid overwhelming servers that are approaching capacity limits.
Adapter Affinity: For models using LoRA adapters, requests are preferentially routed to servers that already have the required adapter loaded, eliminating expensive loading overhead.
Prefix-Cache Aware Load Balancing: Routing decisions consider distributed KV cache states across model servers, and prioritize model servers that already have the prefix in their cache.
Queue Depth Optimization: By tracking request queue lengths across backends, the system avoids creating hotspots that would increase overall latency.
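The factors above can be combined into a filter-then-score picker. The Python sketch below is a simplified, hypothetical model: the thresholds, score weights, field names, and the use of a set of prefix hashes are invented for illustration and are not the Endpoint Picker’s actual algorithm or defaults. Following the description above, non-Critical requests are shed when every server is saturated.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelServer:
    name: str
    queue_depth: int                 # requests waiting on this server
    kv_cache_util: float             # fraction of KV cache in use, 0.0 to 1.0
    loaded_adapters: set = field(default_factory=set)  # LoRA adapters in GPU memory
    cached_prefixes: set = field(default_factory=set)  # hypothetical prompt-prefix hashes


# Hypothetical thresholds, not upstream defaults.
MAX_KV_UTIL = 0.8
MAX_QUEUE = 5


def is_saturated(s: ModelServer) -> bool:
    return s.kv_cache_util >= MAX_KV_UTIL or s.queue_depth >= MAX_QUEUE


def score(s: ModelServer, adapter: Optional[str], prefix: Optional[str]) -> float:
    points = 0.0
    if adapter is not None and adapter in s.loaded_adapters:
        points += 2.0              # adapter affinity: skip the loading overhead
    if prefix is not None and prefix in s.cached_prefixes:
        points += 1.0              # prefix-cache hit: reuse existing KV entries
    points -= 0.5 * s.queue_depth  # queue depth: avoid hotspots
    points -= s.kv_cache_util      # prefer servers with free KV cache
    return points


def pick_endpoint(servers, criticality, adapter=None, prefix=None) -> Optional[ModelServer]:
    """Return the chosen server, or None if the request is shed."""
    if not servers:
        return None
    candidates = [s for s in servers if not is_saturated(s)]
    if not candidates:
        if criticality != "Critical":
            return None            # load-shed Standard and Sheddable traffic
        candidates = list(servers) # Critical traffic still gets a server
    return max(candidates, key=lambda s: score(s, adapter, prefix))


pool = [
    ModelServer("a", 1, 0.3, {"support-lora"}),
    ModelServer("b", 0, 0.2),
    ModelServer("c", 6, 0.9, {"support-lora"}),  # saturated, filtered out
]
print(pick_endpoint(pool, "Critical", adapter="support-lora").name)  # a
print(pick_endpoint(pool, "Standard").name)                          # b
```

Filtering before scoring keeps saturated servers out of contention entirely, while the score trades adapter and cache locality against queue length among the rest.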

This intelligent routing operates transparently within Istio’s existing architecture, maintaining compatibility with features like mutual TLS, access policies, and distributed tracing.

Inference Routing Request Flow

The Road Ahead

The future roadmap includes Istio-related features such as:

Support for Waypoints: As Istio continues to evolve toward ambient mesh architecture, inference-aware routing will be integrated into waypoint proxies to provide centralized, scalable policy enforcement for AI workloads.

Beyond Istio-specific innovations, the Gateway API Inference Extension community is also actively developing several advanced capabilities that will further enhance routing for AI inference workloads on Kubernetes:

HPA Integration for AI Metrics: Horizontal Pod Autoscaling based on model-specific metrics rather than just CPU and memory.
Multi-Modal Input Support: Optimized routing for large multi-modal inputs and outputs (images, audio, video) with intelligent buffering and streaming capabilities.
Heterogeneous Accelerator Support: Intelligent routing across different accelerator types (GPUs, TPUs, specialized AI chips) with latency and cost-aware load balancing.

Getting Started with Istio Inference Extension

Ready to try inference-aware routing? The implementation is officially available starting with Istio 1.27!

For installation and guides, please follow the Istio-specific guidance on the Gateway API Inference Extension website.

Performance Impact and Benefits

Early evaluations show significant performance improvements with inference-aware routing, including substantially lower p90 latency at higher query rates and reduced end-to-end tail latencies compared to traditional load balancing.

For detailed benchmark results and methodology, see the Gateway API Inference Extension performance evaluation with testing data using H100 GPUs and vLLM deployments.

The integration with Istio’s existing infrastructure means these benefits come with minimal operational overhead, and your existing monitoring, security, and traffic management configurations continue to work unchanged.

Conclusion

The Gateway API Inference Extension represents a significant step forward in making Kubernetes truly AI-ready, and Istio’s implementation brings this intelligence to the service mesh layer where it can have maximum impact. By combining inference-aware routing with Istio’s proven security, observability, and traffic management capabilities, we’re enabling organizations to run AI workloads with the same operational excellence they expect from their traditional services.

Have a question or want to get involved? Join the Kubernetes Slack and then find us on the #gateway-api-inference-extension channel or discuss on the Istio Slack.

Istio Roadmap for 2025-2026

Fri, 25 Jul 2025 00:00:00 +0000

Over the next 12 months, we will focus on improving parity between sidecar mode and ambient mode, providing a supported path for sidecar users to migrate to the ambient data plane when they are ready. We will also revamp our contributor experience, simplifying the process for proposing and implementing new features, and giving recognition to our most valuable contributors. We plan to grow our ecosystem by adding or updating Istio’s integrations with various popular cloud native projects, and to build more case studies for Istio.

Looking Back

Since 2023, the Istio project has been focused on maturity and innovation, solidifying our position as the best service mesh, whether you run sidecars or ambient. These efforts included our CNCF graduation in July 2023, the promotion of Telemetry API and Gateway API to Stable in Istio 1.22, and the promotion of ambient mode to Stable in Istio 1.24. Since ambient mode reached GA, we have observed more and more users exploring and adopting it; some are net-new Istio users, while others are existing sidecar users. Some of them run ambient in production and spoke about their experiences at KubeCon EU in April this year. These efforts have made Istio the service mesh of choice for cloud native developers around the world, and we have been excited to accept first code contributions from 154 people in the past 12 months.

2025 Themes

Sidecar to ambient migration

With the promotion of ambient mode to Stable, Istio can now lay claim to being the fastest and most efficient service mesh as well as the most widely used, while being easier to operate than ever. Since that promotion, we’ve seen a substantial increase in interest, and a corresponding rise in requests for a comprehensive migration guide for existing sidecar users. While our previous efforts to stabilize ambient mode have targeted new Istio users, it is clear that the time has come to provide an onramp for our existing user base to migrate to ambient mesh. While the technical foundations for this migration have been in place for some time (and some brave users have migrated on their own), we will be making new investments in tooling to assess your readiness to migrate, rollback-safe interoperability, and documentation to guide users every step of the way.

In addition to tests, tooling, and documentation, users migrating between data planes should reasonably expect that the Istio features they know and love will continue to work in their new environment. For this reason, we are investing in closing the most significant functionality gaps between sidecar and ambient mode, specifically by adding support for multi-cluster traffic management and extensibility, which you can read about below.

As we have stated in previous years, we have no intention of ending support for sidecar mode as long as there are users for it. Migrating to ambient mesh is completely voluntary, and we expect many users will use sidecars for years to come.

Multi-cluster ambient mesh

Multi-cluster traffic management has long been one of the most valued enterprise features of Istio, and we are hard at work to bring this value to ambient mode users in 2025. With a multi-cluster mesh, service outages or anomalies in one cluster can dynamically cause requests to fail over to other clusters, potentially in other regions or clouds. This gives users the ability to run high-availability services in active-active configuration, optimizing compute utilization and traffic costs from a single control plane. Multi-cluster ambient mesh will be available as an Alpha in Istio 1.27, which we plan to release in August.

The future of extensibility

The Istio project has offered several APIs for extensibility since launch, and none of them has matured to Stable. Of those in use today, Envoy Filters are a powerful tool for tweaking internal proxy configuration and modifying traffic flow, but they are very difficult to use and pose significant risk during upgrades, which can change the filter integrations in ways that cannot always be predicted. WebAssembly (Wasm) emerged in 2019 as a powerful tool for Turing-complete modification of traffic, but community support for Wasm compilers and libraries outside the Istio ecosystem has waned substantially since then, making it difficult for users to use Wasm with Istio safely and securely.

As we plan for 2025 and beyond, it is clear that we need a path to a mature extensibility model for users of sidecars and ambient mode alike. We plan to address the most common use cases for extensibility, such as local rate limiting, with first-class APIs, reducing how often users need extensibility at all. However, we recognize that networks are complex, and there will always be cases our APIs don’t cover, when users need a “break glass” option. The architecture of ambient mode provides some options, such as leveraging the waypoint pattern for service insertion: adding arbitrary proxies to the network chain, which can then perform arbitrary modifications. Another similar development is Envoy’s ext-proc filter, which sends requests to an arbitrary service for modification before forwarding them to their destination.

With several options on the table for extensibility, who will decide which is best? As always, the final decision lies with you, our users. Please share your thoughts with us about the future of the project in the extensibility channel at slack.istio.io.

New and Improved Contributor Experience

The Istio community is full of talented contributors whose daily efforts make this project possible, and the list of contributors is always growing! However, like all open source projects, we are always in need of new contributors, and we recognize that submitting your first PR to Istio is harder than it should be. In 2025, we aim to make authoring your first Istio contribution easier than ever with improved integration with GitHub Codespaces, and regular triage of good first issues! If you’re interested in contributing, we can always use help on Issues labeled User Experience and Documentation. If you’d like to get more involved, consider joining our release manager rotation, where you will shadow two releases before taking on primary release management responsibilities. We will also aim to provide better recognition to our contributors through a revamped workgroup leads program, where top contributors can be recognized for their expertise! With these initiatives, we believe we are setting up the Istio community to grow for years to come.

Conclusion

This roadmap outlines an exciting near term for Istio, focusing on a seamless migration path from sidecar to ambient mode, enhanced multi-cluster capabilities, and a refined approach to extensibility. We are also committed to fostering a more welcoming and rewarding environment for our invaluable contributors. These initiatives solidify Istio’s position as the leading service mesh, ready to empower cloud native developers with unmatched efficiency, control, and a thriving community.

Istio at KubeCon Europe 2025

Fri, 25 Apr 2025 00:00:00 +0000

The open source and cloud native community gathered from the 1st to 4th of April in London for the first KubeCon of 2025. The four-day conference, organized by the Cloud Native Computing Foundation, was “big” for Istio, as our presence was seen almost everywhere, from the keynotes to the project pavilion.

We kick-started the activities in London with Istio Day - a KubeCon + CloudNativeCon co-located event on April 1st. The event was well-received, showcasing lessons learned from running Istio in production, hands-on experiences, and featuring maintainers from across the Istio ecosystem.

Istio Day Europe 2025, Welcome

Istio Day kicked off with an opening keynote from the Program Committee chairs, Keith Mattix and Denis Jannot. The keynote was followed by the much-awaited talk from Microsoft about Istio Ambient Mesh support on Windows. We had a very interesting talk by Lior Lieberman from Google and Erik Parienty from Riskified on architecting Istio for large scale deployments, followed by a talk from Kiali maintainers Josune Cordoba and Hayk Hovsepyan, from Red Hat, about troubleshooting Istio ambient mesh with Kiali 2.0.

Istio Day Europe 2025, Kiali session

Istio multi-cluster is always a hot topic, and Pamela Hernandez from BlackRock nailed it in her talk on navigating the maze of multi-cluster Istio, diving into the complexities of implementing a multi-cluster Istio service mesh at scale with a hub-and-spoke model. The audience was excited when Denis Jannot from Solo.io ran a live, representative benchmark at scale with Istio Ambient, debunking common myths about service mesh overhead and complexity. When SAP presented the challenges of running a GenAI platform in multi-tenant environments, attendees saw how Istio played a pivotal role in managing traffic and ensuring data security, ultimately enabling a secure and efficient AI platform that meets enterprise standards. Rounding out the talks was a lightning talk by Rob Salmond from SuperOrbit on How to get Istio help, which covered the best places to go, how to ask good questions, and how to avoid common missteps.

Istio Day Europe 2025, Jam packed sessions

The slides for all the sessions can be found in the Istio Day EU 2025 schedule.

Our presence at the conference did not end with Istio Day. The first day keynote of KubeCon + CloudNativeCon started with an Istio project lightning talk from Mitch Connors.

Istio Day Europe 2025, Project lightning talk

There were several keynotes on the main stage where Istio was mentioned. At the opening day keynotes, Vasu Chandrasekhara, from SAP, announced the NeoNephos Foundation under the Linux Foundation Europe - a major step forward for Digital Sovereignty in Europe, and Istio was mentioned as a supported project.

KubeCon Europe 2025, Announcing NeoNephos

Stephen Connolly shared HSBC’s journey with Kubernetes and also discussed plans to adopt Istio ambient mesh to save on costs. Ant Group, who won the CNCF End User Award, also highlighted their Istio usage. Idit Levine and Keith Babo, from Solo.io, announced a free cost-saving estimator and migration tool for Istio ambient mesh. Faseela K had a Telco end user panel keynote on Cloud Native Evolution in Telecom with Vodafone, Orange, and Swisscom, which again highlighted Istio usage for Telco Network Functions.

KubeCon Europe 2025, Cloud Native evolution in Telecom

Istio’s maintainer track session was also well received: Raymond Wong from Forbes joined maintainers Louis Ryan and Lin Sun to discuss Forbes’ journey to Istio ambient in production. It was a packed room with a lot of questions afterwards.

KubeCon Europe 2025, Istio maintainer track session

A Contribfest session led by Mitch Connors (Microsoft), Daniel Hawton (Solo.io), and Jackie Maertens (Microsoft) walked through the structure of the Istio repositories and where each component’s code lives, how to find issues to resolve, how to set up and use integration tests, and how to make a first contribution to the project. It also shared resources for getting development environments up and running, and places to go for assistance.

KubeCon Europe 2025, Istio contrib fest session

Istio maintainers Lin Sun and Faseela K held a book signing after their reading of the Istio Phippy book “Izzy Saves the Birthday”.

KubeCon Europe 2025, Izzy saves the birthday, book signing

The following sessions at KubeCon were based on Istio and almost all of them had a huge crowd in attendance:

Istio had a kiosk in the Project Pavilion, where most questions focused on extensibility and multicluster enhancements. Many of our members and maintainers offered support at the kiosk, helping us answer all of our users’ questions.

KubeCon Europe 2025, Istio Kiosk

Many of our TOC members and maintainers also offered support at the booth, where there were many interesting discussions about Istio ambient mesh as well.

KubeCon Europe 2025, More support at Istio Kiosk

We would like to express our heartfelt gratitude to our gold sponsor Microsoft Azure for supporting Istio Day Europe! Last but not least, we would like to thank our Istio Day Program Committee members for all their hard work and support!

See you in Atlanta in November 2025!

Istio publishes results of ztunnel security audit

Fri, 18 Apr 2025 00:00:00 +0000

Istio’s ambient mode splits the service mesh into two distinct layers: Layer 7 processing (the “waypoint proxy”), which remains powered by the traditional Envoy proxy; and a secure overlay (the “zero-trust tunnel” or “ztunnel”), which is a new codebase, written from the ground up in Rust.

It is our intention that the ztunnel project be safe to install by default in every Kubernetes cluster, and to that end, it needs to be secure and performant.

We comprehensively demonstrated ztunnel’s performance, showing that it is the highest-bandwidth way to achieve a secure zero-trust network in Kubernetes — providing higher TCP throughput than even in-kernel data planes like IPsec and WireGuard — and that its performance has increased by 75% over the past 4 releases.

Today, we are excited to validate the security of ztunnel, publishing the results of an audit of the codebase performed by Trail of Bits.

We would like to thank the Cloud Native Computing Foundation for funding this work, and OSTIF for its coordination.

Scope and overall findings

Istio was previously assessed in 2020 and 2023, and the Envoy proxy receives its own independent assessments. The scope of this review was the new code in Istio’s ambient mode, the ztunnel component: specifically code relating to L4 authorization, inbound request proxying, transport-layer security (TLS), and certificate management.

The auditors stated that “the ztunnel codebase is well-written and structured”, and had no findings relating to vulnerabilities in the code. Their three findings (one of medium severity and two informational) are recommendations regarding external factors, including the software supply chain and testing.

Resolution and suggested improvements

Improving dependency management

At the time of the audit, the cargo audit report for ztunnel’s dependencies showed three versions with current security advisories. There was no suggestion that any vulnerable code paths in ztunnel’s dependencies could be reached, and the maintainers already updated dependencies to the latest appropriate versions on a regular basis. To streamline this, we’ve adopted GitHub’s Dependabot for automated updates.
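A minimal Dependabot configuration for a Cargo-based repository might look like the sketch below. It is illustrative only: the file location, the repository-root directory and the weekly schedule are assumptions, not ztunnel’s actual settings.

```yaml
# .github/dependabot.yml (illustrative sketch; not ztunnel's actual file)
version: 2
updates:
  # Watch Cargo.toml / Cargo.lock at the repository root for newer crate versions
  - package-ecosystem: "cargo"
    directory: "/"
    # Assumed cadence; Dependabot also accepts "daily" and "monthly"
    schedule:
      interval: "weekly"
```

Dependabot then opens pull requests for outdated crates, so updates go through the same review and CI as any other change.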

The auditors pointed out the risk of Rust crates in ztunnel’s dependency chain that are either unmaintained or maintained by a single owner. This is a common situation in the Rust ecosystem (and indeed in all of open source). We replaced the two crates that were explicitly identified.

Enhancing test coverage

The Trail of Bits team found that most ztunnel functionality is well-tested, but identified some error-handling code paths which were not covered by mutation testing.

We evaluated the suggestions and found that the gaps in coverage highlighted by these results apply to test code, and to code that does not affect correctness.

While mutation testing is useful to identify potential areas to improve, the goal is not to reach a point where a report returns no results. Mutations can trigger no test failures in a number of expected cases, such as behavior with no ‘correct’ result (e.g., log messages), behavior that impacts only performance but not correctness (measured outside of the scope the tooling is aware of), code paths that have multiple ways to achieve the same result, or code used only for testing. Testing and security are core priorities for the Istio team, and we are constantly improving our test coverage, using tools like mutation testing and developing novel solutions to test proxies.

Hardening HTTP header parsing

A third-party library was used for parsing the value of the HTTP Forwarded header, which may be present on connections made to ztunnel. The auditors pointed out that header parsing is a common area of attack, and expressed concern that the library we used was not fuzz tested. Given that we were only using this library for parsing one header, we wrote a custom parser for the Forwarded header, complete with a fuzzing harness to test it.

Get involved

With strong performance and now validated security, ambient mode continues to advance the state of the art in service mesh design. We encourage you to try it out today.

If you would like to get involved with Istio product security, or become a maintainer, we’d love to have you! Join our Slack workspace or our public meetings to raise issues or learn about what we are doing to keep Istio secure.

Sail Operator 1.0.0 released: manage Istio with an operator

Thu, 03 Apr 2025 00:00:00 +0000

The Sail Operator is a community project launched by Red Hat to build a modern operator for Istio. First announced in August 2024, it is now generally available (GA), with a clear mission: to simplify and streamline Istio management in your cluster.

Simplified deployment & management

The Sail Operator is engineered to cut down the complexity of installing and running Istio. It automates manual tasks, ensuring a consistent, reliable, and uncomplicated experience from initial installation to ongoing maintenance and upgrades of Istio versions in your cluster. The Sail Operator APIs are built around Istio’s Helm chart APIs, which means that all Istio configuration options are available through the values field of the Sail Operator CRDs.
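Because the Sail Operator APIs mirror Istio’s Helm chart values, a setting you would normally pass to the istiod chart can go under spec.values of the Istio resource. The sketch below assumes the sailoperator.io/v1 API group used by Sail Operator 1.0; the resource name and the access-log setting are illustrative choices.

```yaml
apiVersion: sailoperator.io/v1   # assumed API group/version for Sail Operator 1.0
kind: Istio
metadata:
  name: default
spec:
  namespace: istio-system   # namespace where istiod is deployed
  version: v1.24.3          # one of the Istio versions supported by Sail Operator 1.0.0
  values:
    # Same structure as the istiod Helm chart values
    meshConfig:
      accessLogFile: /dev/stdout   # illustrative: enable Envoy access logs
```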

We encourage users to go through our documentation to learn more about this new way to manage your Istio environment.

The main resources that are part of the Sail Operator are:

Istio: manages an Istio control plane.
IstioRevision: represents a revision of the control plane.
IstioRevisionTag: represents a stable revision tag, which functions as an alias for an Istio control plane revision.
IstioCNI: manages Istio’s CNI node agent.
ZTunnel: manages the ambient mode ztunnel DaemonSet (Alpha feature).
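Each of these is an ordinary Kubernetes custom resource. As an illustration, an IstioCNI resource for the CNI node agent might look like the following sketch; the sailoperator.io/v1 API group, the name default and the istio-cni namespace are assumptions based on common Sail Operator usage, not values taken from this post.

```yaml
apiVersion: sailoperator.io/v1   # assumed API group/version
kind: IstioCNI
metadata:
  name: default          # assumed name for the single CNI instance
spec:
  version: v1.24.3       # typically matches the control plane version
  namespace: istio-cni   # namespace where the CNI node agent DaemonSet runs
```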

Main features and support

Each component of the Istio control plane is managed independently by the Sail Operator through dedicated Kubernetes Custom Resources (CRs). The Sail Operator provides separate CRDs for components such as Istio, IstioCNI, and ZTunnel, allowing you to configure, manage, and upgrade them individually. Additionally, there are CRDs for IstioRevision and IstioRevisionTag to manage Istio control plane revisions.
Support for multiple Istio versions. Version 1.0.0 supports Istio 1.24.3, 1.24.2, 1.24.1, 1.23.5, 1.23.4, 1.23.3, and 1.23.0.
Two update strategies are supported: InPlace and RevisionBased. Check our documentation for more information about the supported update types.
Support for multicluster Istio deployment models: multi-primary, primary-remote, and external control plane. More information and examples are in our documentation.
Ambient mode support is Alpha; see the ambient-specific documentation.
Addons are managed separately from the Sail Operator. They can be easily integrated with it; see this section of the documentation for examples and more information.
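The update strategy is chosen per Istio resource. The fragment below sketches where it might be set, assuming the spec.updateStrategy.type field of the Sail Operator 1.0 Istio API. InPlace upgrades the existing control plane, while RevisionBased starts a new control plane revision next to the old one, as in the walkthrough in the next section.

```yaml
# Fragment of an Istio resource spec (other fields omitted)
spec:
  updateStrategy:
    # InPlace: replace the running control plane with the new version
    # RevisionBased: run a new control plane revision alongside the old one
    type: RevisionBased
```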

Why now?

As cloud native architectures continue to evolve, we feel a robust and user-friendly operator for Istio is more essential than ever. The Sail Operator offers developers and operations teams a consistent, secure, and efficient solution that feels familiar to those used to working with operators. Its GA release signals a mature solution, ready to support even the most demanding production environments.

Try it out

Would you like to try out the Sail Operator? This example shows how to safely update your Istio control plane using the revision-based upgrade strategy. This means you will have two Istio control planes running at the same time, allowing you to migrate workloads easily and minimizing the risk of traffic disruptions.

Prerequisites:

A running Kubernetes cluster
Helm
kubectl
istioctl

Install the Sail Operator using Helm

$ helm repo add sail-operator https://istio-ecosystem.github.io/sail-operator
$ helm repo update
$ kubectl create namespace sail-operator
$ helm install sail-operator sail-operator/sail-operator --version 1.0.0 -n sail-operator

The operator is now installed in your cluster:

NAME: sail-operator
LAST DEPLOYED: Tue Mar 18 12:00:46 2025
NAMESPACE: sail-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None

Check the operator pod is running:

$ kubectl get pods -n sail-operator
NAME                             READY   STATUS    RESTARTS   AGE
sail-operator-56bf994f49-j67ft   1/1     Running   0          87s

Create `Istio` and `IstioRevisionTag` resources

Create an Istio resource with the version v1.24.2 and an IstioRevisionTag:

$ kubectl create ns istio-system
$ cat <


Note that the IstioRevisionTag has a target reference to the Istio resource named default.
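For reference, here is a minimal sketch of the two resources described above, assuming the sailoperator.io/v1 API group: an Istio resource at v1.24.2 using the revision-based strategy, and an IstioRevisionTag named default whose targetRef points at it. Field names beyond those mentioned in this post are assumptions; check the Sail Operator documentation before relying on them.

```yaml
apiVersion: sailoperator.io/v1   # assumed API group/version
kind: Istio
metadata:
  name: default
spec:
  namespace: istio-system
  version: v1.24.2
  updateStrategy:
    type: RevisionBased   # revisions are named like default-v1-24-2
---
apiVersion: sailoperator.io/v1
kind: IstioRevisionTag
metadata:
  name: default           # the "default" tag, used by istio-injection=enabled
spec:
  targetRef:
    kind: Istio           # follow whichever revision this Istio resource marks active
    name: default
```

Pointing the tag at the Istio resource rather than at a specific IstioRevision is what lets it move to the new revision automatically after an upgrade.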
Check the state of the resources created:


istiod pods are running
$ kubectl get pods -n istio-system
NAME                                    READY   STATUS    RESTARTS   AGE
istiod-default-v1-24-2-bd8458c4-jl8zm   1/1     Running   0          3m45s


Istio resource created
$ kubectl get istio
NAME      REVISIONS   READY   IN USE   ACTIVE REVISION   STATUS    VERSION   AGE
default   1           1       1        default-v1-24-2   Healthy   v1.24.2   4m27s


IstioRevisionTag resource created
$ kubectl get istiorevisiontag
NAME      STATUS                    IN USE   REVISION          AGE
default   NotReferencedByAnything   False    default-v1-24-2   4m43s


Note that the IstioRevisionTag status is NotReferencedByAnything. This is because there are currently no resources using the revision default-v1-24-2.
Deploy sample application
Create a namespace and label it to enable Istio injection:
$ kubectl create namespace sample
$ kubectl label namespace sample istio-injection=enabled
After labeling the namespace you will see that the IstioRevisionTag resource status will change to ‘In Use: True’, because there is now a resource using the revision default-v1-24-2:
$ kubectl get istiorevisiontag
NAME      STATUS    IN USE   REVISION          AGE
default   Healthy   True     default-v1-24-2   6m24s
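The two kubectl commands above can also be expressed declaratively. A sketch of the equivalent Namespace manifest follows; with istio-injection: enabled, sidecars are injected by the revision that the default tag points to, which is why the tag now reports In Use: True.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: sample
  labels:
    # Resolved through the "default" revision tag rather than a fixed revision
    istio-injection: enabled
```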
Deploy the sample application:
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/sleep/sleep.yaml -n sample
Confirm the proxy version of the sample app matches the control plane version:
$ istioctl proxy-status
NAME                              CLUSTER        CDS              LDS              EDS              RDS              ECDS        ISTIOD                                    VERSION
sleep-5fcd8fd6c8-q4c9x.sample     Kubernetes     SYNCED (78s)     SYNCED (78s)     SYNCED (78s)     SYNCED (78s)     IGNORED     istiod-default-v1-24-2-bd8458c4-jl8zm     1.24.2
Upgrade the Istio control plane to version 1.24.3
Update the Istio resource with the new version:
$ kubectl patch istio default -n istio-system --type='merge' -p '{"spec":{"version":"v1.24.3"}}'
Check the Istio resource. You will see that there are two revisions and they are both ‘ready’:
$ kubectl get istio
NAME      REVISIONS   READY   IN USE   ACTIVE REVISION   STATUS    VERSION   AGE
default   2           2       2        default-v1-24-3   Healthy   v1.24.3   10m
The IstioRevisionTag now references the new revision:
$ kubectl get istiorevisiontag
NAME      STATUS    IN USE   REVISION          AGE
default   Healthy   True     default-v1-24-3   11m
There are two IstioRevisions, one for each Istio version:
$ kubectl get istiorevision
NAME              TYPE   READY   STATUS    IN USE   VERSION   AGE
default-v1-24-2          True    Healthy   True     v1.24.2   11m
default-v1-24-3          True    Healthy   True     v1.24.3   92s
The Sail Operator automatically detects whether a given Istio control plane is being used and writes this information in the “In Use” status condition that you see above. Right now, all IstioRevisions and our IstioRevisionTag are considered “In Use”:

The old revision default-v1-24-2 is considered in use because it is referenced by the sample application’s sidecar.
The new revision default-v1-24-3 is considered in use because it is referenced by the tag.
The tag is considered in use because it is referenced by the sample namespace.

Confirm there are two control plane pods running, one for each revision:
$ kubectl get pods -n istio-system
NAME                                      READY   STATUS    RESTARTS   AGE
istiod-default-v1-24-2-bd8458c4-jl8zm     1/1     Running   0          16m
istiod-default-v1-24-3-68df97dfbb-v7ndm   1/1     Running   0          6m32s
Confirm the proxy sidecar version remains the same:
$ istioctl proxy-status
NAME                              CLUSTER        CDS                LDS                EDS                RDS                ECDS        ISTIOD                                    VERSION
sleep-5fcd8fd6c8-q4c9x.sample     Kubernetes     SYNCED (6m40s)     SYNCED (6m40s)     SYNCED (6m40s)     SYNCED (6m40s)     IGNORED     istiod-default-v1-24-2-bd8458c4-jl8zm     1.24.2
Restart the sample pod:
$ kubectl rollout restart deployment -n sample
Confirm the proxy sidecar version is updated:
$ istioctl proxy-status
NAME                              CLUSTER        CDS              LDS              EDS              RDS              ECDS        ISTIOD                                      VERSION
sleep-6f87fcf556-k9nh9.sample     Kubernetes     SYNCED (29s)     SYNCED (29s)     SYNCED (29s)     SYNCED (29s)     IGNORED     istiod-default-v1-24-3-68df97dfbb-v7ndm     1.24.3
When an IstioRevision is no longer in use and is not the active revision of an Istio resource (that is, its version is not the one set in the spec.version field), the Sail Operator deletes it after a grace period, which defaults to 30 seconds. Confirm the deletion of the old control plane and IstioRevision:


The old control plane pod is deleted
$ kubectl get pods -n istio-system
NAME                                      READY   STATUS    RESTARTS   AGE
istiod-default-v1-24-3-68df97dfbb-v7ndm   1/1     Running   0          10m


The old IstioRevision is deleted
$ kubectl get istiorevision
NAME              TYPE   READY   STATUS    IN USE   VERSION   AGE
default-v1-24-3          True    Healthy   True     v1.24.3   13m


The Istio resource now only has one revision
$ kubectl get istio
NAME      REVISIONS   READY   IN USE   ACTIVE REVISION   STATUS    VERSION   AGE
default   1           1       1        default-v1-24-3   Healthy   v1.24.3   24m


Congratulations! You have successfully updated your Istio control plane using the revision-based upgrade strategy.
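The grace period after which the Sail Operator removes a revision that is neither in use nor active (30 seconds by default) can likely be adjusted on the Istio resource. The fragment below assumes the spec.updateStrategy.inactiveRevisionDeletionGracePeriodSeconds field from the Sail Operator API; the value of 60 is an arbitrary example.

```yaml
# Fragment of an Istio resource spec (other fields omitted)
spec:
  updateStrategy:
    type: RevisionBased
    # Assumed field: seconds to wait before deleting a revision that is
    # neither in use nor active (the operator default is 30)
    inactiveRevisionDeletionGracePeriodSeconds: 60
```

A longer grace period gives you more time to notice problems before the old control plane disappears.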

To check the latest Sail Operator version, visit our releases page. As this example may evolve over time, please refer to our documentation to ensure you’re reading the most up-to-date version.


Conclusion
The Sail Operator automates manual tasks, ensuring a consistent, reliable, and uncomplicated experience from initial installation to ongoing maintenance and upgrades of Istio in your cluster. The Sail Operator is an istio-ecosystem project, and we encourage you to try it out and provide feedback to help us improve it. Check our contribution guide for more information about how to contribute to the project.



Istio at KubeCon Europe, See you soon in London!
Tue, 25 Mar 2025 00:00:00 +0000


An amazing lineup of Istio activities awaits you in London at KubeCon + CloudNativeCon Europe 2025!


Join us for the Istio Project Meeting hosted at the Maintainer Summit.




Come to the Istio Day co-located event.




Attend the Istio Maintainers’ Track session: Istio: The Past, Present and Future of the Project and Community


Drop by the Istio Contribfest session: A Beginner’s Guide to Contributing to Istio - Hands-on Development and Contribution Workshop


Add the following KubeCon sessions to your schedule, all of which have an Istio flavor:

Project Lightning Talk: What’s New in Istio?
Sponsored Demo: Bringing Agentic AI to Cloud Native - Introducing kagent
“Izzy Saves the Birthday” - A Story-Driven Live Demo Exploring the Magic of Service Mesh
Trino and Data Governance on Kubernetes
Journey at the New York Times: Is Sidecar-Less Service Mesh Disappearing Into Infrastructure?
Lightning Talk: High Availability With ‘503: Unavailable’



Have a chat with maintainers and users at the Istio kiosk in the Project Pavilion throughout the event.


We also have the Istio Phippy book signing event organized alongside the Izzy Saves the Birthday session. Do join the talk and grab a free, signed copy of the book from the authors!


Follow us on X, LinkedIn or Bluesky to get live updates from the event. See you soon!



Istio: The Highest-Performance Solution for Network Security
Thu, 06 Mar 2025 00:00:00 +0000
Encryption in transit is a baseline requirement for almost all Kubernetes environments today, and forms the foundation of a zero-trust security posture.
However, the challenge with security is that it doesn’t come without a cost: it often involves a trade-off between complexity, user experience, and performance.
While most cloud native users know Istio as a service mesh that provides advanced HTTP functionality, it can also provide a foundational network security layer. When we set out to build Istio’s ambient mode, these two layers were explicitly split. One of our primary objectives was to be able to offer security (and a long list of other features!) without compromise.
With ambient mode, Istio is now the highest-bandwidth way to achieve a secure zero-trust network in Kubernetes.
Let’s look at some results before we dive into the how and why.
Putting it to the test
To test performance, we utilized a standard network benchmarking tool, iperf, to measure the bandwidth of TCP traffic flowing through various popular Kubernetes network security solutions.


The results speak for themselves: Istio decisively leads the pack as the highest-performing network security solution.
Even more impressive is that this gap continues to grow with each Istio release:


Istio’s performance is driven by ztunnel, a purpose-built data plane that is light, fast, and secure.
Over the last 4 releases, the performance of ztunnel has improved by 75%!

Testing Details


Istio Project Announces 2025 Steering Committee
Wed, 05 Mar 2025 00:00:00 +0000
The Istio Steering Committee oversees the administrative aspects of the project, including governance, branding, marketing, and working with the CNCF.
Every year, we estimate the proportion of the hundreds of companies that have contributed to Istio in the past year, and use that metric to proportionally allocate the nine Contribution Seats on our Steering Committee.
After that, four Community Seats are voted for by our project members, with candidates being from companies that did not receive Contribution Seats.
In February, we announced the Contribution Seat allocation, and invited candidates to stand for the Community Seat elections.
As the election officer, I am pleased to announce the results of that election, as well as the individuals who will represent the top contributors.
Community Seats
Four excellent candidates stood for our four open seats, and thus all were elected unopposed:

Faseela K, Ericsson Software Technology
Wilson Wu, DaoCloud
Rob Cernich, Red Hat
Pratima Nambiar, Salesforce

Wilson is a top 20 contributor to Istio, and a leader in the localization of Istio’s documentation into Chinese. The three other candidates have all previously served on the Steering Committee.
Contribution Seats
Our supporting companies have made their choices for the nine Contribution Seats. They will be held by:

Craig Box (Solo.io)
Zack Butcher (Tetrate)
John Howard (Solo.io)
Idit Levine (Solo.io)
Keith Mattix (Microsoft)
Justin Pettit (Google)
Louis Ryan (Solo.io)
Lin Sun (Solo.io)
Zhonghu Xu (Huawei)

Seating the new committee
On behalf of the Steering Committee, I wish to congratulate our new and returning members. This group will serve for one year, starting this week.
We would also like to extend our heartfelt thanks to Iris Ding, Arunkumar Jayaraman, Abhi Joglekar, Kebe Liu and Jamie Longmuir, whose terms have now ended.
The new team will continue to grow and improve Istio as a successful and sustainable open source project. We encourage everyone to get involved in the Istio community, and help us shape the future of the world’s most popular service mesh.



Announcing Istio's 2025 Steering Committee Elections
Thu, 13 Feb 2025 00:00:00 +0000
The Istio Steering Committee oversees the administrative aspects of the project, including governance, branding, marketing, and working with the CNCF.
Every year, the leaders in the Istio project estimate the proportion of the hundreds of companies that have contributed to Istio in the past year, and use that metric to proportionally allocate nine Contribution Seats on our Steering Committee.
Then, four Community Seats are voted for by our project members, with candidates being from companies that did not receive Contribution Seats.
We are pleased to share the result of this year’s calculation, and to kick off our Community Seat election.
Contribution seats
The calculation for the 2025-2026 term reflects the deep investment of our vendors in the Istio open source project, especially in the area of ambient mode. As was the case last year, we have five companies represented in our Contribution Seats:



Company      Seat allocation
Solo.io      5
Microsoft    1
Huawei       1
Google       1
Tetrate      1


The full allocation can be seen in our formula spreadsheet.
Community Seat election
Last year, we changed the timing of the Community Seat elections to immediately follow the allocation of the Contribution Seats. It is therefore now time to collect our nominations for candidates, and ensure our voter list is correct.
Candidates
Eligibility for candidacy is defined in the Steering Committee charter as a project member who does not work for a Company that will hold a Contribution Seat during the upcoming term.
We would now like to invite members from outside our Contribution Seat holders to stand for election. Nominations are due by February 23.
Voters
Eligibility to vote is defined in the charter as either:

a project member who has had at least one Pull Request merged in the past 12 months, or
someone who has submitted the voting exception form and has been accepted by the Steering Committee as having standing in the community through contribution of another kind.

The draft list of voters has been published. If you’re not on that list and you believe you have standing in the Istio community, please submit the exception form.
Exception requests are due by February 23. Voting will start on February 24 and last until March 9.
Announcement of the new committee
Upon the completion of the election, the entire 2025-2026 committee - election winners and company-selected Contribution Seat holders - will be announced.
The Steering Committee wishes to thank its members, old and new, and looks forward to continuing to grow and improve Istio as a successful and sustainable open source project. We encourage everyone to get involved in the Istio community by contributing, standing for election, voting, and helping us shape the future of cloud native networking.



Policy based authorization using Kyverno
Mon, 25 Nov 2024 00:00:00 +0000
Istio supports integration with many different projects. The Istio blog recently featured a post on L7 policy functionality with OpenPolicyAgent. Kyverno is a similar project, and today we will dive into how Istio and the Kyverno Authz Server can be used together to enforce Layer 7 policies in your platform.
We will show you how to get started with a simple example.
You will see how this combination is a solid option for delivering policy quickly and transparently to application teams everywhere in the business, while also providing the data security teams need for audit and compliance.
Try it out
When integrated with Istio, the Kyverno Authz Server can be used to enforce fine-grained access control policies for microservices.
This guide shows how to enforce access control policies for a simple microservices application.
Prerequisites

A Kubernetes cluster with Istio installed.
The istioctl command-line tool installed.

Install Istio and configure your mesh options to enable Kyverno:
$ istioctl install -y -f - <

Notice that in the configuration, we define an extensionProviders section that points to the Kyverno Authz Server installation:
[...]
    extensionProviders:
    - name: kyverno-authz-server
      envoyExtAuthzGrpc:
        service: kyverno-authz-server.kyverno.svc.cluster.local
        port: '9081'
[...]
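The mesh configuration passed to istioctl install above could look like the following minimal sketch, assuming the IstioOperator API (install.istio.io/v1alpha1) and keeping only the extension provider shown in the excerpt; any other mesh settings you need would sit alongside it under meshConfig.

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    extensionProviders:
    # Name referenced later by the Istio AuthorizationPolicy (provider.name)
    - name: kyverno-authz-server
      envoyExtAuthzGrpc:
        # gRPC ext_authz endpoint exposed by the Kyverno Authz Server
        service: kyverno-authz-server.kyverno.svc.cluster.local
        port: '9081'
```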
Deploy the Kyverno Authz Server
The Kyverno Authz Server is a gRPC server capable of processing Envoy External Authorization requests.
It is configurable using Kyverno AuthorizationPolicy resources, either stored in-cluster or provided externally.
$ kubectl create ns kyverno
$ kubectl label namespace kyverno istio-injection=enabled
$ helm install kyverno-authz-server --namespace kyverno --wait --version 0.1.0 --repo https://kyverno.github.io/kyverno-envoy-plugin kyverno-authz-server
Deploy the sample application
httpbin is a well-known application that can be used to test HTTP requests and helps to show quickly how we can play with the request and response attributes.
$ kubectl create ns my-app
$ kubectl label namespace my-app istio-injection=enabled
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/httpbin/httpbin.yaml -n my-app
Deploy an Istio AuthorizationPolicy
An AuthorizationPolicy defines the services that will be protected by the Kyverno Authz Server.
$ kubectl apply -f - <

Notice that in this resource, we reference the kyverno-authz-server extension provider that you defined in the Istio configuration:
[...]
  provider:
    name: kyverno-authz-server
[...]
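A complete Istio AuthorizationPolicy along these lines is sketched below. It uses the CUSTOM action to delegate decisions to the kyverno-authz-server provider and selects pods carrying the ext-authz: enabled label applied in the next step; the policy name and the choice of the my-app namespace are assumptions.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: kyverno-authz      # hypothetical name
  namespace: my-app        # scope the policy to the sample application's namespace
spec:
  # Only workloads with this label have their requests sent to the external authorizer
  selector:
    matchLabels:
      ext-authz: enabled
  action: CUSTOM
  provider:
    name: kyverno-authz-server   # must match the extensionProviders entry in the mesh config
  rules:
  - {}   # an empty rule matches every request to the selected workloads
```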
Label the app to enforce the policy
Let’s label the app to enforce the policy. The label is needed for the Istio AuthorizationPolicy to apply to the sample application pods.
$ kubectl patch deploy httpbin -n my-app --type=merge -p='{
  "spec": {
    "template": {
      "metadata": {
        "labels": {
          "ext-authz": "enabled"
        }
      }
    }
  }
}'
Deploy a Kyverno AuthorizationPolicy
A Kyverno AuthorizationPolicy defines the rules used by the Kyverno Authz Server to make a decision based on a given Envoy CheckRequest.
It uses the Common Expression Language (CEL) to analyze an incoming CheckRequest and is expected to produce a CheckResponse in return.
The incoming request is available under the object field, and the policy can define variables that will be made available to all authorizations.
$ kubectl apply -f - <
      variables.allowed
        ? envoy.Allowed().Response()
        : envoy.Denied(403).Response()
EOF
Notice that you can build the CheckResponse by hand or use CEL helper functions like envoy.Allowed() and envoy.Denied(403) to simplify creating the response message:
[...]
  - expression: >
      variables.allowed
        ? envoy.Allowed().Response()
        : envoy.Denied(403).Response()
[...]
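A full Kyverno AuthorizationPolicy implementing the header check used in the Testing section might look like the sketch below. The apiVersion (envoy.kyverno.io/v1alpha1) and the policy name are assumptions; the allowed variable uses plain CEL map operations on the request headers, which Envoy passes with lowercase keys.

```yaml
apiVersion: envoy.kyverno.io/v1alpha1   # assumed API group/version
kind: AuthorizationPolicy
metadata:
  name: demo-policy.example.com         # hypothetical name
spec:
  variables:
  # True when x-force-authorized is present and set to "enabled" or "true"
  - name: allowed
    expression: >
      "x-force-authorized" in object.attributes.request.http.headers &&
      object.attributes.request.http.headers["x-force-authorized"] in ["enabled", "true"]
  authorizations:
  - expression: >
      variables.allowed
        ? envoy.Allowed().Response()
        : envoy.Denied(403).Response()
```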
How it works
When applying the AuthorizationPolicy, the Istio control plane (istiod) sends the required configurations to the sidecar proxy (Envoy) of the selected services in the policy.
Envoy will then send the request to the Kyverno Authz Server to check if the request is allowed or not.


The Envoy proxy works by configuring filters in a chain. One of those filters is ext_authz, which calls out to an external authorization service using a specific protobuf message. Any server implementing that service can be called by Envoy to provide the authorization decision; the Kyverno Authz Server is one of those servers.


Reviewing Envoy’s Authorization service documentation, you can see that the message has these attributes:


Ok response
{
  "status": {...},
  "ok_response": {
    "headers": [],
    "headers_to_remove": [],
    "response_headers_to_add": [],
    "query_parameters_to_set": [],
    "query_parameters_to_remove": []
  },
  "dynamic_metadata": {...}
}


Denied response
{
  "status": {...},
  "denied_response": {
    "status": {...},
    "headers": [],
    "body": "..."
  },
  "dynamic_metadata": {...}
}


This means that based on the response from the authz server, Envoy can add or remove headers, query parameters, and even change the response body.
The Kyverno Authz Server can produce these responses too, as documented in the Kyverno Authz Server documentation.
Testing
Let’s test the simple usage (authorization) and then let’s create a more advanced policy to show how we can use the Kyverno Authz Server to modify the request and response.
Deploy an app to run curl commands to the httpbin sample application:
$ kubectl apply -n my-app -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/curl/curl.yaml
Apply the policy:
$ kubectl apply -f - <
      variables.allowed
        ? envoy.Allowed().Response()
        : envoy.Denied(403).Response()
EOF
The simple scenario is to allow requests if they contain the header x-force-authorized with the value enabled or true.
If the header is not present or has a different value, the request will be denied.
In this case, we combined allow and deny response handling in a single expression. However, it is also possible to use multiple expressions: the Kyverno Authz Server uses the first one that returns a non-null response. This is useful when a rule does not want to make a decision and delegates to the next rule:
[...]
  authorizations:
  # allow the request when the header value matches
  - expression: >
      variables.allowed
        ? envoy.Allowed().Response()
        : null
  # else deny the request
  - expression: >
      envoy.Denied(403).Response()
[...]
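The first-non-null behavior can be modeled in a few lines of Python. This is a hypothetical sketch, not the Kyverno Authz Server's implementation: each rule is a plain function standing in for a CEL expression and returns a decision string, or `None` to abstain. The header check assumes an exact, case-sensitive match on `enabled` or `true`, since the full policy body is not reproduced here.

```python
from typing import Callable, List, Mapping, Optional, Sequence

# Hypothetical model of ordered authorization expressions. Each rule returns a
# decision, or None to abstain and let the next rule decide.
Rule = Callable[[Mapping[str, str]], Optional[str]]


def allowed(headers: Mapping[str, str]) -> bool:
    # Assumed condition: header present with value "enabled" or "true".
    return headers.get("x-force-authorized") in ("enabled", "true")


def evaluate(rules: Sequence[Rule], headers: Mapping[str, str]) -> Optional[str]:
    """Return the first non-None decision, in rule order."""
    for rule in rules:
        decision = rule(headers)
        if decision is not None:
            return decision
    return None  # no rule decided; the server's behavior here is not modeled


rules: List[Rule] = [
    lambda h: "allow" if allowed(h) else None,  # allow when the header matches
    lambda h: "deny 403",                        # else deny the request
]
```

Putting the unconditional deny last mirrors the two-expression snippet above: any rule that abstains simply hands the decision to the next one.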
Simple rule
The following request will return 403:
$ kubectl exec -n my-app deploy/curl -- curl -s -w "\nhttp_code=%{http_code}" httpbin:8000/get
The following request will return 200:
$ kubectl exec -n my-app deploy/curl -- curl -s -w "\nhttp_code=%{http_code}" httpbin:8000/get -H "x-force-authorized: true"
Advanced manipulations
Now for the more advanced use case. Apply the second policy:
$ kubectl apply -f - <<EOF
[...]
    # if force_unauthenticated -> 401
  - expression: >
      variables.force_unauthenticated
        ? envoy
            .Denied(401)
            .WithBody("Authentication Failed")
            .Response()
        : null
    # if force_authorized -> 200
  - expression: >
      variables.force_authorized
        ? envoy
            .Allowed()
            .WithHeader("x-validated-by", "my-security-checkpoint")
            .WithoutHeader("x-force-authorized")
            .WithResponseHeader("x-add-custom-response-header", "added")
            .Response()
            .WithMetadata(variables.metadata)
        : null
    # else -> 403
  - expression: >
      envoy
        .Denied(403)
        .WithBody("Unauthorized Request")
        .Response()
EOF
In that policy, you can see:

If the request has the x-force-unauthenticated: true header (or x-force-unauthenticated: enabled), we will return 401 with the “Authentication Failed” body
Else, if the request has the x-force-authorized: true header (or x-force-authorized: enabled), we will return 200, manipulate the request and response headers, and inject dynamic metadata
In all other cases, we will return 403 with the “Unauthorized Request” body

The corresponding CheckResponse will be returned to the Envoy proxy from the Kyverno Authz Server. Envoy will use those values to modify the request and response accordingly.
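The branch order matters: a request carrying both headers gets a 401, because the unauthenticated check comes first. The following Python sketch simulates the three branches and shows which headers the upstream httpbin would see after an allow decision. The `Decision` type, `decide`, and `upstream_headers` are hypothetical names for illustration; they do not correspond to Kyverno or Envoy APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Mapping

# Hypothetical simulation of the advanced policy's three branches and of how
# an Envoy-like proxy would apply the resulting request header mutations.
TRUTHY = ("true", "enabled")


@dataclass
class Decision:
    http_code: int
    body: str = ""
    add_request_headers: Dict[str, str] = field(default_factory=dict)
    remove_request_headers: List[str] = field(default_factory=list)
    add_response_headers: Dict[str, str] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)


def decide(headers: Mapping[str, str]) -> Decision:
    # Evaluated in the same order as the policy's expressions.
    if headers.get("x-force-unauthenticated") in TRUTHY:
        return Decision(401, body="Authentication Failed")
    if headers.get("x-force-authorized") in TRUTHY:
        return Decision(
            200,
            add_request_headers={"x-validated-by": "my-security-checkpoint"},
            remove_request_headers=["x-force-authorized"],
            add_response_headers={"x-add-custom-response-header": "added"},
            metadata={"my-new-metadata": "my-new-value"},
        )
    return Decision(403, body="Unauthorized Request")


def upstream_headers(headers: Mapping[str, str], d: Decision) -> Dict[str, str]:
    """Headers the upstream (httpbin) would see after an allow decision."""
    out = {k: v for k, v in headers.items() if k not in d.remove_request_headers}
    out.update(d.add_request_headers)
    return out
```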
Change returned body
Let’s test the new capabilities:
$ kubectl exec -n my-app deploy/curl -- curl -s -w "\nhttp_code=%{http_code}" httpbin:8000/get
Now we can change the response body.
For a 403, the body is now “Unauthorized Request”. Running the previous command, you should receive:
Unauthorized Request
http_code=403
Change returned body and status code
Running the request with the header x-force-unauthenticated: true:
$ kubectl exec -n my-app deploy/curl -- curl -s -w "\nhttp_code=%{http_code}" httpbin:8000/get -H "x-force-unauthenticated: true"
This time you should receive the body “Authentication Failed” and error 401:
Authentication Failed
http_code=401
Adding headers to request
Running a valid request:
$ kubectl exec -n my-app deploy/curl -- curl -s -w "\nhttp_code=%{http_code}" httpbin:8000/get -H "x-force-authorized: true"
You should receive the echo body with the new header x-validated-by: my-security-checkpoint and the header x-force-authorized removed:
[...]
    "X-Validated-By": [
      "my-security-checkpoint"
    ]
[...]
http_code=200
Adding headers to response
Running the same request but showing only the headers:
$ kubectl exec -n my-app deploy/curl -- curl -s -I -w "\nhttp_code=%{http_code}" httpbin:8000/get -H "x-force-authorized: true"
You will find the response header x-add-custom-response-header: added, which was set during the authz check:
HTTP/1.1 200 OK
[...]
x-add-custom-response-header: added
[...]
http_code=200
Sharing data between filters
Finally, you can pass data to subsequent Envoy filters using dynamic_metadata.
This is useful when you want to pass data to another ext_authz filter in the chain, or to print it in the proxy’s access logs.

To do so, review the access log format you set earlier:
[...]
    accessLogFormat: |
      [KYVERNO DEMO] my-new-dynamic-metadata: "%DYNAMIC_METADATA(envoy.filters.http.ext_authz)%"
[...]
DYNAMIC_METADATA is an Envoy access log command operator that reads the dynamic metadata object. Its argument is the name of the filter whose metadata you want to access.
In our case, the name envoy.filters.http.ext_authz is created automatically by Istio. You can verify this by dumping the Envoy configuration:
$ istioctl pc all deploy/httpbin -n my-app -oyaml | grep envoy.filters.http.ext_authz
You will see the configurations for the filter.
Let’s test the dynamic metadata. In the advanced rule, we are creating a new metadata entry: {"my-new-metadata": "my-new-value"}.
Run the request and check the istio-proxy logs of the httpbin deployment:
$ kubectl exec -n my-app deploy/curl -- curl -s -I httpbin:8000/get -H "x-force-authorized: true"
$ kubectl logs -n my-app deploy/httpbin -c istio-proxy --tail 1
You will see in the output the new attributes configured by the Kyverno policy:
[...]
[KYVERNO DEMO] my-new-dynamic-metadata: '{"my-new-metadata":"my-new-value","ext_authz_duration":5}'
[...]
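If you want to post-process these log lines, a small parser is enough. The Python helper below is a hypothetical sketch that assumes the exact `[KYVERNO DEMO] my-new-dynamic-metadata:` prefix configured in the access log format above, with the JSON optionally wrapped in single or double quotes; it is not a general Envoy access log parser. Note that the parsed object contains an `ext_authz_duration` entry next to the policy’s own `my-new-metadata` key, which the policy itself did not define.

```python
import json

# Hypothetical helper for the demo log format only; not a general parser.
PREFIX = "[KYVERNO DEMO] my-new-dynamic-metadata: "


def parse_demo_line(line: str) -> dict:
    """Extract the dynamic metadata JSON object from one demo log line."""
    if not line.startswith(PREFIX):
        raise ValueError("not a KYVERNO DEMO line")
    payload = line[len(PREFIX):].strip()
    # The value may be wrapped in single or double quotes; strip them.
    if len(payload) >= 2 and payload[0] == payload[-1] and payload[0] in "'\"":
        payload = payload[1:-1]
    return json.loads(payload)
```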
Conclusion
In this guide, we have shown how to integrate Istio and the Kyverno Authz Server to enforce policies for a simple microservices application.
We also showed how to use policies to modify the request and response attributes.
This is the foundational example for building a platform-wide policy system that can be used by all application teams.



A new Phippy and Friends story: Izzy Saves the Birthday
Tue, 12 Nov 2024 00:00:00 +0000
Earlier this year, we added Izzy Dolphin, the Indo-Pacific Bottlenose, to the CNCF “Phippy and Friends” family. Ever since then, Istio lovers worldwide have been eagerly awaiting the first children’s book featuring our cute dolphin.
And here it is!

The Istio project is excited to unveil Izzy’s adventure sailing with the Phippy family at KubeCon North America 2024 this week, as together we celebrate the 10-year anniversary of Kubernetes. Copies are available at the CNCF Store, or on the online store shortly after the event.
Captain Kube hosts a grand birthday bash on a special cruise with Phippy and her friends, but the ship is in great danger! There is nothing to worry about, though, when Izzy is in charge of security! Join Izzy’s smart and adventurous chase of the pirates who want to spoil Captain Kube’s birthday bash.
Why the book?
The co-authors of the book, Faseela K. and Lin Sun, are both Istio maintainers and parents. They have often found themselves in a tough spot explaining what they do at work, particularly in a context that makes sense to younger people. Their children read and enjoyed the Illustrated Children’s Guide to Kubernetes but were curious to learn more about the other characters and their roles and responsibilities!
This book is for everyone who has encountered curious little eyes that keep asking you what “Service Mesh” is. It’s also a great gift for anyone of any age who needs to understand what Istio is, or who thinks that service mesh is too complex.
Acknowledgements
The Istio Steering Committee would like to thank Faseela and Lin for writing this amazing book. Suri Patel and Alex Davy from CNCF did a wonderful job with the design and illustrations, bringing the story to life. Last, but not least, a huge thanks to Katie Greenley for her support throughout the process to make sure the book was released on time for Captain Kube’s birthday celebrations at our community’s largest international conference.
We are planning a book signing event at next year’s KubeCon EU in London.
Happy reading!


Fast, Secure, and Simple: Istio’s Ambient Mode Reaches General Availability in v1.24
Thu, 07 Nov 2024 00:00:00 +0000
We are proud to announce that Istio’s ambient data plane mode has reached General Availability, with the ztunnel, waypoints and APIs being marked as Stable by the Istio TOC. This marks the final stage in Istio’s feature phase progression, signaling that ambient mode is fully ready for broad production usage.
Ambient mesh — and its reference implementation with Istio’s ambient mode — was announced in September 2022. Since then, our community has put in 26 months of hard work and collaboration, with contributions from Solo.io, Google, Microsoft, Intel, Aviatrix, Huawei, IBM, Red Hat, and many others. Stable status in 1.24 indicates the features of ambient mode are now fully ready for broad production workloads. This is a huge milestone for Istio, bringing Istio to production readiness without sidecars, and offering users a choice.
Why ambient mesh?
From the launch of Istio in 2017, we have observed a clear and growing demand for mesh capabilities for applications — but heard that many users found the resource overhead and operational complexity of sidecars hard to overcome. Challenges that Istio users shared with us include how sidecars can break applications after they are added, the large CPU and memory requirement for a proxy with every workload, and the inconvenience of needing to restart application pods with every new Istio release.
As a community, we designed ambient mesh from the ground up to tackle these problems, alleviating the previous barriers of complexity faced by users looking to implement service mesh. The new concept was named ‘ambient mesh’ as it was designed to be transparent to your application, with no proxy infrastructure collocated with user workloads, no subtle changes to configuration required to onboard, and no application restarts required.
In ambient mode, it is trivial to add or remove applications from the mesh. All you need to do is label a namespace, and all applications in that namespace are instantly added to the mesh. This immediately secures all traffic within that namespace with industry-standard mutual TLS encryption. No other configuration or restarts are required!
Refer to the Introducing Ambient Mesh blog for more information on why we built Istio’s ambient mode.
How does ambient mode make adoption easier?
The core innovation behind ambient mesh is that it slices Layer 4 (L4) and Layer 7 (L7) processing into two distinct layers. Istio’s ambient mode is powered by lightweight, shared L4 node proxies and optional L7 proxies, removing the need for traditional sidecar proxies from the data plane. This layered approach allows you to adopt Istio incrementally, enabling a smooth transition from no mesh, to a secure overlay (L4), to optional full L7 processing — on a per-namespace basis, as needed, across your fleet.
By utilizing ambient mesh, users bypass some of the previously restrictive elements of the sidecar model. Server-send-first protocols now work, most reserved ports are now available, and the ability for containers to bypass the sidecar — either maliciously or not — is eliminated.
The lightweight shared L4 node proxy is called the ztunnel (zero-trust tunnel). ztunnel drastically reduces the overhead of running a mesh by removing the need to potentially over-provision memory and CPU within a cluster to handle expected loads. In some use cases, the savings can exceed 90%, while still providing zero-trust security using mutual TLS with cryptographic identity, simple L4 authorization policies, and telemetry.
The L7 proxies are called waypoints. Waypoints process L7 functions such as traffic routing, rich authorization policy enforcement, and enterprise-grade resilience. Waypoints run outside of your application deployments and can scale independently based on your needs, which could be for the entire namespace or for multiple services within a namespace. Compared with sidecars, you don’t need one waypoint per application pod, and you can scale your waypoint effectively based on its scope, thus saving significant amounts of CPU and memory in most cases.
The separation between the L4 secure overlay layer and L7 processing layer allows incremental adoption of the ambient mode data plane, in contrast to the earlier binary “all-in” injection of sidecars. Users can start with the secure L4 overlay, which offers a majority of features that people deploy Istio for (mTLS, authorization policy, and telemetry). Complex L7 handling such as retries, traffic splitting, load balancing, and observability collection can then be enabled on a case-by-case basis.
Rapid exploration and adoption of ambient mode
The ztunnel image on Docker Hub has reached over 1 million downloads, with ~63,000 pulls in the last week alone.

We asked a few of our users for their thoughts on ambient mode’s GA:

        Istio’s implementation of a service mesh with their ambient mesh design has been a great addition to our Kubernetes clusters to simplify the team responsibilities and overall network architecture of the mesh. In conjunction with the Gateway API project it has given me a great way to enable developers to get their networking needs met at the same time as only delegating as much control as needed. While it’s a rapidly evolving project it has been solid and dependable in production and will be our default option for implementing networking controls in a Kubernetes deployment going forth.
— Daniel Loader, Lead Platform Engineer at Quotech

        It is incredibly easy to install ambient mesh with the Helm chart wrapper. Migrating is as simple as setting up a waypoint gateway, updating labels on a namespace, and restarting. I’m looking forward to ditching sidecars and recuperating resources. Moreover, easier upgrades. No more restarting deployments!
— Raymond Wong, Senior Architect at Forbes

        Istio’s ambient mode has served our production system since it became Beta. We are pleased by its stability and simplicity and are looking forward to additional benefits and features coming together with the GA status. Thanks to the Istio team for the great efforts!
— Saarko Eilers, Infrastructure Operations Manager at EISST International Ltd

        By Switching from AWS App Mesh to Istio in ambient mode, we were able to slash about 45% of the running containers just by removing sidecars and SPIRE agent DaemonSets. We gained many benefits, such as reducing compute costs or observability costs related to sidecars, eliminating many of the race conditions related to sidecars startup and shutdown, plus all the out-of-the-box benefits just by migrating, like mTLS, zonal awareness and workload load balancing.
— Ahmad Al-Masry, DevSecOps Engineering Manager at Harri

        We chose Istio because we’re excited about ambient mesh. Different from other options, with Istio, the transition from sidecar to sidecar-less is not a leap of faith. We can build up our service mesh infrastructure with Istio knowing the path to sidecar-less is a two way door.
— Troy Dai, Senior Staff Software Engineer at Coinbase

        Extremely proud to see the fast and steady growth of ambient mode to GA, and all the amazing collaboration that took place over the past months to make this happen! We are looking forward to finding out how the new architecture is going to revolutionize the telcos world.
— Faseela K, Cloud Native Developer at Ericsson

        We are excited to see the Istio dataplane evolve with the GA release of ambient mode and are actively evaluating it for our next-generation infrastructure platform. Istio’s community is dynamic and welcoming, and ambient mesh is a testament to the community embracing new ideas and pragmatically working to improve developer experience operating Istio at scale.
— Tyler Schade, Distinguished Engineer at GEICO Tech

        With Istio’s ambient mode reaching GA, we finally have a service mesh solution that isn’t tied to the pod lifecycle, addressing a major limitation of sidecar-based models. Ambient mesh provides a more lightweight, scalable architecture that simplifies operations and reduces our infrastructure costs by eliminating the resource overhead of sidecars.
— Bartosz Sobieraj, Platform Engineer at Spond

        Our team chose Istio for its service mesh features and strong alignment with the Gateway API to create a robust Kubernetes-based hosting solution. As we integrated applications into the mesh, we faced resource challenges with sidecar proxies, prompting us to transition to ambient mode in Beta for improved scalability and security. We started with L4 security and observability through ztunnel, gaining automatic encryption of in-cluster traffic and transparent traffic flow monitoring. By selectively enabling L7 features and decoupling the proxy from applications, we achieved seamless scaling and reduced resource utilization and latency. This approach allowed developers to focus on application development, resulting in a more resilient, secure, and scalable platform powered by ambient mode.
— Jose Marques, Senior DevOps at Blip.pt

        We are using Istio to ensure strict mTLS L4 traffic in our mesh and we are excited for ambient mode. Compared to sidecar mode it’s a massive save on resources and at the same time it makes configuring things even more simple and transparent.
— Andrea Dolfi, DevOps Engineer

What is in scope?
The general availability of ambient mode means the following things are now considered stable:

Installing Istio with support for ambient mode, with Helm or istioctl.
Adding your workloads to the mesh to gain mutual TLS with cryptographic identity, L4 authorization policies, and telemetry.
Configuring waypoints to use L7 functions such as traffic shifting, request routing, and rich authorization policy enforcement.
Connecting the Istio ingress gateway to workloads in ambient mode, supporting the Kubernetes Gateway APIs and all existing Istio APIs.
Using waypoints for controlled mesh egress.
Using istioctl to operate waypoints, and troubleshoot ztunnel & waypoints.

Refer to the feature status page for more information.
Roadmap
We are not standing still! There are a number of features that we continue to work on for future releases, including some that are currently in Alpha/Beta.
In our upcoming releases, we expect to move quickly on the following extensions to ambient mode:

Full support for sidecar and ambient mode interoperability
Multi-cluster installations
Multi-network support
VM support

What about sidecars?
Sidecars are not going away, and remain first-class citizens in Istio. You can continue to use sidecars, and they will remain fully supported. While we believe most use cases will be best served with a mesh in ambient mode, the Istio project remains committed to ongoing sidecar mode support.
Try ambient mode today
With the 1.24 release of Istio and the GA release of ambient mode, it is now easier than ever to try out Istio on your own workloads.

Follow the getting started guide to explore ambient mode.
Read our user guides to learn how to incrementally adopt ambient for mutual TLS & L4 authorization policy, traffic management, rich L7 authorization policy, and more.
Explore the new Kiali 2.0 dashboard to visualize your mesh.

You can engage with the developers in the #ambient channel on the Istio Slack, or use the discussion forum on GitHub for any questions you may have.



Istio in Salt Lake City!
Tue, 05 Nov 2024 00:00:00 +0000
An amazing lineup of Istio activities awaits you in Salt Lake City, Utah at KubeCon + CloudNativeCon North America 2024!

Come to the Istio Day co-located event.


Attend the Istio Maintainers’ Track session: Life of a Packet: Ambient Edition


Drop by the Istio Contribfest session: Sidecarless Service Mesh: Let’s Work Together on Istio V2


Add the following KubeCon sessions to your schedule, all of which have an Istio flavor:

Why Choose Istio in 2025 | Project Lightning Talk
Lightning Talk: Effortless, Sidecar-Less Mutual TLS and Rich Authorization Policies up and Running in 5 Minutes
Poster Session: Unleashing the Power of Prediction to Proactively Scale Control Plane Components
What Istio Got Wrong: Learnings from the Last Seven Years of Service Mesh
Tutorial: Live with Gateway API V1.2
Mish-Mesh: Abusing the Service Mesh to Compromise Kubernetes Environments
Engaging the KServe Community, The Impact of Integrating Solutions with Standardized CNCF Projects
How Google Built a New Cloud on Top of Kubernetes
Securing Outgoing Traffic: Building a Powerful Internet Egress Gateway for Reliable Connectivity
Testing Kubernetes Without Kubernetes: A Networking Deep Dive
How GoTo Financial Automates Upgrading 60+ Istio Service Mesh Seamlessly!



Have a chat with maintainers and users at the Istio kiosk in the Project Pavilion throughout the event, where you can grab a cool Istio T-shirt with our brand new design.


We also have an interesting surprise for all Istio lovers, to be released at the KubeCon North America CNCF store. Stay tuned!


Follow us on X, LinkedIn or Bluesky to get live updates from the event. See you soon!



Scaling in the Clouds: Istio Ambient vs. Cilium
Mon, 21 Oct 2024 00:00:00 +0000
A common question from prospective Istio users is “how does Istio compare to Cilium?” While Cilium originally only provided L3/L4 functionality, including network policy, recent releases have added service mesh functionality using Envoy, as well as WireGuard encryption. Like Istio, Cilium is a CNCF Graduated project, and has been around in the community for many years.
Despite offering a similar feature set on the surface, the two projects have substantially different architectures, most notably Cilium’s use of eBPF and WireGuard for processing and encrypting L4 traffic in the kernel, contrasted with Istio’s ztunnel component for L4 in user space. These differences have resulted in substantial speculation about how Istio will perform at scale compared to Cilium.
While many comparisons have been made about tenancy models, security protocols and basic performance of the two projects, there has not yet been a full evaluation published at enterprise scale. Rather than emphasizing theoretical performance, we put Istio’s ambient mode and Cilium through their paces, focusing on key metrics like latency, throughput, and resource consumption. We cranked up the pressure with realistic load scenarios, simulating a bustling Kubernetes environment. Finally, we pushed the size of our AKS cluster up to 1,000 nodes on 11,000 cores, to understand how these projects perform at scale. Our results show areas where each can improve, but also indicate that Istio is the clear winner.
Test Scenario
In order to push Istio and Cilium to their limits, we created 500 different services, each backed by 100 pods. Each service is in a separate namespace, which also contains one Fortio load generator client. We restricted the clients to a node pool of 100 32-core machines, to eliminate noise from collocated clients, and allocated the remaining 900 8-core instances to our services.

For the Istio test, we used Istio’s ambient mode, with a waypoint proxy in every service namespace, and default install parameters. In order to make our test scenarios similar, we had to turn on a few non-default features in Cilium, including WireGuard encryption, L7 Proxies, and Node Init. We also created a Cilium Network Policy in each namespace, with HTTP path-based rules. In both scenarios, we generated churn by scaling one service to between 85 and 115 instances at random every second, and relabeling one namespace every minute. To see the precise settings we used, and to reproduce our results, see my notes.
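The churn pattern can be expressed as a simple schedule. The Python sketch below only plans actions and never calls Kubernetes. The `svc-<n>` namespace naming, the inclusive 85 to 115 range, and relabeling on the last second of each minute are assumptions for illustration, not details from the test notes. A real harness would translate each `scale` entry into a replica change and each `relabel` entry into a namespace label update.

```python
import random
from typing import List, Tuple

# Hypothetical churn planner: every second, one random service gets a random
# replica count in [85, 115]; once a minute, one random namespace is relabeled.
# It only returns planned actions and does not talk to a cluster.
Action = Tuple[int, str, str, int]  # (second, kind, namespace, replicas)


def churn_plan(seconds: int, services: int = 500, seed: int = 0) -> List[Action]:
    rng = random.Random(seed)  # seeded so a run can be reproduced
    plan: List[Action] = []
    for t in range(seconds):
        ns = f"svc-{rng.randrange(services)}"
        plan.append((t, "scale", ns, rng.randint(85, 115)))  # inclusive bounds
        if t % 60 == 59:  # last second of each minute
            plan.append((t, "relabel", f"svc-{rng.randrange(services)}", 0))
    return plan
```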
Scalability Scorecard

Istio was able to deliver 56% more queries at 20% lower tail latency. The CPU usage was 30% less for Cilium, though our measurement does not include the cores Cilium used to process encryption, which is done in the kernel.
Taking into account the resources used, Istio processed 2178 queries per core, vs Cilium’s 1815, a 20% improvement.
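The per-core figure can be checked with quick arithmetic. The first calculation uses only the two numbers quoted above. The second is an interpretation, not a measured result: the headline 56% and 20% line up if the 30% CPU gap is read relative to Cilium’s usage (Istio using about 1.3 times the user-space cores Cilium used); read the other way (Cilium at 0.7 times Istio’s cores), the ratio would be about 1.09 instead.

```python
# Queries per core as quoted in the scorecard.
istio_qpc = 2178
cilium_qpc = 1815
improvement = istio_qpc / cilium_qpc - 1
print(f"Istio vs Cilium, queries per core: +{improvement:.0%}")

# Interpretation check (assumption): Istio served 1.56x the queries.
# Reading A: Istio used 1.3x Cilium's measured cores.
ratio_a = 1.56 / 1.30
# Reading B: Cilium used 0.7x Istio's measured cores.
ratio_b = 1.56 * 0.70
print(f"reading A: {ratio_a:.2f}, reading B: {ratio_b:.2f}")
```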

The Cilium Slowdown: Cilium, while boasting impressively low latency with default install parameters, slows down substantially when Istio’s baseline features such as L7 policy and encryption are turned on. Additionally, Cilium’s memory and CPU utilization remained high even when no traffic was flowing in the mesh. This can impact the overall stability and reliability of your cluster, especially as it grows.
Istio, The Steady Performer: Istio’s ambient mode, on the other hand, showed its strength in stability and maintaining decent throughput, even with the added overhead of encryption. While Istio did consume more memory and CPU than Cilium under test, its CPU utilization settled to a fraction of Cilium’s when not under load.

Behind the Scenes: Why the Difference?
The key to understanding these performance differences lies in the architecture and design of each tool.

Cilium’s Control Plane Conundrum: Cilium runs a control plane instance on each node, leading to API server strain and configuration overhead as your cluster expands. This frequently caused our API server to crash, followed by Cilium becoming unready, and the entire cluster becoming unresponsive.
Istio’s Efficiency Edge: Istio, with its centralized control plane and identity-based approach, streamlines configuration and reduces the burden on your API server and nodes, directing critical resources to processing and securing your traffic, rather than processing configuration. Istio takes further advantage of the resources not used in the control plane by running as many Envoy instances as a workload needs, while Cilium is limited to one shared Envoy instance per node.

Digging Deeper
While the objective of this project is to compare Istio and Cilium scalability, several constraints make a direct comparison difficult.
Layer 4 Isn’t Always Layer 4
While Istio and Cilium both offer L4 policy enforcement, their APIs and implementations differ substantially. Cilium implements Kubernetes NetworkPolicy, which uses labels and namespaces to block or allow access to and from IP addresses. Istio offers an AuthorizationPolicy API, and makes allow and deny decisions based on the mTLS identity of the peer sending each request. Most defense-in-depth strategies will need to make use of both NetworkPolicy and TLS-based policy for comprehensive security.
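The difference between the two models can be shown with a toy example. In the Python sketch below, the NetworkPolicy-style check is simplified to a source-IP range (the label and namespace selectors collapsed into the CIDR they happen to resolve to), and the AuthorizationPolicy-style check compares the peer identity from mTLS against an allow-list written in Istio’s `cluster.local/ns/<namespace>/sa/<service-account>` principal format. It is a deliberate simplification rather than either project’s implementation, and the addresses and names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Set


@dataclass
class Connection:
    source_ip: str
    peer_principal: str  # identity taken from the peer's mTLS certificate


def ip_rule_allows(conn: Connection, allowed_cidr: str) -> bool:
    # NetworkPolicy-style: only the source address is considered.
    return ipaddress.ip_address(conn.source_ip) in ipaddress.ip_network(allowed_cidr)


def identity_rule_allows(conn: Connection, allowed_principals: Set[str]) -> bool:
    # AuthorizationPolicy-style: only the authenticated identity is considered.
    return conn.peer_principal in allowed_principals
```

Two workloads in the same address range pass the IP check alike, but only the expected identity passes the identity check, which is one reason layering both kinds of policy helps.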
Not All Encryption Is Created Equal
While Cilium offers IPsec for FIPS-compatible encryption, most other Cilium features such as L7 policy and load balancing are incompatible with IPsec. Cilium has much better feature compatibility when using WireGuard encryption, but WireGuard cannot be used in FIPS-compliant environments. Istio, on the other hand, because it strictly complies with TLS protocol standards, always uses FIPS-compliant mTLS by default.
Hidden Costs
While Istio operates entirely in user space, Cilium’s L4 dataplane runs in the Linux kernel using eBPF. Prometheus metrics for resource consumption only measure user space resources, meaning that none of the kernel resources used by Cilium are accounted for in this test.
Recommendations: Choosing the Right Tool for the Job
So, what’s the verdict? Well, it depends on your specific needs and priorities. For small clusters with pure L3/L4 use cases and no requirement for encryption, Cilium offers a cost-effective and performant solution. However, for larger clusters and a focus on stability, scalability, and advanced features, Istio’s ambient mode, along with an alternate NetworkPolicy implementation, is the way to go. Many customers choose to combine the L3/L4 features of Cilium with the L4/L7 and encryption features of Istio for a defense-in-depth strategy.
Remember, the world of cloud-native networking is constantly evolving. Keep an eye on developments in both Istio and Cilium, as they continue to improve and address these challenges.
Let’s Keep the Conversation Going
Have you worked with Istio’s ambient mode or Cilium? What are your experiences and insights? Share your thoughts in the comments below. Let’s learn from each other and navigate the exciting world of Kubernetes together!



More community leadership: Regularly electing the Istio Technical Oversight Committee
Thu, 17 Oct 2024 00:00:00 +0000
Like many Open Source foundations and projects, the Istio project has two governance groups: a Steering Committee, that oversees the administrative and marketing aspects of the project, and a Technical Oversight Committee (TOC), responsible for cross-cutting product and design decisions.
The Steering Committee represents the companies and contributors that support the Istio project, while the TOC is the top of an individual contributor ladder made up of our members, maintainers and working group leads.
Each year, we build our Steering Committee with representatives from our top commercial contributors, and members elected by our maintainer community. This is the group with the responsibility of electing new TOC members, who have traditionally served indefinitely.
We want to ensure that all the members of our community have the opportunity to stand for, and serve in, our leadership positions. Today, we are pleased to announce our transition to a regularly-elected TOC, with members serving two-year terms, and call for candidates for our first election.
What does the Technical Oversight Committee do?
The charter for the TOC spells out the responsibilities of its members, including:

Setting the overall technical direction and roadmap of the project.
Resolving technical issues, disagreements, and escalations.
Declaring maturity levels for Istio features.
Approving the creation and dissolution of working groups and approving leadership changes of working groups.
Ensuring the team adheres to our code of conduct and respects our values.
Fostering an environment for a healthy and happy community of developers and contributors.

While the interests of our vendors are represented by our Steering Committee, TOC membership is associated with the individual, irrespective of their current employer. Members act independently, in their individual capacities, and must prioritize the best interests of the project and the community. Decisions have always been made by consensus, and as such we seat an even number of members. The TOC has traditionally comprised 6 members, and this remains the case going forward.
What changes with the new charter?
The key changes in the new charter, recently ratified by the Steering Committee, are:

Members will now serve two-year terms.
The Steering Committee will vote every year to (re-)seat† 3 of the 6 members on the TOC.
The mechanics for election are clearly defined, including the expectation for candidates to qualify for the election, and how they will be evaluated.
The expectations of regular meetings between the Steering and TOC have been formalized.
There is now a formal process for removing a TOC member, should they lose the confidence of the Steering Committee.

† There is no limit on the number of terms a member may serve, and incumbent TOC members are welcome to run again at the end of their term.
TOC member farewells
We recently announced the retirement of long-time contributor Eric Van Norman. We also now bid farewell to Neeraj Poddar from the Istio TOC. Neeraj has been involved with the project since 2017, co-founding Aspen Mesh within F5, and later leading Gloo Mesh as VP of Engineering at Solo.io. He was first elected to the TOC in 2020. Neeraj has taken a role as VP of Engineering at NimbleEdge, and we congratulate him and wish him well for the future.
Maintainers: stand in our first election
We have set our annual TOC elections to occur after the seating of the Steering Committee each year, which will put the first instance around March 2025.
However, as we currently have two vacancies, we are announcing our first election will be a by-election to fill these two seats for the remainder of their terms.
The bar for joining the TOC is deliberately set high. Candidates must be tenured maintainers, recognized within the Istio community as collaborative technical leaders, and meet qualification criteria which demonstrate their suitability for the position.
To stand for a TOC seat, please send an e-mail to elections@istio.io, including a link to a one-page Google Doc with your self-assessment against the qualification criteria. Nominations will close in two weeks, on 31 October.
Good luck!



Can Your Platform Do Policy? Accelerate Teams With Platform L7 Policy Functionality
Mon, 14 Oct 2024 00:00:00 +0000
Shared computing platforms offer resources and shared functionality to tenant teams so that they don’t need to build everything from scratch themselves. While it can sometimes be hard to balance all the requests from tenants, it’s important that platform teams ask the question: what’s the highest value feature we can offer our tenants?
Often work is given directly to application teams to implement, but there are some features that are best implemented once, and offered as a service to all teams. One feature within the reach of most platform teams is offering a standard, responsive system for Layer 7 application authorization policy. Policy as code enables teams to lift authorization decisions out of the application layer into a lightweight and performant decoupled system. It might sound like a challenge, but it doesn’t have to be, with the right tools for the job.
We’re going to dive into how Istio and Open Policy Agent (OPA) can be used to enforce Layer 7 policies in your platform, starting with a simple example. You will see why this combination is a solid option for delivering policy quickly and transparently to application teams everywhere in the business, while also providing the data security teams need for audit and compliance.
Try it out
When integrated with Istio, OPA can be used to enforce fine-grained access control policies for microservices. This guide shows how to enforce access control policies for a simple microservices application.
Prerequisites

A Kubernetes cluster with Istio installed.
The istioctl command-line tool installed.

Install Istio and configure your mesh options to enable OPA:
$ istioctl install -y -f - <<'EOF'
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    accessLogFile: /dev/stdout
    accessLogFormat: |
      [OPA DEMO] my-new-dynamic-metadata: "%DYNAMIC_METADATA(envoy.filters.http.ext_authz)%"
    extensionProviders:
    - name: "opa.local"
      envoyExtAuthzGrpc:
        service: "opa.opa.svc.cluster.local"
        port: "9191"
EOF
Notice that in the configuration, we define an extensionProviders section that points to the OPA standalone installation.
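To check that istiod picked up the provider, you can look at the mesh configuration it renders. The helper below is a minimal sketch, not part of Istio: the function name mesh_has_provider is our own, and we assume a default (non-revisioned) installation, where the mesh config is stored under the mesh key of the istio ConfigMap in istio-system.

```shell
# Sketch (hypothetical helper): succeed if the mesh config read on stdin
# declares an extension provider with the given name (default: opa.local).
# istiod may re-render the YAML with sorted keys, so "name:" is matched with
# or without a leading "-" and with or without quotes. The provider name is
# used as a regular expression, so "." matches any character.
mesh_has_provider() {
  provider="${1:-opa.local}"
  grep -Eq "^[[:space:]]*-?[[:space:]]*name:[[:space:]]*\"?${provider}\"?[[:space:]]*\$"
}

# Usage (requires cluster access):
#   kubectl get configmap istio -n istio-system -o jsonpath='{.data.mesh}' \
#     | mesh_has_provider opa.local && echo "opa.local is configured"
```

Matching line by line keeps the check independent of key order, which istiod does not preserve from your original IstioOperator input.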
Deploy the sample application. httpbin is a well-known application for testing HTTP requests, and it makes it easy to see how we can manipulate request and response attributes.
$ kubectl create ns my-app
$ kubectl label namespace my-app istio-injection=enabled

$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/httpbin/httpbin.yaml -n my-app
Deploy OPA. The pod will not start successfully yet, because it expects a ConfigMap containing the default Rego rule; we deploy that ConfigMap later in this example.
$ kubectl create ns opa
$ kubectl label namespace opa istio-injection=enabled

$ kubectl apply -f - <

Deploy the AuthorizationPolicy to define which services will be protected by OPA.
$ kubectl apply -f - <
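For reference, a CUSTOM AuthorizationPolicy is what hands matching requests to an extension provider. The following is a minimal sketch of such a policy, not necessarily the exact manifest this guide uses: the policy name opa-ext-authz and the helper apply_opa_authz_policy are hypothetical, and we assume the policy lives in the my-app namespace, so its selector only matches workloads there.

```shell
# Sketch (hypothetical helper): apply a CUSTOM AuthorizationPolicy that sends
# requests for workloads labelled ext-authz=enabled to the "opa.local"
# extension provider declared in the mesh config.
apply_opa_authz_policy() {
  ns="${1:-my-app}"
  kubectl apply -n "$ns" -f - <<'EOF'
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: opa-ext-authz
spec:
  selector:
    matchLabels:
      ext-authz: enabled
  action: CUSTOM
  provider:
    name: "opa.local"
  rules:
  - {} # an empty rule matches every request, so all of them go to OPA
EOF
}

# Usage (requires cluster access):
#   apply_opa_authz_policy my-app
```

Selecting on the ext-authz label rather than the app name means any workload can opt in to OPA enforcement simply by adding that label, which is what the next step does for httpbin.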

Let’s label the app to enforce the policy:
$ kubectl patch deploy httpbin -n my-app --type=merge -p='{
  "spec": {
    "template": {
      "metadata": {
        "labels": {
          "ext-authz": "enabled"
        }
      }
    }
  }
}'
Notice that the AuthorizationPolicy references the OPA extension provider you defined in the Istio configuration:
[...]
  provider:
    name: "opa.local"
[...]
How it works
When you apply the AuthorizationPolicy, the Istio control plane (istiod) sends the required configuration to the sidecar proxy (Envoy) of the services selected by the policy. Envoy then sends each request to the OPA server to check whether it is allowed.
The Envoy proxy works by configuring filters in a chain. One of those filters is ext_authz, which delegates the authorization decision to an external service through a defined gRPC (protobuf) API. Any server implementing that service can connect to the Envoy proxy and provide the authorization decision; OPA is one of those servers.
When you installed the OPA server earlier, you used the Envoy variant of its image. This image includes the gRPC plugin that implements the ext_authz protobuf service.
[...]
      containers:
      - image: openpolicyagent/opa:0.61.0-envoy # This is the OPA image version which brings the Envoy plugin
        name: opa
[...]
In the OPA configuration, you enabled the Envoy plugin and set the port it listens on:
[...]
    decision_logs:
      console: true
    plugins:
      envoy_ext_authz_grpc:
        addr: ":9191" # This is the port where the envoy plugin will listen
        path: mypackage/mysubpackage/myrule # Default path for grpc plugin
    # Here you can add your own configuration with services and bundles
[...]
Reviewing Envoy’s external authorization service documentation, you can see that the CheckResponse message has these fields, where ok_response is an OkHttpResponse:
CheckResponse
{
  "status": {...},
  "denied_response": {...},
  "ok_response": {
    "headers": [],
    "headers_to_remove": [],
    "dynamic_metadata": {...},
    "response_headers_to_add": [],
    "query_parameters_to_set": [],
    "query_parameters_to_remove": []
  },
  "dynamic_metadata": {...}
}
This means that, based on the response from the authorization server, Envoy can add or remove headers and query parameters, and even change the response status. OPA can populate these fields, as described in the OPA documentation.
Testing
Let’s test the simple usage (authorization) first, and then create a more advanced rule to show how OPA can modify the request and response.
Deploy an app to run curl commands to the httpbin sample application:
$ kubectl -n my-app run --image=curlimages/curl curl -- /bin/sleep 100d
Apply the first Rego rule and restart the OPA deployment:
$ kubectl apply -f - <

$ kubectl rollout restart deployment -n opa
The simple scenario is to allow requests if they contain the header x-force-authorized with the value enabled or true. If the header is not present or has a different value, the request will be denied.
There are multiple ways to write this in Rego. In this case, we created two different rules; the request is allowed as soon as one of them is satisfied.
Simple rule
The following request will return 403:
$ kubectl exec -n my-app curl -c curl -- curl -s -w "\nhttp_code=%{http_code}" httpbin:8000/get
The following request will return 200 and the body:
$ kubectl exec -n my-app curl -c curl -- curl -s -w "\nhttp_code=%{http_code}" httpbin:8000/get -H "x-force-authorized: enabled"
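To see the whole decision table in one go, you can loop over a few header values from the curl pod. This is a small sketch; probe_header_values is a hypothetical helper, and it prints only the status code of each request.

```shell
# Sketch (hypothetical helper): send one request per x-force-authorized value
# (an empty value means the header is omitted) and print the status code.
probe_header_values() {
  for v in "" enabled true false; do
    if [ -n "$v" ]; then
      set -- -H "x-force-authorized: $v"
    else
      set --
    fi
    code=$(kubectl exec -n my-app curl -c curl -- \
      curl -s -o /dev/null -w "%{http_code}" httpbin:8000/get "$@")
    printf '%-10s %s\n' "${v:-<none>}" "$code"
  done
}

# Usage (requires cluster access):
#   probe_header_values
```

Based on the rule described above, you would expect 403 for <none> and false, and 200 for enabled and true.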
Advanced manipulations
Now the more advanced rule. Apply the second Rego rule and restart the OPA deployment:
$ kubectl apply -f - <

$ kubectl rollout restart deployment -n opa
In that rule, you can see:
myrule["allowed"] := allow # Notice that `allowed` is mandatory when returning an object, like here `myrule`
myrule["headers"] := headers
myrule["response_headers_to_add"] := response_headers_to_add
myrule["request_headers_to_remove"] := request_headers_to_remove
myrule["body"] := body
myrule["http_status"] := status_code
Those are the values that will be returned to the Envoy proxy from the OPA server. Envoy will use those values to modify the request and response.
Notice that allowed is required when returning a JSON object instead of only true/false. This can be found in the OPA documentation.
Change returned body
Let’s test the new capabilities:
$ kubectl exec -n my-app curl -c curl -- curl -s -w "\nhttp_code=%{http_code}" httpbin:8000/get
Now we can also change the response body: for a 403, the Rego rule sets the body to “Unauthorized Request”. With the previous command, you should receive:
Unauthorized Request
http_code=403
Change returned body and status code
Running the request with the header x-force-unauthenticated: enabled, you should receive the body “Authentication Failed” and the status code 401:
$ kubectl exec -n my-app curl -c curl -- curl -s -w "\nhttp_code=%{http_code}" httpbin:8000/get -H "x-force-unauthenticated: enabled"
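Because these commands print http_code=... on the last line, the body and the status code can be checked together. A minimal sketch, assuming the status line is always the last line of output; expect_response is a hypothetical helper name.

```shell
# Sketch (hypothetical helper): read output from
#   curl -s -w "\nhttp_code=%{http_code}"
# on stdin, then compare the body (everything before the last line) and the
# status code against the expected values.
expect_response() {
  want_body="$1"
  want_code="$2"
  out=$(cat)
  got_code=$(printf '%s\n' "$out" | sed -n '$s/^http_code=//p')
  got_body=$(printf '%s\n' "$out" | sed '$d')
  if [ "$got_code" = "$want_code" ] && [ "$got_body" = "$want_body" ]; then
    echo "ok: $want_code $want_body"
  else
    echo "unexpected: code=${got_code:-none} body=$got_body" >&2
    return 1
  fi
}

# Usage (requires cluster access):
#   kubectl exec -n my-app curl -c curl -- curl -s -w "\nhttp_code=%{http_code}" \
#     httpbin:8000/get | expect_response "Unauthorized Request" 403
```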
Adding headers to request
Running a valid request, you should receive the echo body with the new header x-validated-by: my-security-checkpoint and the header x-force-authorized removed:
$ kubectl exec -n my-app curl -c curl -- curl -s httpbin:8000/get -H "x-force-authorized: true"
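To check both request manipulations at once, you can inspect the JSON that httpbin echoes back. A minimal sketch, assuming the body lists header names as quoted JSON keys (as httpbin does) and that the names may be canonicalized, hence the case-insensitive match; check_request_rewrite is a hypothetical helper.

```shell
# Sketch (hypothetical helper): read the body echoed by httpbin /get on stdin
# and check that OPA added x-validated-by and removed x-force-authorized.
check_request_rewrite() {
  body=$(cat)
  if ! printf '%s\n' "$body" | grep -qi '"x-validated-by"'; then
    echo "x-validated-by was not added" >&2
    return 1
  fi
  if printf '%s\n' "$body" | grep -qi '"x-force-authorized"'; then
    echo "x-force-authorized was not removed" >&2
    return 1
  fi
  echo "ok: request headers rewritten"
}

# Usage (requires cluster access):
#   kubectl exec -n my-app curl -c curl -- curl -s httpbin:8000/get \
#     -H "x-force-authorized: true" | check_request_rewrite
```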
Adding headers to response
Running the same request but showing only the headers, you will find the response header x-add-custom-response-header: added, which was set during the authorization check:
$ kubectl exec -n my-app curl -c curl -- curl -s -I httpbin:8000/get -H "x-force-authorized: true"
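The same check can be scripted against the header dump. A minimal sketch; has_header is a hypothetical helper that strips the carriage returns in curl's header output, matches the header name case-insensitively (HTTP header names are case-insensitive), and optionally compares the value.

```shell
# Sketch (hypothetical helper): read a header dump (e.g. curl -s -I output)
# on stdin; succeed and print the header line if NAME is present and, when
# VALUE is given, its value equals VALUE.
#   has_header NAME [VALUE]
has_header() {
  line=$(tr -d '\r' | grep -i "^$1:" | head -n 1)
  [ -n "$line" ] || return 1
  if [ -n "${2-}" ]; then
    value=$(printf '%s\n' "${line#*:}" | sed 's/^[[:space:]]*//')
    [ "$value" = "$2" ] || return 1
  fi
  printf '%s\n' "$line"
}

# Usage (requires cluster access):
#   kubectl exec -n my-app curl -c curl -- curl -s -I httpbin:8000/get \
#     -H "x-force-authorized: true" | has_header x-add-custom-response-header added
```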
Sharing data between filters
Finally, you can pass data to subsequent Envoy filters using dynamic_metadata. This is useful when you want to pass data to another ext_authz filter in the chain, or to print it in the proxy’s access logs.
To do so, review the access log format you set earlier:
[...]
    accessLogFormat: |
      [OPA DEMO] my-new-dynamic-metadata: "%DYNAMIC_METADATA(envoy.filters.http.ext_authz)%"
[...]
DYNAMIC_METADATA is an Envoy access log command operator that reads the request’s dynamic metadata; its argument is the metadata namespace, here the name of the filter that wrote it. In this case, the filter name envoy.filters.http.ext_authz is set automatically by Istio. You can verify this by dumping the Envoy configuration:
$ istioctl pc all deploy/httpbin -n my-app -oyaml | grep envoy.filters.http.ext_authz
You will see the configurations for the filter.
Let’s test the dynamic metadata. In the advanced rule, you are creating a new metadata entry: {"my-new-metadata": "my-new-value"}.
Run the request and check the logs of the application:
$ kubectl exec -n my-app curl -c curl -- curl -s -I httpbin:8000/get -H "x-force-authorized: true"
$ kubectl logs -n my-app deploy/httpbin -c istio-proxy --tail 1
You will see in the output the new attributes configured by OPA Rego rules:
[...]
 my-new-dynamic-metadata: "{"my-new-metadata":"my-new-value","decision_id":"8a6d5359-142c-4431-96cd-d683801e889f","ext_authz_duration":7}"
[...]
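If you want to use these values in a script, you can pull the JSON object out of the access log line. A minimal sketch using only sed, so it needs no extra tools; dynamic_metadata and metadata_value are hypothetical helpers that assume the [OPA DEMO] log format configured earlier and plain string values without escaped quotes. If jq is available, parsing the extracted object with jq is more robust.

```shell
# Sketch (hypothetical helper): print the JSON object that the "[OPA DEMO]"
# access log format writes after "my-new-dynamic-metadata: ". The match is
# greedy: from the "{" after the label to the last "}" followed by a quote.
dynamic_metadata() {
  sed -n 's/.*my-new-dynamic-metadata: "\({.*}\)".*/\1/p'
}

# Sketch (hypothetical helper): print the string value of KEY in that object.
# Plain pattern match, not a JSON parser: string values only, no escaped quotes.
metadata_value() {
  sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

# Usage (requires cluster access):
#   kubectl logs -n my-app deploy/httpbin -c istio-proxy --tail 1 \
#     | dynamic_metadata | metadata_value my-new-metadata
```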
Conclusion
In this guide, we have shown how to integrate Istio and OPA to enforce policies for a simple microservices application. We also showed how to use Rego to modify the request and response attributes. This is the foundational example for building a platform-wide policy system that can be used by all application teams.



External post: The Istio Service Mesh for People Who Have Stuff to Do
Thu, 10 Oct 2024 00:00:00 +0000
I recently made a small contribution to Istio, an open-source service mesh project. My contribution involved adding a few tests for one of the Istio CLI commands. If you want to check out the details, you can find the pull request here. It wasn’t a huge change, but it was a great learning experience. Working on Istio helped me understand service meshes at a deeper level. I’m excited to contribute more. In this post, I’ll explain what Istio is, why it’s useful, and how it works.
Read the whole post at lucavall.in.



Introducing the Sail Operator: a new way to manage Istio
Mon, 19 Aug 2024 00:00:00 +0000
With the recent announcement of the In-Cluster IstioOperator deprecation in Istio 1.23 and its subsequent removal in Istio 1.24, we want to build awareness of a new operator that the team at Red Hat has been developing to manage Istio as part of the istio-ecosystem organization.
The Sail Operator manages the lifecycle of Istio control planes, making it easier and more efficient for cluster administrators to deploy, configure and upgrade Istio in large-scale production environments. Instead of creating a new configuration schema and reinventing the wheel, the Sail Operator APIs are built around Istio’s Helm chart APIs. All installation and configuration options that are exposed by Istio’s Helm charts are available through the values fields of the Sail Operator CRDs. This means that you can manage and customize Istio using familiar configuration, without having to learn a new API.
The Sail Operator has 3 main resource concepts:

Istio: used to manage the Istio control planes.
Istio Revision: represents a revision of that control plane, which is an instance of Istio with a specific version and revision name.
Istio CNI: used to manage the resource and lifecycle of Istio’s CNI plugin. To install the Istio CNI Plugin, you create an IstioCNI resource.

Currently, the main feature of the Sail Operator is the Update Strategy. The operator provides an interface that manages the upgrade of Istio control plane(s). It currently supports two update strategies:

In Place: with the InPlace strategy, the existing Istio control plane is replaced with a new version, and the workload sidecars immediately connect to the new control plane. This way, workloads don’t need to be moved from one control plane instance to another.
Revision Based: with the RevisionBased strategy, a new Istio control plane instance is created for every change to the Istio.spec.version field. The old control plane remains in place until all workloads have been moved to the new control plane instance. Optionally, the updateWorkloads flag can be set to automatically move workloads to the new control plane when it is ready.

We know that upgrading the Istio control plane carries risk and can require substantial manual effort for large deployments, which is why it is our current focus. For the future, we are looking at how the Sail Operator can better support use cases such as multi-tenancy and isolation, multi-cluster federation, and simplified integration with third-party projects.
The Sail Operator project is still alpha and under heavy development. Note that as an istio-ecosystem project, it is not supported as part of the Istio project. We are actively seeking feedback and contributions from the community. If you want to get involved with the project, please refer to the repo documentation and contributing guidelines. If you are a user, you can also try the new operator by following the instructions in the user documentation.
For more information, contact us:

Discussions
Issues
Slack




Istio has deprecated its In-Cluster Operator
Wed, 14 Aug 2024 00:00:00 +0000
Istio’s In-Cluster Operator has been deprecated in Istio 1.23. Users leveraging the operator — which we estimate to be fewer than 10% of our user base — will need to migrate to other install and upgrade mechanisms in order to upgrade to Istio 1.24 or above. Read on to learn why we are making this change, and what operator users need to do.
Does this affect you?
This deprecation only affects users of the In-Cluster Operator. Users who install Istio with the istioctl install command and an IstioOperator YAML file are not affected.
To determine if you are affected, run kubectl get deployment -n istio-system istio-operator and kubectl get IstioOperator. If both commands return non-empty values, your cluster will be affected. Based on recent polls, we expect that this will affect fewer than 10% of Istio users.
Operator-based installations of Istio will continue to run indefinitely, but cannot be upgraded past 1.23.x.
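The two checks above can be combined into a single test. A minimal sketch; uses_in_cluster_operator is a hypothetical helper name. It uses -o name so that the output is empty when nothing is found, and, as our own assumption, adds -A to the second command so that IstioOperator resources in any namespace are counted (the command above only looks in the current namespace).

```shell
# Sketch (hypothetical helper): succeed only if BOTH checks return something:
# the istio-operator Deployment in istio-system, and at least one
# IstioOperator resource in any namespace. Errors (for example, a missing
# CRD) are discarded and count as "nothing found".
uses_in_cluster_operator() {
  deploy=$(kubectl get deployment -n istio-system istio-operator -o name 2>/dev/null)
  iops=$(kubectl get istiooperator -A -o name 2>/dev/null)
  [ -n "$deploy" ] && [ -n "$iops" ]
}

# Usage (requires cluster access):
#   if uses_in_cluster_operator; then
#     echo "In-Cluster Operator in use: migrate before upgrading to 1.24"
#   else
#     echo "not affected"
#   fi
```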
When do I need to migrate?
In keeping with Istio’s deprecation policy for Beta features, the Istio In-Cluster Operator will be removed with the release of Istio 1.24, roughly three months from this announcement. Istio 1.23 will be supported through March 2025, at which time operator users will need to migrate to another install mechanism to retain support.
How do I migrate?
The Istio project will continue to support installation and upgrade via the istioctl command, as well as with Helm. Because of Helm’s popularity within the platform engineering ecosystem, we recommend most users migrate to Helm. istioctl install is based on Helm templates, and future versions may integrate deeper with Helm.
Helm installs can also be managed with GitOps tools like Flux or Argo CD.
Users who prefer the operator pattern for running Istio can migrate to either of two new Istio Ecosystem projects, the Classic Operator Controller, or the Sail Operator.
Migrating to Helm
Helm migration requires translating your IstioOperator YAML into Helm values. Istio 1.24 and above includes a manifest translate command to perform this operation. The output is a values.yaml file, and a shell script to install equivalent Helm charts.
$ istioctl manifest translate -f istio.yaml
Migrating to istioctl
Identify your IstioOperator custom resource: there should be only one result.
$ kubectl get IstioOperator
Using the name of your resource, download your operator configuration in YAML format:
$ kubectl get IstioOperator <name> -o yaml > istio.yaml
Disable the In-Cluster Operator. This will not disable your control plane or disrupt your current mesh traffic.
$ kubectl scale deployment -n istio-system istio-operator --replicas 0
When you are ready to upgrade Istio to version 1.24 or later, follow the upgrade instructions, using the istio.yaml file you downloaded above.
Once you have completed and verified your migration, run the following commands to clean up your operator resources:
$ kubectl delete deployment -n istio-system istio-operator
$ kubectl delete customresourcedefinition istiooperators.install.istio.io
Migrating to the Classic Operator Controller
A new ecosystem project, the Classic Operator Controller, is a fork of the original controller built into Istio. This project maintains the same API and code base as the original operator, but is maintained outside of Istio core.
Because the API is the same, migration is straightforward: only the installation of the new operator will be required.
Classic Operator Controller is not supported by the Istio project.
Migrating to Sail Operator
A new ecosystem project, the Sail Operator, is able to install and manage the lifecycle of the Istio control plane in a Kubernetes or OpenShift cluster.
Sail Operator APIs are built around Istio’s Helm chart APIs. All installation and configuration options that are exposed by Istio’s Helm charts are available through the Sail Operator CRD’s values: fields.
Sail Operator is not supported by the Istio project.
What is an operator, and why did Istio have one?
The operator pattern was popularized by CoreOS in 2016 as a method for codifying human intelligence into code. The most common use case is a database operator, where a user might have multiple database instances in one cluster, with multiple ongoing operational tasks (backups, vacuums, sharding).
Istio introduced istioctl and the in-cluster operator in version 1.4, in response to problems with Helm v2. Around the same time, Helm v3 was introduced, which addressed the community’s concerns, and is a preferred method for installing software on Kubernetes today. Support for Helm v3 was added in Istio 1.8.
Istio’s in-cluster operator handled installation of the service mesh components - an operation you generally do one time, and for one instance, per cluster. You can think of it as a way to run istioctl inside your cluster. However, this meant you had a high-privilege controller running inside your cluster, which weakens your security posture. It doesn’t handle any ongoing administration tasks (backing up, taking snapshots, etc., are not requirements for running Istio).
The Istio operator is something you have to install into the cluster, which means you already have to manage the installation of something. Using it to upgrade the cluster likewise first required you to download and run a new version of istioctl.
Using an operator means you have created a level of indirection, where you have to have options in your custom resource to configure everything you may wish to change about an installation. Istio worked around this by offering the IstioOperator API, which allows configuration of installation options. This resource is used by both the in-cluster operator and istioctl install, so there is a trivial migration path for operator users.
Three years ago — around the time of Istio 1.12 — we updated our documentation to say that use of the operator for new Istio installations is discouraged, and that users should use istioctl or Helm to install Istio.
Having three different installation methods has caused confusion, and in order to provide the best experience for people using Helm or istioctl - over 90% of our install base - we have decided to formally deprecate the in-cluster operator in Istio 1.23.



Happy 7th Birthday, Istio!
Fri, 24 May 2024 00:00:00 +0000
On this day in 2017, Google and IBM announced the launch of the Istio service mesh. Istio is an open technology that enables developers to seamlessly connect, manage, and secure networks of different services — regardless of platform, source, or vendor. We can hardly believe that Istio turns seven today! To celebrate the project’s 7th birthday, we wanted to highlight Istio’s momentum and its exciting future.
Rapid adoption among users
Istio, the most widely adopted service mesh project in the world, has been gathering significant momentum since its inception in 2017. Last year Istio joined Kubernetes, Prometheus, and other stalwarts of the cloud native ecosystem with its CNCF graduation.
End users range from digital native startups to the world’s largest financial institutions and telcos, with case studies from companies including eBay, T-Mobile, Airbnb, Splunk, FICO, Salesforce, and many others.
Istio’s control plane and sidecar are the #3 and #4 most downloaded images on Docker Hub, each with over 10 billion downloads.
We have over 35,000 GitHub stars on Istio’s main repository, with continuing growth. Thank you to everyone who starred the istio/istio repo.
We asked a few of our users for their thoughts on the occasion of Istio’s 7th birthday:
Today, Istio serves as the backbone of Airbnb’s service mesh, managing all our traffic between hundreds of thousands of workloads. Five years since adopting Istio, we’ve always been happy with that decision. It’s truly amazing to be part of this vibrant and supportive community. Happy Birthday, Istio!
— Weibo He, Senior Staff Software Engineer at Airbnb
Istio has powered our ability to rapidly deploy and test microservices in a production-like, isolated environment along with the dependent services. This approach, known as Isolates, enables eBay’s developers to identify defects earlier in the development lifecycle, increase the stability of live environments by reducing flakiness, and build confidence in automated production deployments. Ultimately, this has accelerated the development process and improved the success rate of production deployments.
— Sudheendra Murthy, Principal Engineer & Service Mesh Architect at eBay
Istio enhances the security of our cloud platform while simplifying observability by integrating distributed tracing and OpenTelemetry. This combination provides robust security features and deep insights into system performance, enabling more effective monitoring and troubleshooting of our distributed services.
— Sathish Krishnan, Distinguished Engineer at UBS
Adopting Istio has been a game changer for our engineering organization in our journey of adopting a microservices based architecture. Its batteries-included approach has allowed us to easily manage traffic routing, gain deep visibility into our service to service interactions with distributed tracing, and extensibility via WASM plugins. Its comprehensive feature set has made it an essential part of our infrastructure, and has allowed our engineers to decouple application code from infrastructure plumbing.
— Shray Kumar, Principal Software Engineer at Bluecore
Istio is amazing, I’ve been using it for 4 to 5 years and found it very comfortable to manage thousands of gateways for tens of thousands of pods with very low latency. If you need to set up a very secure infrastructure, Istio is a great friend. Also, it’s excellent for infrastructures that demand a lot of security and need to be aligned with PCI/HIPAA/SoC2 standards.
— Ezequiel Arielli, Head of Cloud Platform at SIGMA Financial AI
Istio helps us secure our environments in a standardized way across all our deployments for our various customers. The flexibility and customization of Istio really helps us build better applications by delegating encryption, authorization, and authentication to the service mesh and not having to implement that across our application code base.
— Joel Millage, Software Engineer at BCubed
We use Istio at Predibase extensively to simplify communication between our multi-cluster mesh that helps deploy and train open source fine-tuned LLM models with low latency and failover. With Istio, we get a lot of out of the box functionality that would otherwise take us weeks to implement.
— Gyanesh Mishra, Cloud Infrastructure Engineer at Predibase
Istio is without a doubt the most complete and feature full Service Mesh platform on the market. This success is the direct result of an engaged community that helps itself and is always included in the project directions. Congratulations on the anniversary, Istio!
— Daniel Requena, SRE at iFood
We’ve been using Istio in production for years now, it’s a key component of our infrastructure allowing us to securely connect micro-services, and provide ingress/egress traffic management and first-class observability. The community is great and each release brings a lot of exciting features.
— Frédéric Gaudet, Senior SRE at BlablaCar
Amazing diversity of contributors and vendors
Over the past year, our community has observed tremendous growth in terms of both the number of contributing companies and the number of contributors. Recall that Istio had 500 contributors when it turned three years old? We have had over 1,700 contributors in the past year!
With Microsoft’s Open Service Mesh team joining the Istio community, we added Azure to the list of clouds and enterprise Kubernetes vendors providing Istio-compatible solutions, including Google Cloud, Red Hat OpenShift, VMware Tanzu, Huawei Cloud, DaoCloud, Oracle Cloud, Tencent Cloud, Akamai Cloud and Alibaba Cloud. We are also delighted to see the Amazon Web Services team publish the EKS Blueprint for Istio due to high demand from users wanting to run Istio on AWS.
Specialist network software providers are also driving Istio forward, with Solo.io, Tetrate and F5 Networks all offering enterprise Istio solutions that will run in any environment.
Below are the top contributing companies for the past year, with Solo.io, Google, and DaoCloud taking the top three places. While most of these companies are Istio vendors, Salesforce and Ericsson are end users, running Istio in production!
Here are some thoughts from our community leaders:
Service mesh adoption has been steadily rising over the past few years as cloud native adoption has matured across industries. Istio has helped drive part of this maturation since they graduated last year in CNCF and we wish them a fantastic birthday. We look forward to watching and supporting this continued growth as the Istio team adds new features like ambient mode and simplifies the service mesh experience.
— Chris Aniszczyk, CTO of CNCF
Service Meshes are core to microservice architectures, a hallmark of cloud native. Istio’s birthday celebrates the proliferation and importance not only of observability and traffic management, but the increasing demand for secure-by-default communications through encryption, mutual authentication, and many other core security tenets that simplify the adoption, integration, and deployment experience.
— Emily Fox, CNCF TOC chair and Senior Principal Software Engineer at Red Hat
In my opinion Istio isn’t a service mesh. It’s a collaborative community of users and contributors who happen to deliver the world’s most popular service mesh. Happy birthday to this amazing community! It’s been a fantastic seven years, and I’m looking forward to celebrating many more with my friends and colleagues from around the world in the Istio community!
— Mitch Connors, Istio Technical Oversight Committee member and Principal Engineer at Microsoft
It has been a privilege and a fulfilling experience to be part of the world’s most popular service mesh team for the past two years. Happy to see Istio grow from a CNCF incubating to graduated project, and even happier to see the momentum and passion with which the latest and greatest 1.22 release was done. Wishing many more successful releases in the coming years.
— Faseela K, Istio Steering Committee member and Cloud Native Developer at Ericsson
What makes Istio unique is the community full of developers, users, and vendors from all across the globe working together to make Istio the best and most powerful open service mesh in the industry. It’s the strength of the community that has made Istio so successful and now under CNCF I look forward to seeing Istio as the de facto service mesh standard for all cloud native applications.
— Neeraj Poddar, Istio Technical Oversight Committee member and VP of Engineering at Solo.io
It has been a privilege to have worked with the Istio community over the last 5 years. There has been an abundance of contributors whose dedication, passion, and hard work have made my time on the project truly enjoyable. The community has many users who provide feedback to help make Istio the best service mesh. I continue to be amazed by what the community does, and look forward to seeing what successes we will have in the future.
— Eric Van Norman, Istio Technical Oversight Committee member and Advisory Software Engineer at IBM
    
        
            
        
        Istio is the backbone of the Salesforce service mesh infrastructure which today powers a few trillion requests per day across all our services. We solve a lot of complicated problems with mesh. It’s great to be part of this journey and contribute to the community. Istio has matured into a reliable service mesh over the years and at the same time continues to innovate. We are excited about what’s to come in future!
— Rama Chavali, Istio Networking Working Group lead and Software Engineering Architect at Salesforce

Continuous technical innovation
We are firm believers that diversity drives innovation. What amazes us most is the continuous innovation from the
Istio community, from making upgrades easier, to adopting Kubernetes Gateway API, to adding the new sidecar-less
ambient data plane mode, to making Istio easy to use and as transparent as possible.
Istio’s ambient mode was introduced in September 2022, designed for simplified
operations, broader application compatibility, and reduced infrastructure cost. Ambient mode introduces
lightweight, shared Layer 4 (L4) node proxies and optional Layer 7 (L7) proxies, removing the need for traditional
sidecar proxies from the data plane. The core innovation behind ambient mode is that it slices the L4 and L7
processing into two distinct layers. This layered approach allows you to adopt Istio incrementally, enabling a
smooth transition from no mesh, to a secure overlay (L4), to optional full L7 processing — on a per-namespace
basis, as needed, across your fleet.
As part of the Istio 1.22 release, ambient mode has reached beta,
and you can now run Istio without sidecars in production, with appropriate precautions.
Here are some thoughts and well-wishes from our contributors and users:

Auto Trader has been using Istio in production, since before it was ready for production! It’s significantly
improved our operational capabilities, standardizing the way we secure, configure, and monitor our services. Upgrades have evolved from daunting tasks to almost
non-events, and the introduction of Ambient is evidence of the continued commitment to simplification – making it
easier than ever for new users to get real value with minimal effort.
— Karl Stoney, Technical Architect at AutoTrader UK

Istio is a core component of the cloud native stack for Akamai’s Cloud, providing a secure service mesh for
products and services delivering millions of RPS and hundreds of Gigabytes of throughput per cluster. We look forward to the future roadmap for the project and are excited
to evaluate new features such as the Ambient Mesh later this year.
— Alex Chircop, Chief Product Architect at Akamai

Istio’s networking and security capabilities have become a fundamental component of our infrastructure operations. The introduction of Istio’s ambient mode has significantly simplified management and
reduced the size of our Kubernetes cluster nodes by approximately 20%. We successfully migrated our production
system to use the ambient data plane.
— Saarko Eilers, Infrastructure Operations Manager at EISST International Ltd

Happy birthday to Istio! It has been an honor to be a part of the great community over
the years, especially as we continue to build the world’s best service mesh with ambient mode.
— John Howard, the most prolific Istio contributor, Istio Technical Oversight Committee member, and Senior Architect at Solo.io

It’s great to see a mature project like Istio continue to evolve and flourish. Becoming a graduated CNCF project has attracted a
wave of new developers contributing to its continued success.  Meanwhile ambient mesh and Gateway API support
promises to usher in a new era of service mesh adoption.  I’m excited to see what’s to come!
— Justin Pettit, Istio Steering Committee member and Senior Staff Engineer at Google

Happy birthday to the incredible Istio project that has not only revolutionized the way we approach service mesh
technology but has also cultivated a vibrant and inclusive community! Witnessing Istio’s evolution from a CNCF incubating project to a graduated
project has been remarkable. The recent release of Istio 1.22 underscores its continuous growth and commitment to
excellence, offering enhanced features and improved performance. Looking forward to the next big step for the project.
— Iris Ding, Istio Steering Committee member and Software Engineer at Intel

It’s been a privilege to be part of the Istio project from the start, seeing it and the community mature and grow over the years. On a personal note, Istio has been central to my own career for the past eight years! I firmly believe that the best of Istio is yet to come, and in the coming years we’ll see continued growth, maturity, and adoption. Cheers to the wonderful community for reaching this milestone together.
— Zack Butcher, Istio Steering Committee member and Founding & Principal Engineer at Tetrate

Learn more about Istio
If you are new to Istio, here are a few resources to help you learn more:

Check out the project website and GitHub repository.
Read the documentation.
Join the community Slack.
Follow the project on Twitter and LinkedIn.
Attend the user community meetings.
Join the working group meeting.
Become an Istio contributor and developer by submitting a membership request, after you have a pull request merged.

If you are already part of the Istio community, please wish the Istio project a happy 7th birthday, and share your
thoughts about the project on social media. Thank you for your help and support!



Say goodbye to your sidecars: Istio's ambient mode reaches Beta in v1.22
Mon, 13 May 2024 00:00:00 +0000
Today, Istio’s revolutionary new ambient data plane mode has reached Beta.
Ambient mode is designed for simplified operations, broader application compatibility, and reduced infrastructure cost.
It gives you a sidecar-less data plane that’s integrated into your infrastructure,
all while maintaining Istio’s core features of zero-trust security, telemetry, and traffic management.
Ambient mode was announced in September 2022.
Since then, our community has put in 20 months of hard work and collaboration, with
contributions from Solo.io, Google, Microsoft, Intel, Aviatrix, Huawei, IBM, Red Hat, and many others.
Beta status in 1.22 indicates the features of ambient mode are now ready for production workloads, with appropriate precautions.
This is a huge milestone for Istio, bringing both Layer 4 and Layer 7 mesh features to production
readiness without sidecars.
Why ambient mode?
Listening to feedback from Istio users, we observed a growing demand for mesh capabilities for applications, but
heard that many of you found the resource overhead and operational complexity of sidecars hard to overcome. Challenges that sidecar users
shared with us include Istio breaking applications after sidecars are added, the large CPU and memory consumption of
sidecars, and the inconvenience of having to restart application pods with every new proxy release.
As a community, we designed ambient mode to tackle these problems, alleviating the previous barriers
of complexity faced by users looking to implement service mesh. We named the new feature set
‘ambient mode’ because it was designed to be transparent to your application: adopting it requires no additional
configuration and no application restarts.
In ambient mode it is trivial to add or remove applications from the mesh. You can now simply label a namespace, and all applications
in that namespace are added to the mesh. This immediately secures all traffic with mTLS, all without sidecars or the need to
restart applications.
Refer to the Introducing Ambient Mesh blog
for more information on why we built ambient mode.
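To make the enrollment rule concrete, here is a minimal Python sketch of how a pod's mesh membership could be decided from labels. It assumes the `istio.io/dataplane-mode` label used in Istio's ambient documentation (a namespace is typically enrolled with `kubectl label namespace default istio.io/dataplane-mode=ambient`), and it further assumes that a pod label value of `none` opts a pod out while `ambient` opts a single pod in. The function and the plain dictionaries standing in for Kubernetes labels are hypothetical; in a real cluster the decision is made by Istio's node components, not by code like this.

```python
# Hypothetical model of ambient enrollment by label; not Istio source code.
DATAPLANE_MODE = "istio.io/dataplane-mode"


def in_ambient_mesh(namespace_labels: dict, pod_labels: dict) -> bool:
    """Return True if the pod would be enrolled in ambient mode under the assumed rules."""
    pod_mode = pod_labels.get(DATAPLANE_MODE)
    if pod_mode == "none":
        # Assumed per-pod opt-out, which wins over the namespace label.
        return False
    if pod_mode == "ambient":
        # Assumed per-pod opt-in, even when the namespace is not labeled.
        return True
    # Otherwise the namespace label decides for every pod in that namespace.
    return namespace_labels.get(DATAPLANE_MODE) == "ambient"


labeled_ns = {DATAPLANE_MODE: "ambient"}
print(in_ambient_mesh(labeled_ns, {"app": "reviews"}))        # True
print(in_ambient_mesh(labeled_ns, {DATAPLANE_MODE: "none"}))  # False
print(in_ambient_mesh({}, {"app": "ratings"}))                # False
```

Nothing in the pod spec changes in this model; only labels are read, which mirrors why labeling a namespace is enough to join the mesh.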
How does ambient mode make adoption easier?
Istio’s ambient mode introduces lightweight, shared Layer 4 (L4) node proxies and optional Layer 7 (L7) proxies, removing the need for
traditional sidecar proxies from the data plane. The core innovation behind ambient mode is that it slices the L4 and L7
processing into two distinct layers. This layered approach allows you to adopt Istio incrementally, enabling a smooth
transition from no mesh, to a secure overlay (L4), to optional full L7 processing — on a per-namespace basis, as needed, across
your fleet.
Ambient mode works without any modification required to your existing Kubernetes deployments. You can label a namespace to
add all of its workloads to the mesh, or opt in certain deployments as needed. By using ambient mode, users
avoid some of the previously restrictive elements of the sidecar model: server-send-first protocols now
work, most reserved ports are now available, and the ability for containers to bypass the sidecar, whether
maliciously or not, is eliminated.
The lightweight shared L4 node proxy is called the ztunnel (zero-trust tunnel). Ztunnel drastically reduces the overhead of
running a mesh by removing the need to potentially over-provision memory and CPU within a cluster to handle expected loads. In
some use cases, the savings can exceed 90%, while still providing zero-trust security using mutual TLS with
cryptographic identity, simple L4 authorization policies, and telemetry.
The L7 proxies are called waypoints. Waypoints process L7 functions such as traffic routing, rich authorization policy
enforcement, and enterprise-grade resilience. Waypoints run outside of your application deployments and can scale independently
based on your needs, which could be for the entire namespace or for multiple services within a namespace. Compared with
sidecars, you don’t need one waypoint per application pod, and you can scale your waypoint effectively based on its scope,
thus saving significant amounts of CPU and memory in most cases.
The separation between the L4 secure overlay layer and L7 processing layer allows incremental adoption of the ambient mode data
plane, in contrast to the earlier binary “all-in” injection of sidecars. Users can start with the secure L4 overlay, which
offers a majority of features that people deploy Istio for (mTLS, authorization policy, and telemetry).
Complex L7 handling such as retries, traffic splitting, load balancing, and observability collection can then be enabled on a case-by-case basis.
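The layering above can be summarized as a small lookup: every enrolled workload gets the ztunnel L4 overlay, and a waypoint is needed only when at least one L7 feature is requested. The following Python sketch encodes that rule; the feature names are hypothetical labels drawn from the lists in this post, not Istio configuration keys.

```python
# Hypothetical feature names, taken from the L4 and L7 lists in this post.
L4_FEATURES = {"mtls", "l4_authorization", "l4_telemetry"}
L7_FEATURES = {
    "retries", "traffic_splitting", "load_balancing", "l7_observability",
    "request_routing", "rich_authorization",
}


def layers_needed(features: set) -> list:
    """Return the ambient components a workload would need for the requested features."""
    unknown = features - L4_FEATURES - L7_FEATURES
    if unknown:
        raise ValueError(f"unclassified features: {sorted(unknown)}")
    layers = ["ztunnel"]  # the secure L4 overlay applies to every enrolled workload
    if features & L7_FEATURES:
        layers.append("waypoint")  # L7 processing is added only when requested
    return layers


print(layers_needed({"mtls", "l4_authorization"}))   # ['ztunnel']
print(layers_needed({"mtls", "traffic_splitting"}))  # ['ztunnel', 'waypoint']
```

Adding a single L7 feature later changes the result from ztunnel alone to ztunnel plus a waypoint, which is the incremental path described above.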
What is in the scope of the Beta?
We recommend you explore the following Beta functions of ambient mode in production with appropriate precautions, after validating
them in test environments:

Installing Istio with support for ambient mode.
Adding your workloads to the mesh to gain mutual TLS with cryptographic identity, L4 authorization policies, and telemetry.
Configuring waypoints to use L7 functions such as traffic shifting, request routing, and rich authorization policy enforcement.
Connecting the Istio ingress gateway to workloads in ambient mode, supporting all existing Istio APIs.
Using istioctl to operate waypoints, and troubleshoot ztunnel & waypoints.

Alpha features
Many other features we want to include in ambient mode have been implemented, but remain in Alpha status in this release. Please help
test them, so they can be promoted to Beta in 1.23 or later:

Multi-cluster installations
DNS proxying
Interoperability with sidecars
IPv6/Dual stack
SOCKS5 support (for outbound)
Istio’s classic APIs (VirtualService and DestinationRule)

Roadmap
We have a number of features which are not yet implemented in ambient mode, but are planned for upcoming releases:

Controlled egress traffic
Multi-network support
Improved status messages on resources to help troubleshoot and understand the mesh
VM support

What about sidecars?
Sidecars are not going away, and remain first-class citizens in Istio. You can continue to use sidecars, and they will remain
fully supported. For any feature outside of the Alpha or Beta scope for ambient mode, you should consider using the sidecar
mode until the feature is added to ambient mode. Some use cases, such as traffic shifting based on source labels, will
continue to be best implemented using the sidecar mode. While we believe most use cases will be best served with a mesh in
ambient mode, the Istio project remains committed to ongoing sidecar mode support.
Try ambient mode today
With the 1.22 release of Istio and the Beta release of ambient mode, it is now easier than ever to try out Istio on your own
workloads. Follow the getting started guide to explore ambient mode, or read our new user guides
to learn how to incrementally adopt ambient for mutual TLS & L4 authorization policy, traffic management, rich L7
authorization policy, and more. You can engage with the developers in the #ambient channel on the Istio Slack,
or use the discussion forum on GitHub for any questions you may have.



Introducing Istio v1 APIs
Mon, 13 May 2024 00:00:00 +0000
Istio provides networking, security and telemetry APIs that are crucial for ensuring the robust security, seamless connectivity, and effective observability of services within the service mesh. These APIs are used on thousands of clusters across the world, securing and enhancing critical infrastructure.
Most of the features powered by these APIs have been considered stable for some time, but the API version has remained at v1beta1. As a reflection of the stability, adoption, and value of these resources, the Istio community has decided to promote these APIs to v1 in Istio 1.22.
In Istio 1.22, we are happy to announce that, after a concerted effort, the following APIs have graduated to v1:

Destination Rule
Gateway
Service Entry
Sidecar
Virtual Service
Workload Entry
Workload Group
Telemetry API*
Peer Authentication

Feature stability and API versions
Declarative APIs, such as those used by Kubernetes and Istio, decouple the description of a resource from the implementation that acts on it.
Istio’s feature phase definitions describe how a stable feature — one that is deemed ready for production use at any scale, and comes with a formal deprecation policy — should be matched with a v1 API. We are now making good on that promise, with our API versions matching our feature stability for both features that have been stable for some time, and those which are being newly designated as stable in this release.
Although there are currently no plans to discontinue support for the previous v1beta1 and v1alpha1 API versions, users are encouraged to move to the v1 APIs by manually updating their existing YAML files.
Telemetry API
The Telemetry API is the only promoted API that changed from its previous API version. The following v1alpha1 fields weren’t promoted to v1:

metrics.reportingInterval
Reporting interval allows configuration of the time between calls out for metrics reporting. This currently only supports TCP metrics, but we may use it for long-duration HTTP streams in the future.
At this time, Istio lacks usage data to support the need for this feature.

accessLogging.filter
If specified, this filter will be used to select specific requests/connections for logging.
This feature is based on a relatively new feature in Envoy, and Istio needs to further develop the use case and implementation before graduating it to v1.

tracing.useRequestIdForTraceSampling
This value is true by default. The format of this Request ID is specific to Envoy, so if the Request ID is generated by a non-Envoy proxy that receives user traffic first, Envoy will break the trace because it cannot interpret the Request ID. Setting this value to false prevents Envoy from sampling based on the Request ID.
There is not a strong use case for making this configurable through the Telemetry API.

Please share any feedback on these fields by creating issues on GitHub.
Overview of Istio CRDs
This is the full list of supported API versions:

| Category | API | Versions |
|---|---|---|
| Networking | Destination Rule | v1, v1beta1, v1alpha3 |
| | Istio Gateway | v1, v1beta1, v1alpha3 |
| | Service Entry | v1, v1beta1, v1alpha3 |
| | Sidecar scope | v1, v1beta1, v1alpha3 |
| | Virtual Service | v1, v1beta1, v1alpha3 |
| | Workload Entry | v1, v1beta1, v1alpha3 |
| | Workload Group | v1, v1beta1, v1alpha3 |
| | Proxy Config | v1beta1 |
| | Envoy Filter | v1alpha3 |
| Security | Authorization Policy | v1, v1beta1 |
| | Peer Authentication | v1, v1beta1 |
| | Request Authentication | v1, v1beta1 |
| Telemetry | Telemetry | v1, v1alpha1 |
| Extension | Wasm Plugin | v1alpha1 |

Istio can also be configured using the Kubernetes Gateway API.
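As one way to act on the advice to update existing YAML files, here is a minimal Python sketch that bumps a parsed manifest's `apiVersion` to v1 only for the kinds the table lists with a v1 version. It assumes manifests are already loaded as dictionaries (for example, with a YAML parser) and uses Istio's standard CRD API groups; `to_v1` is a hypothetical helper, not part of any Istio tool. Because Istio keeps an identical schema across versions of an API, only `apiVersion` needs to change.

```python
# Kinds that have a v1 version, per the table above, grouped by Istio API group.
V1_KINDS = {
    "networking.istio.io": {
        "DestinationRule", "Gateway", "ServiceEntry", "Sidecar",
        "VirtualService", "WorkloadEntry", "WorkloadGroup",
    },
    "security.istio.io": {
        "AuthorizationPolicy", "PeerAuthentication", "RequestAuthentication",
    },
    "telemetry.istio.io": {"Telemetry"},
}


def to_v1(manifest: dict) -> dict:
    """Return a shallow copy of an Istio manifest, moved to v1 when the kind supports it."""
    group, _, version = manifest["apiVersion"].partition("/")
    updated = dict(manifest)
    if manifest["kind"] in V1_KINDS.get(group, set()) and version != "v1":
        updated["apiVersion"] = f"{group}/v1"
    # ProxyConfig, EnvoyFilter, and WasmPlugin have no v1 version and are left unchanged.
    return updated


vs = {"apiVersion": "networking.istio.io/v1beta1", "kind": "VirtualService"}
print(to_v1(vs)["apiVersion"])  # networking.istio.io/v1
```

For Telemetry resources, this bump alone does not remove the v1alpha1-only fields listed earlier, which a v1 resource still accepts.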
Using the v1 Istio APIs
Some Istio APIs, such as the Envoy Filter, Proxy Config and Wasm Plugin APIs, are still under active development and may change between releases.
Furthermore, Istio maintains a strictly identical schema across all versions of an API due to limitations in CRD versioning. Therefore, even though there is a v1 Telemetry API, the three v1alpha1 fields mentioned above can still be utilized when declaring a v1 Telemetry API resource.
For risk-averse environments, we have added a stable validation policy, a validating admission policy which can ensure that only v1 APIs and fields are used with Istio APIs.
In new environments, selecting the stable validation policy upon installing Istio will guarantee that all future Custom Resources created or updated are v1 and contain only v1 features.
If the policy is deployed into an existing Istio installation that has Custom Resources that do not comply with it, the only allowed action is to delete the resource or remove the usage of the offending fields.
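The real policy is a Kubernetes validating admission policy whose rules are written in CEL. Purely to illustrate what it checks, the Python sketch below approximates two of those rules: Istio resources must use a v1 `apiVersion`, and v1 Telemetry resources must not set the three v1alpha1-only fields listed earlier. The set of API groups treated as Istio APIs and the function name are assumptions made for this example, and the actual policy may check more than this.

```python
# Hypothetical approximation of the stable validation policy; the real one uses CEL.
ISTIO_GROUPS = {"networking.istio.io", "security.istio.io",
                "telemetry.istio.io", "extensions.istio.io"}

# v1alpha1-only Telemetry fields from the list above, as (list section, field).
NON_V1_TELEMETRY_FIELDS = [
    ("metrics", "reportingInterval"),
    ("accessLogging", "filter"),
    ("tracing", "useRequestIdForTraceSampling"),
]


def stable_policy_violations(manifest: dict) -> list:
    """Return human-readable reasons this manifest would be rejected (empty if allowed)."""
    group, _, version = manifest.get("apiVersion", "").partition("/")
    if group not in ISTIO_GROUPS:
        return []  # not an Istio API, so out of scope for this check
    problems = []
    if version != "v1":
        problems.append(f"apiVersion {manifest['apiVersion']} is not v1")
    if manifest.get("kind") == "Telemetry":
        spec = manifest.get("spec", {})
        for section, field in NON_V1_TELEMETRY_FIELDS:
            # Each section is a list of entries in the Telemetry spec.
            for i, entry in enumerate(spec.get(section, [])):
                if field in entry:
                    problems.append(f"spec.{section}[{i}].{field} is not a v1 field")
    return problems


tel = {"apiVersion": "telemetry.istio.io/v1", "kind": "Telemetry",
       "spec": {"metrics": [{"reportingInterval": "5s"}]}}
print(stable_policy_violations(tel))  # ['spec.metrics[0].reportingInterval is not a v1 field']
```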
To install Istio with the stable validation policy:
$ helm install istio-base istio/base -n istio-system --set experimental.stableValidationPolicy=true
To set a specific revision when installing Istio with the policy:
$ helm install istio-base istio/base -n istio-system --set experimental.stableValidationPolicy=true --set revision=x
This feature is compatible with Kubernetes 1.30 and higher. The validations are created using CEL expressions, and users can modify the validations for their specific needs.
Summary
The Istio project is committed to delivering stable APIs and features essential for the successful operation of your service mesh. We would love to receive your feedback to help guide us in making the right decisions as we continue to refine relevant use cases and stability blockers for our features. Please share your feedback by creating issues, posting in the relevant Istio Slack channel, or by joining us in our weekly working group meeting.



Gateway API Mesh Support Promoted To Stable
Mon, 13 May 2024 00:00:00 +0000
We are thrilled to announce that Service Mesh support in the Gateway API is now officially “Stable”!
With this release (part of Gateway API v1.1 and Istio v1.22), users can make use of the next-generation traffic management APIs for both ingress (“north-south”) and service mesh use cases (“east-west”).
What is the Gateway API?
The Gateway API is a collection of APIs that are part of Kubernetes, focusing on traffic routing and management.
The APIs are inspired by, and serve many of the same roles as, Kubernetes’ Ingress and Istio’s VirtualService and Gateway APIs.
These APIs have been under development since 2020, both in Istio and through broad collaboration, and have come a long way.
While the API initially targeted only serving ingress use cases (which went GA last year), we had always envisioned allowing the same APIs to be used for traffic within a cluster as well.
With this release, that vision is made a reality: Istio users can use the same routing API for all of their traffic!
Getting started
Throughout the Istio documentation, all of our examples have been updated to show how to use the Gateway API, so explore some of the tasks to gain a deeper understanding.
Using Gateway API for service mesh should feel familiar both to users already using Gateway API for ingress, and users using VirtualService for service mesh today.

Compared to Gateway API for ingress, routes target a Service instead of a Gateway.
Compared to VirtualService, where routes associate with a set of hosts, routes target a Service.

Here is a simple example, which demonstrates routing requests to two different versions of a Service based on the request header:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: reviews
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: reviews
    port: 9080
  rules:
  - matches:
    - headers:
      - name: my-favorite-service-mesh
        value: istio
    filters:
    - type: RequestHeaderModifier
      requestHeaderModifier:
        add:
        - name: hello
          value: world
    backendRefs:
    - name: reviews-v2
      port: 9080
  - backendRefs:
    - name: reviews-v1
      port: 9080
Breaking this down, we have a few parts:

First, with parentRefs, we identify which traffic this route applies to.
By attaching our route to the reviews Service, we will apply this routing configuration to all requests that were originally targeting reviews.
Next, matches configures criteria for selecting which traffic this route should handle.
Optionally, we can modify the request. Here, we add a header.
Finally, we select a destination for the request. In this example, we are picking between two versions of our application.

For more details, see Istio’s traffic routing internals and Gateway API’s Service documentation.
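To make the rule ordering concrete, here is a minimal Python model of what the `reviews` HTTPRoute above does to a single request. It is an illustration, not how Istio or Gateway API implementations evaluate routes: the function name is hypothetical, header names are assumed to be lowercase (real HTTP header matching is case-insensitive), and the header `add` is modeled as a simple dictionary assignment.

```python
def route_reviews(headers: dict) -> tuple:
    """Return (backend, headers after filters) for a request sent to the reviews Service."""
    # Rule 1: exact match on the my-favorite-service-mesh header.
    if headers.get("my-favorite-service-mesh") == "istio":
        modified = dict(headers)
        modified["hello"] = "world"  # the RequestHeaderModifier filter's add
        return "reviews-v2:9080", modified
    # Rule 2: no matches block, so it receives all remaining traffic.
    return "reviews-v1:9080", dict(headers)


print(route_reviews({"my-favorite-service-mesh": "istio"}))
# ('reviews-v2:9080', {'my-favorite-service-mesh': 'istio', 'hello': 'world'})
print(route_reviews({}))  # ('reviews-v1:9080', {})
```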
Which API should I use?
With overlapping responsibilities (and names!), picking which APIs to use can be a bit confusing.
Here is the breakdown:

| API Name | Object Types | Status | Recommendation |
|---|---|---|---|
| Gateway APIs | HTTPRoute, Gateway, … | Stable in Gateway API v1.0 (2023) | Use for new deployments, in particular with ambient mode |
| Istio APIs | Virtual Service, Gateway | v1 in Istio 1.22 (2024) | Use for existing deployments, or where advanced features are needed |
| Ingress API | Ingress | Stable in Kubernetes v1.19 (2020) | Use only for legacy deployments |

You may wonder, given the above, why the Istio APIs were promoted to v1 at the same time.
This was part of an effort to accurately categorize the stability of the APIs.
While we view Gateway API as the future (and present!) of traffic routing APIs, our existing APIs are here to stay for the long run, with full compatibility.
This mirrors Kubernetes’ approach with Ingress, which was promoted to v1 while directing future work towards the Gateway API.
Community
This stability graduation represents the culmination of countless hours of work and collaboration across the project.
It is incredible to look at the list of organizations involved in the API and reflect on how far we have come.
A special thanks goes out to my co-leads on the effort: Flynn, Keith Mattix, and Mike Morris, as well as the countless others involved.
Interested in getting involved, or even just providing feedback?
Check out Istio’s community page or the Gateway API contributing guide!



Istio joins Phippy and friends — Welcome Izzy!
Fri, 08 Mar 2024 00:00:00 +0000
Having sailed into, and proudly graduated within, the Cloud Native Computing Foundation in 2023, Istio is now ready to join the CNCF Phippy family’s mission to demystify and simplify cloud native computing.
The Istio Steering Committee is excited to unveil Izzy Dolphin, the Istio Indo-Pacific Bottlenose, who today dives into the family of “Phippy and Friends”.

Istio stands on the shoulders of several other CNCF projects, including Kubernetes, Envoy, Prometheus, and Helm. Izzy is proud to join Phippy, Hazel, and Captain Kube’s gang, taking cloud native to the masses.

Izzy not only represents the hard work and imagination of Istio’s maintainers from diverse companies, but will also help us illustrate the concepts of service mesh and Istio’s new ambient mode in an approachable way. Stay tuned as we build out our illustrated guides, where Izzy will demystify Istio and service mesh in terms a child could understand! Next time you’re breaking down these concepts for people who don’t share your background knowledge, how about using Izzy?
Istio was initially developed by Google and IBM and built on the Envoy project from Lyft. The project now has maintainers from more than 16 companies, including many of the largest networking vendors and cloud organizations worldwide. Istio provides zero-trust networking, policy enforcement, traffic management, load balancing, and monitoring without requiring applications to be rewritten.
Over the years Istio has made substantial strides in simplifying the complex problem of cloud native networking. We understand that these concepts remain complicated for many, and that is why we are proud to join Phippy’s mission to talk about tech in an accessible, straightforward manner. We would like to open the doors of service mesh technology to more folks than ever before through Izzy and enable you to join #teamcloudnative!



Istio's Steering Committee for 2024
Thu, 15 Feb 2024 00:00:00 +0000
The Istio Steering Committee oversees the administrative aspects of the project, including governance, branding, marketing, and working with the CNCF.
Every year, the leaders in the Istio project estimate the share of contributions made by each of the hundreds of companies that have contributed to Istio in the past year, and use that metric to proportionally allocate nine Contribution Seats on our Steering Committee.
Then, four Community Seats are voted for by our project members, with candidates being from companies that did not receive Contribution Seats.
We are pleased to share the result of this year’s calculation, and changes to our Community Seat holders as a result.
Contribution seats
The calculation for the 2024-2025 term brings the most diverse set of company representation ever in the Contribution Seats, with five companies¹ represented:

| Company | Seat allocation |
|---|---|
| Google | 3 |
| Solo.io | 2 |
| IBM/Red Hat | 2 |
| DaoCloud | 1 |
| Huawei | 1 |

The full allocation can be seen in our formula spreadsheet.
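For readers curious how a proportional allocation like this can be computed, here is a sketch of the largest-remainder (Hamilton) method in Python. This is a swapped-in, generic apportionment technique, not necessarily the formula in Istio's spreadsheet, which may apply additional rules such as caps or minimum thresholds; the company names and contribution counts in the example are entirely hypothetical.

```python
from math import floor


def allocate_seats(contributions: dict, seats: int = 9) -> dict:
    """Apportion seats by contribution share using the largest-remainder method."""
    total = sum(contributions.values())
    quotas = {c: seats * n / total for c, n in contributions.items()}
    alloc = {c: floor(q) for c, q in quotas.items()}
    leftover = seats - sum(alloc.values())
    # Give the remaining seats to the companies with the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for company in by_remainder[:leftover]:
        alloc[company] += 1
    # Drop companies that end up with no seat.
    return {c: n for c, n in alloc.items() if n > 0}


# Hypothetical contribution counts, for illustration only.
example = {"Company A": 340, "Company B": 220, "Company C": 200,
           "Company D": 120, "Company E": 80, "Company F": 40}
print(allocate_seats(example))
# {'Company A': 3, 'Company B': 2, 'Company C': 2, 'Company D': 1, 'Company E': 1}
```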
Community seats
As a result of this year’s calculation, two of our Community Seat holders move to Contribution Seats. This creates two extra vacancies, which are allocated to the runners-up of our last election².
We are pleased to welcome our two newest Community Seat holders, Mitch Connors from Aviatrix and Keith Mattix from Microsoft. Both are highly active maintainers and leaders in the project, and we are delighted to have them join the Steering Committee.
Proposed changes to election timing
Our charter currently allocates Contribution Seats in February and holds the Community Seat election in July. We previously anticipated a situation where people would change seat types mid-term, and this has now come to pass.
We will therefore be voting on a change to our Charter which will move our Community Seat elections to February, to be held immediately after the allocation of Contribution Seats. It is our intention that the next annual election be held in February 2025.
The full group
Following these changes, we now have representation from nine companies:

| Name | Company | Profile | Seat type |
|---|---|---|---|
| Craig Box | Solo.io | craigbox | Contribution seat |
| Rob Cernich | Red Hat | rcernich | Contribution seat |
| Mitch Connors | Aviatrix | therealmitchconnors | Community seat |
| Iris (Shaojun) Ding | Intel | irisdingbj | Community seat |
| Cameron Etezadi | Google | cetezadi | Contribution seat |
| John Howard | Google | howardjohn | Contribution seat |
| Faseela K | Ericsson Software Technology | kfaseela | Community seat |
| Kebe Liu | DaoCloud | kebe7jun | Contribution seat |
| Jamie Longmuir | Red Hat | longmuir | Contribution seat |
| Keith Mattix | Microsoft | keithmattix | Community seat |
| Justin Pettit | Google | justinpettit | Contribution seat |
| Lin Sun | Solo.io | linsun | Contribution seat |
| Zhonghu Xu | Huawei | hzxuzhonghu | Contribution seat |

Our sincerest thanks to Ameer Abbas, April Kyle Nassi, Cale Rath and Chaomeng Zhang, whose terms have come to an end.
The Steering Committee wishes to thank its members, old and new, and looks forward to continuing to grow and improve Istio as a successful and sustainable open source project. We encourage everyone to get involved in the Istio community by contributing, voting, and helping us shape the future of cloud native networking.

¹ Our Steering Committee charter considers groups of companies as one for the purposes of allocation of seats. This means we group IBM and Red Hat together as a single entity.

² The first runner-up from the election was Kebe Liu from DaoCloud, who will join with their newly allocated Contribution Seat.



Maturing Istio Ambient: Compatibility Across Various Kubernetes Providers and CNIs
Mon, 29 Jan 2024 00:00:00 +0000
The Istio project announced ambient mesh, its new sidecar-less data plane mode, in 2022,
and released an alpha implementation in early 2023.
Our alpha focused on proving out the value of the ambient data plane mode under a limited set of configurations and environments.
Ambient mode relies on transparently redirecting traffic between workload pods and ztunnel, and the initial
mechanism we used to do that conflicted with several categories of 3rd-party Container Networking Interface (CNI) implementations.
Through GitHub issues and Slack discussions, we heard our users wanted to be able to use ambient mode in minikube
and Docker Desktop, with CNI implementations like Cilium and Calico,
and on services that ship in-house CNI implementations
like OpenShift and Amazon EKS.
Getting broad support for Kubernetes anywhere has become the No. 1 requirement for ambient mesh moving to beta — people have come to expect Istio to
work on any Kubernetes platform and with any CNI implementation. After all, ambient wouldn’t be ambient without being all around you!
At Solo, we’ve been integrating ambient mode into our Gloo Mesh product, and came up with an innovative solution to this problem.
We decided to upstream our changes in late 2023 to help ambient reach beta faster,
so more users can operate ambient in Istio 1.21 or newer, and enjoy the benefits of ambient sidecar-less mesh in their platforms
regardless of their existing or preferred CNI implementation.
How did we get here?
Service meshes and CNIs: it’s complicated
Istio is a service mesh, and by strict definition no service mesh is a CNI implementation. Service meshes require a
spec-compliant, primary CNI implementation to be present in every Kubernetes cluster, and rest on top of that.
This primary CNI implementation may be provided by your cloud provider (AKS, GKE, and EKS all ship their own), or by third-party CNI
implementations like Calico and Cilium. Some service meshes may also ship bundled with their own primary CNI implementation, which they
explicitly require to function.
Before you can secure pod traffic with mTLS and apply high-level authentication and authorization policy at the
service mesh layer, you must have a functional Kubernetes cluster with a functional CNI implementation, which sets up the basic networking
pathways so that packets can get from one pod to another (and from one node to another) in your cluster.
It is sometimes possible to run two primary CNI implementations in parallel within the same cluster (for instance, one shipped by the cloud provider and a third-party
implementation), but in practice this introduces a whole host of compatibility issues, strange behaviors, and reduced feature sets, due to
the wildly varying mechanisms each CNI implementation might employ internally.
To avoid this, the Istio project has chosen not to ship or require its own primary CNI implementation, or even require a “preferred” CNI
implementation. Instead, Istio supports CNI chaining with the widest possible ecosystem of CNI implementations, ensuring maximum
compatibility with managed offerings, cross-vendor support, and composability with the broader CNCF ecosystem.
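For readers new to CNI chaining, here is a minimal Python sketch of what a chained, non-primary plugin does on each invocation, based on the CNI specification: the container runtime runs the plugin with environment variables such as `CNI_COMMAND`, `CNI_CONTAINERID` and `CNI_NETNS`, passes the network configuration as JSON on stdin, and, for chained plugins, includes the previous plugin's output as `prevResult`. A plugin that does not own pod networking passes `prevResult` through unchanged. The function name and the `notify_agent` callback are hypothetical stand-ins for however a node agent learns about a new pod; this is not the istio-cni plugin's actual code.

```python
import json
from typing import Callable, Dict


def run_chained_plugin(env: Dict[str, str], stdin_config: str,
                       notify_agent: Callable[[str, str], None]) -> str:
    """Handle one CNI invocation as a chained (non-primary) plugin.

    Returns the text the plugin would print on stdout.
    """
    config = json.loads(stdin_config)
    command = env["CNI_COMMAND"]
    if command == "ADD":
        # The primary CNI has already wired up the pod; its result arrives here.
        prev = config.get("prevResult")
        if prev is None:
            raise ValueError("chained plugin invoked without prevResult")
        # Hypothetical hook: tell a node agent which netns the new pod lives in.
        notify_agent(env["CNI_CONTAINERID"], env["CNI_NETNS"])
        # Do not touch pod networking: hand the primary CNI's result back as-is.
        return json.dumps(prev)
    if command in ("DEL", "CHECK"):
        # Nothing to print on success for these commands in this sketch.
        return ""
    raise ValueError(f"unsupported CNI_COMMAND: {command}")
```

A real plugin binary would call this with `os.environ` and `sys.stdin.read()` and print the result. The point of the design is that the chained plugin never replaces the primary CNI's wiring; it only observes the new pod and returns the previous result untouched.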
Traffic redirection in ambient alpha
The istio-cni component is optional in the sidecar data plane mode,
where it is commonly used to remove the requirement for the NET_ADMIN and NET_RAW capabilities for
users deploying pods into the mesh. In the ambient data plane mode, istio-cni is required.
The istio-cni component is not a primary CNI implementation; it is a node agent that extends whatever primary CNI implementation is already present in the cluster.
In the ambient alpha, whenever pods were added to an ambient mesh, the istio-cni component configured traffic redirection for all
incoming and outgoing traffic between the pods and the ztunnel running on
the pod’s node, via the node-level network namespace. The key difference between the sidecar mechanism and the ambient alpha mechanism
is that in the latter, pod traffic was redirected out of the pod network namespace and into the co-located ztunnel pod’s network namespace, necessarily passing through the host network namespace on the way. The host network namespace is where the bulk of the traffic redirection rules that achieved this were implemented.
As we tested more broadly in multiple real-world Kubernetes environments, each with its own default CNI, it became clear that capturing and
redirecting pod traffic in the host network namespace, as we did during alpha development, was not going to meet our requirements. Achieving our goals in a generic manner across these diverse environments was simply not feasible with this approach.
The fundamental problem with redirecting traffic in the host network namespace is that this is precisely the same spot where the cluster’s primary CNI implementation must configure traffic routing/networking rules. This created inevitable conflicts, most critically:

The primary CNI implementation’s basic host-level networking configuration could interfere with the host-level ambient networking configuration from Istio’s CNI extension, causing traffic disruption and other conflicts.
If users deployed a network policy to be enforced by the primary CNI implementation, the network policy might not be enforced when the
Istio CNI extension is deployed (depending on how the primary CNI implementation enforces NetworkPolicy).

While we could design around this on a case-by-case basis for some primary CNI implementations, we could not sustainably approach
universal CNI support. We considered eBPF, but realized any eBPF implementation would have the same basic problem, as there is no
standardized way to safely chain/extend arbitrary eBPF programs at this time, and we would still potentially have a hard time supporting
non-eBPF CNIs with this approach.
Addressing the challenges
A new solution was necessary: doing redirection of any sort in the node’s network namespace would create unavoidable conflicts
unless we compromised our compatibility requirements.
In sidecar mode, it is trivial to configure traffic redirection between the sidecar and application pod, as both operate within
the pod’s network namespace. This led to a light-bulb moment: why not mimic sidecars, and configure the redirection in
the application pod’s network namespace?
While this sounds like a “simple” thought, how would this even be possible? A critical requirement of ambient is that ztunnel must run outside application pods, in the Istio system namespace. After some research, we discovered a Linux process running in one network namespace could create and own listening sockets within another network namespace. This is a basic capability of the Linux socket API.
However, to make this work operationally and cover all pod lifecycle scenarios, we had to make architectural changes to ztunnel as well as to the istio-cni node agent.
After prototyping this approach and validating that it works on all the Kubernetes platforms we have access to, we decided to contribute this new traffic redirection
model upstream: an in-Pod traffic redirection mechanism between workload pods and the ztunnel node proxy component, built from the ground up to be highly compatible with all major cloud providers and CNIs.
The key innovation is to deliver the pod’s network namespace to the co-located ztunnel so that ztunnel can start its redirection
sockets inside the pod’s network namespace, while still running outside the pod. With this approach, the traffic redirection
between ztunnel and application pods happens in a way that’s very similar to sidecars and application pods today and is
strictly invisible to any Kubernetes primary CNI operating in the node network namespace. Network policy can continue to be enforced and managed by any Kubernetes primary CNI,
regardless of whether the CNI uses eBPF or iptables, without any conflict.
Technical deep dive of in-Pod traffic redirection
First, let’s go over the basics of how a packet travels between pods in Kubernetes.
Linux, Kubernetes, and CNI: what’s a network namespace, and why does it matter?
In Linux, a container is one or more Linux processes running within isolated Linux namespaces. A Linux namespace
is a kernel feature that controls what the processes running within that namespace are able to see. For instance, if you
create a new Linux network namespace via the ip netns add my-linux-netns command and run a process inside it, that process can only see the networking rules created
within that network namespace. It cannot see any network rules created outside of it, even though everything running on that machine is still sharing one Linux kernel.
Linux namespaces are conceptually a lot like Kubernetes namespaces - logical labels that organize and isolate different
active processes, and allow you to create rules about what things within a given namespace can see and what rules are
applied to them - they simply operate at a much lower level.
When a process running within a network namespace creates a TCP packet outward bound for something else, the packet must be
processed by any local rules within the local network namespace first, then leave the local network namespace, passing
into another one.
For example, in plain Kubernetes without any mesh installed, a pod might create a packet and send it to another pod, and
the packet might (depending on how networking was set up):

Be processed by any rules within the source pod’s network namespace.
Leave the source pod network namespace, and bubble up into the node’s network namespace where it is processed by any rules in that namespace.
From there, finally be redirected into the target pod’s network namespace (and processed by any rules there).
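The three hops above can be illustrated with a toy Python model. It is not how the kernel implements network namespaces; it only shows the ordering, and that each namespace applies only the rules installed in it, so a rule in the node namespace sees traffic from every pod while a rule in a pod namespace sees only that pod's traffic. All names and rules here are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Packet:
    src: str
    dst: str
    dport: int
    trace: List[str] = field(default_factory=list)  # namespaces and rules that saw it


Rule = Callable[[Packet], None]


@dataclass
class NetNS:
    name: str
    rules: List[Rule] = field(default_factory=list)

    def process(self, pkt: Packet) -> None:
        # Only rules installed in *this* namespace run here.
        pkt.trace.append(self.name)
        for rule in self.rules:
            rule(pkt)


def tag(label: str) -> Rule:
    """A toy rule that just records that it matched."""
    def rule(pkt: Packet) -> None:
        pkt.trace.append(label)
    return rule


def send(pkt: Packet, src: NetNS, node: NetNS, dst: NetNS) -> Packet:
    # Source pod netns, then the node netns, then the target pod netns.
    for ns in (src, node, dst):
        ns.process(pkt)
    return pkt
```

Sending one packet from `pod-a` and one from `pod-c` to `pod-b` through a shared `node` namespace shows the node's rule on both traces, while `pod-a`'s own rule appears only on its packet. That shared node namespace is exactly where a primary CNI installs its rules.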

In Kubernetes, the Container Runtime Interface (CRI) is responsible for talking to the Linux kernel, creating network namespaces
for new pods, and starting processes within them. The CRI then invokes the Container Networking Interface (CNI),
which is responsible for wiring up the networking rules in the various Linux network namespaces, so that packets leaving and
entering the new pod can get where they’re supposed to go. It doesn’t matter much to Kubernetes or the container runtime what topology or mechanism the CNI uses to accomplish this - as long as packets get where they’re supposed to be, Kubernetes works and everyone is happy.
Why did we drop the previous model?
In Istio ambient mesh, every node has a minimum of two containers running as Kubernetes DaemonSets:

An efficient ztunnel, which handles mesh traffic proxying duties and L4 policy enforcement.
An istio-cni node agent, which handles adding new and existing pods into the ambient mesh.

In the previous ambient mesh implementation, this is how an application pod was added to the ambient mesh:

The istio-cni node agent detects an existing or newly-started Kubernetes pod with its namespace labeled with istio.io/dataplane-mode=ambient, indicating that it should be included in the ambient mesh.
The istio-cni node agent then establishes network redirection rules in the host network namespace, such that
packets entering or leaving the application pod would be intercepted and redirected to that node’s ztunnel on the relevant
proxy ports (15008, 15006, or 15001).

This means that for a packet created by a pod in the ambient mesh, that packet would leave that source pod, enter the node’s
host network namespace, and then ideally would be intercepted and redirected to that node’s ztunnel (running in its own network
namespace) for proxying to the destination pod, with the return trip being similar.
This model worked well enough as a placeholder for the initial ambient mesh alpha implementation, but as mentioned, it has a fundamental
problem: there are many CNI implementations, and in Linux there are many fundamentally different and incompatible ways
in which you can configure how packets get from one network namespace to another. You can use tunnels, overlay networks,
go through the host network namespace, or bypass it. You can go through the Linux user space networking stack,
or you can skip it and shuttle packets back and forth in the kernel space stack, etc. For every possible approach,
there’s probably a CNI implementation out there that makes use of it.
This meant that, with the previous redirection approach, there were a lot of CNI implementations ambient simply wouldn’t
work with. Because the approach relied on packet redirection in the host network namespace, any CNI that didn’t route packets through the
host network namespace would need a different redirection implementation. And even for CNIs that did, we would
have unavoidable and potentially unresolvable problems with conflicting host-level rules. Do we intercept before the CNI,
or after? Will some CNIs break if we do one or the other, and they aren’t expecting it? Where and when is NetworkPolicy
enforced, since NetworkPolicy must be enforced in the host network namespace? Do we need lots of code to special-case
every popular CNI?
Istio ambient traffic redirection: the new model
In the new ambient model, this is how an application pod is added to the ambient mesh:

The istio-cni node agent detects a Kubernetes pod (existing or newly-started) with its namespace labeled with istio.io/dataplane-mode=ambient, indicating that it should be included in the ambient mesh.

If a new pod is started that should be added to the ambient mesh, a CNI plugin (as installed and managed by the istio-cni agent) is triggered by the CRI.
This plugin is used to push a new pod event to the node’s istio-cni agent, and block pod startup until the agent successfully configures
redirection. Since CNI plugins are invoked by the CRI as early as possible in the Kubernetes pod creation process, this ensures that we can
establish traffic redirection early enough to prevent traffic escaping during startup, without relying on things like init containers.
If an already-running pod is added to the ambient mesh, a new pod event is triggered. The istio-cni node agent’s Kubernetes
API watcher detects this, and redirection is configured in the same manner.


The istio-cni node agent enters the pod’s network namespace and establishes network redirection rules inside the pod network namespace, such that packets entering and leaving the pod are intercepted and transparently redirected to the node-local ztunnel proxy instance listening on well-known ports (15008, 15006, 15001).
The istio-cni node agent then informs the node ztunnel over a Unix domain socket that it should establish local proxy
listening ports inside the pod’s network namespace (on 15008, 15006, and 15001), and provides ztunnel with a low-level
Linux file descriptor representing the pod’s network namespace.

While typically sockets are created within a Linux network namespace by the process actually running inside that
network namespace, it is perfectly possible to leverage Linux’s low-level socket API to allow a process running in one
network namespace to create listening sockets in another network namespace, assuming the target network namespace is known
at creation time.


The node-local ztunnel internally spins up a new proxy instance and listen port set, dedicated to the newly-added pod.
Once the in-Pod redirect rules are in place and ztunnel has established the listen ports, the pod is added to the
mesh and traffic begins flowing through the node-local ztunnel, as before.
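The hand-off in these steps, where the istio-cni agent gives ztunnel a file descriptor over a Unix domain socket, relies on the standard Unix `SCM_RIGHTS` mechanism, which Python exposes as `socket.send_fds` and `socket.recv_fds` (Python 3.9+, Unix only). The following is a minimal sketch under stated assumptions: a `socketpair` stands in for the agent-to-ztunnel socket, an ordinary temporary file stands in for the pod's network namespace file (normally something like `/proc/<pid>/ns/net`, which the receiver would typically enter with `setns(2)` on the thread that creates its listening sockets), and the JSON message and `ack` reply are hypothetical. The real istio-cni/ztunnel protocol differs.

```python
import json
import os
import socket
import tempfile


def agent_send_netns(sock: socket.socket, pod_uid: str, netns_fd: int) -> None:
    """Agent side: send a 'pod added' message with the netns fd attached (SCM_RIGHTS)."""
    msg = json.dumps({"event": "pod_added", "uid": pod_uid}).encode()
    socket.send_fds(sock, [msg], [netns_fd])


def proxy_receive_netns(sock: socket.socket) -> tuple:
    """Proxy side: receive the message and a new fd referring to the same open file."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 4096, 1)
    if len(fds) != 1:
        raise RuntimeError("expected exactly one file descriptor")
    event = json.loads(data.decode())
    # A real proxy would create its listening sockets inside that namespace
    # before acknowledging, so the pod never starts with traffic unredirected.
    sock.sendall(b"ack")
    return event, fds[0]


def agent_wait_ack(sock: socket.socket, timeout: float = 5.0) -> bool:
    """Agent side: block (with a timeout) until the proxy confirms setup."""
    sock.settimeout(timeout)
    return sock.recv(3) == b"ack"
```

The receiving process gets a fresh descriptor number that refers to the same open file as the sender's. That is what lets a proxy running in its own pod hold a handle to another pod's network namespace without ever running inside it. Waiting for the acknowledgment mirrors the way the CNI plugin holds pod startup until redirection is in place.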

Here’s a basic diagram showing the flow of an application pod being added to the ambient mesh:


Once the pod is successfully added to the ambient mesh, traffic to and from pods in the mesh will be fully encrypted with mTLS by default, as always with Istio.
Traffic will now enter and leave the pod network namespace as encrypted traffic: it will look like every pod in the ambient mesh has the ability to enforce mesh policy and securely encrypt traffic, even though the user application running in the pod
has no awareness of either.
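Inside each pod's network namespace, the three ztunnel listeners split the work by direction and protocol. As a hedged sketch based on Istio's documented port roles (15001 for outbound traffic, 15008 for inbound HBONE, the mTLS tunnel used between mesh workloads, and 15006 for inbound plaintext from outside the mesh), the decision looks roughly like this. The `Connection` type and `ztunnel_listener` function are hypothetical simplifications: the real redirection is performed by packet rules in the pod's network namespace, and it must also avoid looping ztunnel's own connections back to itself, which this sketch ignores.

```python
from dataclasses import dataclass

# Port roles of the per-pod ztunnel listeners in ambient mode.
ZTUNNEL_OUTBOUND = 15001           # traffic leaving the application
ZTUNNEL_INBOUND_PLAINTEXT = 15006  # plaintext traffic arriving from outside the mesh
ZTUNNEL_INBOUND_HBONE = 15008      # mTLS (HBONE) traffic from other mesh workloads


@dataclass(frozen=True)
class Connection:
    outbound: bool  # True if the application in this pod opened the connection
    dport: int      # destination port as seen inside the pod network namespace


def ztunnel_listener(conn: Connection) -> int:
    """Return the in-pod ztunnel port this connection would be redirected to (simplified)."""
    if conn.outbound:
        return ZTUNNEL_OUTBOUND
    if conn.dport == ZTUNNEL_INBOUND_HBONE:
        return ZTUNNEL_INBOUND_HBONE
    return ZTUNNEL_INBOUND_PLAINTEXT
```

For example, an application calling out on port 443 lands on 15001, a peer ztunnel's HBONE connection to the pod on 15008 stays on 15008, and a plain request from a non-mesh client to port 8080 is handed to 15006.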
Here’s a diagram to illustrate how encrypted traffic flows between pods in the ambient mesh in the new model:


And, as before, unencrypted plaintext traffic from outside the mesh can still be handled and policy enforced, for use cases
where that is necessary:


The new ambient traffic redirection: what this gets us
The end result of the new ambient capture model is that all traffic capture and redirection happens inside the pod’s network namespace.
To the node, the CNI, and everything else, it looks like there is a sidecar proxy inside the pod, even though there is no sidecar proxy running in the pod
at all. Remember that the job of CNI implementations is to get packets to and from the pod. By design and by the CNI spec, they
do not care what happens to packets after that point.
This approach automatically eliminates conflicts with a wide range of CNI and NetworkPolicy implementations, and drastically
improves Istio ambient mesh compatibility with all major managed Kubernetes offerings across all major CNIs.
Wrapping up
Thanks to significant effort from our lovely community in testing the change with a large variety of Kubernetes platforms and CNIs, and many rounds of reviews from Istio maintainers, we are glad to announce that the ztunnel and istio-cni PRs implementing this feature have merged into Istio 1.21 and are enabled by default for ambient, so Istio users can run ambient mesh on any Kubernetes platform with any CNI in Istio 1.21 or newer. We’ve tested this with GKE,
AKS, and EKS and all the CNI implementations they offer, with third-party CNIs like
Calico and Cilium, and with platforms like OpenShift, with solid results.
We are extremely excited that we are able to
move Istio ambient mesh forward to run everywhere with this innovative in-Pod traffic redirection approach between ztunnel
and users’ application pods. With this top technical hurdle to ambient beta resolved, we can’t wait to work with the
rest of the Istio community to get ambient mesh to beta soon! To learn more about ambient mesh’s beta progress, join us in
the #ambient and #ambient-dev channels in Istio’s Slack, attend the weekly ambient contributor meeting on Wednesdays,
or check out the ambient mesh beta project board and help us fix something!



Istio in Paris! See you at KubeCon Europe 2024
Fri, 19 Jan 2024 00:00:00 +0000
There will be lots of Istio-related activity at KubeCon + CloudNativeCon Europe in Paris! We’ll keep this page updated with more details as they are published.




Come to the Istio Day co-located event.


The following KubeCon sessions are based on Istio; add them to your schedule:

Keynote: Platform Building Blocks: How to Build ML Infrastructure with CNCF Projects
What Not Do When You’re Updating Istio in a Critical Environment?
Comparing Sidecar-Less Service Mesh from Cilium and Istio
Next-Level Security: Implementing mTLS in Istio Multi-Cluster Environments Using SPIRE
Scaling Service Mesh: Self Service Beyond 300 Clusters
Lightning Talk: Help! My Envoy Sidecar Is Consuming 8GBs of Memory!
Poster Session: Kubernetes in the Confidential Computing Marvels: Unlocking SMPC Across Multi-Cloud Clusters
Poster Session: Serve CAKES for Your Developers: Introducing the Cloud Native CAKES Stack for Zero Trust!
Tutorial: Configuring Your Service Mesh with Gateway API
Product Market Misfit: Adventures in User Empathy



Attend the co-located Observability Day session related to Istio: A Practical Guide on How to Monitor and Compare Service Mesh Infrastructure Costs


Attend the co-located Multi-TenancyCon session related to Istio: Lightning Talk: Establishing Trust Between Microservices in a Multi Tenant Cloud


Attend the co-located Platform Engineering Day session related to Istio: Building an AI-Powered, Paved Road Platform with Cloud-Native OSS


Attend the co-located AppDeveloperCon session related to Istio: Navigating the Complexities of Service to Service Invocations: Deep and Brief Dive Into Causality


Attend the co-located Cloud Native AI Day session related to Istio: Platform Building Blocks: How to Build ML Infrastructure with CNCF Projects


Attend the Istio Maintainers’ Track session


Come and have a chat at the Istio kiosk in the Project Pavilion throughout the event.


See you soon in Paris!



Routing egress traffic to wildcard destinations
Fri, 01 Dec 2023 00:00:00 +0000
If you are using Istio to handle application-originated traffic to destinations outside of the mesh, you’re probably familiar with the concept of egress gateways.
Egress gateways can be used to monitor and forward traffic from mesh-internal applications to locations outside of the mesh.
This is a useful feature if your system is operating in a restricted
environment and you want to control what can be reached on the public internet from your mesh.
The use case of configuring an egress gateway to handle arbitrary wildcard domains was included in the official Istio docs up until version 1.13, but was subsequently removed because the documented solution was not officially supported or recommended and was subject to breakage in future versions of Istio.
Nevertheless, the old solution was still usable with Istio versions before 1.20. Istio 1.20, however, dropped some Envoy functionality that was required for the approach to work.
This post describes how we resolved the issue and filled the gap with a similar approach, using Istio-version-independent components and Envoy features, but without the need for a separate Nginx SNI proxy.
Our approach allows users of the old solution to seamlessly migrate configurations before their systems face the breaking changes in Istio 1.20.
Problem to solve
The currently documented egress gateway use cases rely on the fact that the target of the traffic
(the hostname) is statically configured in a VirtualService, telling Envoy in the egress gateway pod where to TCP proxy
the matching outbound connections. You can use multiple, and even wildcard, DNS names to match the routing criteria, but you
are not able to route the traffic to the exact location specified in the application request. For example, you can match traffic for targets
*.wikipedia.org, but you then need to forward the traffic to a single final target, e.g., en.wikipedia.org. If there is another
service, e.g., anyservice.wikipedia.org, that is not hosted by the same server(s) as en.wikipedia.org, the traffic to that host will fail. This is because, even though the target hostname in the
TLS handshake of the HTTP payload is anyservice.wikipedia.org, the en.wikipedia.org servers will not be able to serve the request.
At a high level, the solution to this problem is to inspect the original server name (the SNI extension) in the application TLS handshake (which is sent
in plaintext, so no TLS termination or other man-in-the-middle operation is needed) in every new gateway connection, and use it as
the target to dynamically TCP proxy the traffic leaving the gateway.
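To show why no TLS termination is needed, here is a minimal Python sketch that pulls the SNI host name out of a raw ClientHello record, following the record layout in RFC 8446 and the `server_name` extension format in RFC 6066. It assumes the whole ClientHello arrives in a single TLS record and does only light bounds checking; in the actual configuration this job is done by Envoy's `tls_inspector` listener filter. The helper `client_hello_for` (a name chosen for this sketch) uses Python's `ssl` module with in-memory BIOs to produce a real ClientHello without opening a network connection.

```python
import ssl
import struct
from typing import Optional


def client_hello_for(host: str) -> bytes:
    """Produce the ClientHello bytes a TLS client would send for `host`."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=host)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: ClientHello written, now waiting for the server
    return outgoing.read()


def extract_sni(record: bytes) -> Optional[str]:
    """Return the server_name (SNI) from a TLS ClientHello record, or None."""
    # Record header: type 22 (handshake), version (2), length (2); then handshake type 1.
    if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
        return None
    pos = 5 + 4              # skip record header and handshake header (type + 3-byte length)
    pos += 2 + 32            # legacy_version + random
    pos += 1 + record[pos]   # session_id
    (cipher_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + cipher_len
    pos += 1 + record[pos]   # compression methods
    if pos + 2 > len(record):
        return None          # no extensions at all
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = min(pos + ext_total, len(record))
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0:    # server_name extension
            # list length (2), name_type (1), host_name length (2), host_name
            name_type = record[pos + 2]
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            if name_type == 0:
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None
```

Nothing in `extract_sni` needs a key or certificate: the host name sits in cleartext in the first flight of the handshake, which is why the gateway can route on it while the application's TLS session stays end-to-end.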
When restricting egress traffic via egress gateways, we need to lock down the egress gateways so that they can only be used
by clients within the mesh. This is achieved by enforcing ISTIO_MUTUAL (mTLS peer authentication) between the application
sidecar and the gateway. That means that there will be two layers of TLS around the application L7 payload: the application-originated
end-to-end TLS session, terminated by the final remote target, and the Istio mTLS session.
Another thing to keep in mind is that, to limit the impact of a compromised application pod, the application sidecar and the gateway should both perform hostname list checks.
This way, a compromised application pod will still only be able to access the allowed targets and nothing more.
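A hostname list check of this kind can be sketched as follows. This is a simplified, hypothetical matcher, not Envoy's or Istio's implementation: it treats `*.wikipedia.org` as matching any name ending in `.wikipedia.org` (including deeper subdomains) but not the bare `wikipedia.org`, which is the usual suffix interpretation of a leading `*.` wildcard, and it compares names case-insensitively because DNS names are case-insensitive.

```python
from typing import Iterable


def host_allowed(host: str, allowlist: Iterable[str]) -> bool:
    """Return True if `host` matches an exact entry or a leading '*.' wildcard."""
    # Normalize: DNS names are case-insensitive; drop a trailing root dot.
    host = host.lower().rstrip(".")
    for pattern in allowlist:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            # "*.wikipedia.org" -> require the suffix ".wikipedia.org",
            # so "evilwikipedia.org" and "wikipedia.org" do not match.
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False
```

Matching on a suffix that includes the leading dot is what stops look-alike names such as `evilwikipedia.org` from slipping through a naive `endswith("wikipedia.org")` check.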
Low-level Envoy programming to the rescue
Recent Envoy releases include a dynamic TCP forward proxy solution that uses the SNI header on a per-connection basis to determine the target of an application request. While an Istio VirtualService cannot configure a target like this, we are able to use
EnvoyFilters to alter the Istio-generated routing instructions so that the SNI header is used to determine the target.
To make it all work, we start by configuring a custom egress gateway to listen for the outbound traffic. Using
a DestinationRule and a VirtualService we instruct the application sidecars to route the traffic (for a selected
list of hostnames) to that gateway, using Istio mTLS. On the gateway pod side we build the SNI forwarder with the
EnvoyFilters, mentioned above, introducing internal Envoy listeners and clusters to make it all work. Finally, we patch the
internal destination of the gateway-implemented TCP proxy to the internal SNI forwarder.
The end-to-end request flow is shown in the following diagram:

    Egress SNI routing with arbitrary domain names

This diagram shows an egress HTTPS request to en.wikipedia.org using SNI as a routing key.


Application container
The application originates an HTTP/TLS connection towards the final destination
and puts the destination’s hostname into the SNI header. This TLS session is not
decrypted inside the mesh; only the SNI header is inspected (as it is in cleartext).


Sidecar proxy
The sidecar intercepts traffic to matching hostnames in the SNI header of the application-originated TLS sessions.
Based on the VirtualService, the traffic is routed to the egress gateway, with the original traffic wrapped in
Istio mTLS. The outer TLS session has the gateway Service address in the SNI header.


Mesh listener
A dedicated listener is created in the Gateway that mutually authenticates the Istio mTLS traffic.
After the outer Istio mTLS termination, it unconditionally sends the inner TLS traffic with a TCP proxy
to the other (internal) listener in the same Gateway.


SNI forwarder
Another listener, the SNI forwarder, performs a new TLS header inspection on the original TLS session.
If the inner SNI hostname matches the allowed domain names (including wildcards), it TCP proxies the
traffic to the destination, read from the header per connection. This listener is internal to Envoy
(allowing it to restart traffic processing to see the inner SNI value), so that no pods (inside or outside the mesh)
can connect to it directly. This listener is 100% manually configured through EnvoyFilter.


Deploy the sample
In order to deploy the sample configuration, start by creating the istio-egress namespace and then use the following YAML to deploy an egress gateway, along with some RBAC
and its Service. We use the gateway injection method to create the gateway in this example. Depending on your install method, you may want to
deploy it differently (for example, using an IstioOperator CR or using Helm).
# New k8s cluster service to put egressgateway into the Service Registry,
# so application sidecars can route traffic towards it within the mesh.
apiVersion: v1
kind: Service
metadata:
  name: egressgateway
  namespace: istio-egress
spec:
  type: ClusterIP
  selector:
    istio: egressgateway
  ports:
  - port: 443
    name: tls-egress
    targetPort: 8443

---
# Gateway deployment with injection method
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istio-egressgateway
  namespace: istio-egress
spec:
  selector:
    matchLabels:
      istio: egressgateway
  template:
    metadata:
      annotations:
        inject.istio.io/templates: gateway
      labels:
        istio: egressgateway
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - name: istio-proxy
        image: auto # The image will automatically update each time the pod starts.
        securityContext:
          capabilities:
            drop:
            - ALL
          runAsUser: 1337
          runAsGroup: 1337

---
# Set up roles to allow reading credentials for TLS
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: istio-egressgateway-sds
  namespace: istio-egress
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "watch", "list"]
- apiGroups:
  - security.openshift.io
  resourceNames:
  - anyuid
  resources:
  - securitycontextconstraints
  verbs:
  - use

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: istio-egressgateway-sds
  namespace: istio-egress
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: istio-egressgateway-sds
subjects:
- kind: ServiceAccount
  name: default
Verify the gateway pod is up and running in the istio-egress namespace and then apply the following YAML to configure the gateway routing:
# Define a new listener that enforces Istio mTLS on inbound connections.
# This is where sidecar will route the application traffic, wrapped into
# Istio mTLS.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: egressgateway
  namespace: istio-system
spec:
  selector:
    istio: egressgateway
  servers:
  - port:
      number: 8443
      name: tls-egress
      protocol: TLS
    hosts:
      - "*"
    tls:
      mode: ISTIO_MUTUAL

---
# VirtualService that will instruct sidecars in the mesh to route the outgoing
# traffic to the egress gateway Service if the SNI target hostname matches
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: direct-wildcard-through-egress-gateway
  namespace: istio-system
spec:
  hosts:
    - "*.wikipedia.org"
  gateways:
  - mesh
  - egressgateway
  tls:
  - match:
    - gateways:
      - mesh
      port: 443
      sniHosts:
        - "*.wikipedia.org"
    route:
    - destination:
        host: egressgateway.istio-egress.svc.cluster.local
        subset: wildcard
# Dummy routing instruction. If omitted, no reference will point to the Gateway
# definition, and istiod will optimise the whole new listener out.
  tcp:
  - match:
    - gateways:
      - egressgateway
      port: 8443
    route:
    - destination:
        host: "dummy.local"
      weight: 100

---
# Instruct sidecars to use Istio mTLS when sending traffic to the egress gateway
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: egressgateway
  namespace: istio-system
spec:
  host: egressgateway.istio-egress.svc.cluster.local
  subsets:
  - name: wildcard
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

---
# Put the remote targets into the Service Registry
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: wildcard
  namespace: istio-system
spec:
  hosts:
    - "*.wikipedia.org"
  ports:
  - number: 443
    name: tls
    protocol: TLS

---
# Access logging for the gateway
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  accessLogging:
    - providers:
      - name: envoy

---
# And finally, the configuration of the SNI forwarder,
# its internal listener, and the patch to the original Gateway
# listener to route everything into the SNI forwarder.
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: sni-magic
  namespace: istio-system
spec:
  configPatches:
  - applyTo: CLUSTER
    match:
      context: GATEWAY
    patch:
      operation: ADD
      value:
        name: sni_cluster
        load_assignment:
          cluster_name: sni_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  envoy_internal_address:
                    server_listener_name: sni_listener
  - applyTo: CLUSTER
    match:
      context: GATEWAY
    patch:
      operation: ADD
      value:
        name: dynamic_forward_proxy_cluster
        lb_policy: CLUSTER_PROVIDED
        cluster_type:
          name: envoy.clusters.dynamic_forward_proxy
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
            dns_cache_config:
              name: dynamic_forward_proxy_cache_config
              dns_lookup_family: V4_ONLY

  - applyTo: LISTENER
    match:
      context: GATEWAY
    patch:
      operation: ADD
      value:
        name: sni_listener
        internal_listener: {}
        listener_filters:
        - name: envoy.filters.listener.tls_inspector
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector

        filter_chains:
        - filter_chain_match:
            server_names:
            - "*.wikipedia.org"
          filters:
            - name: envoy.filters.network.sni_dynamic_forward_proxy
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.sni_dynamic_forward_proxy.v3.FilterConfig
                port_value: 443
                dns_cache_config:
                  name: dynamic_forward_proxy_cache_config
                  dns_lookup_family: V4_ONLY
            - name: envoy.tcp_proxy
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
                stat_prefix: tcp
                cluster: dynamic_forward_proxy_cluster
                access_log:
                - name: envoy.access_loggers.file
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                    path: "/dev/stdout"
                    log_format:
                      text_format_source:
                        inline_string: '[%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%
                          %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %RESPONSE_CODE_DETAILS% %CONNECTION_TERMINATION_DETAILS%
                          "%UPSTREAM_TRANSPORT_FAILURE_REASON%" %BYTES_RECEIVED% %BYTES_SENT% %DURATION%
                          %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%"
                          "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%" %UPSTREAM_CLUSTER%
                          %UPSTREAM_LOCAL_ADDRESS% %DOWNSTREAM_LOCAL_ADDRESS% %DOWNSTREAM_REMOTE_ADDRESS%
                          %REQUESTED_SERVER_NAME% %ROUTE_NAME%

                          '
  - applyTo: NETWORK_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: "envoy.filters.network.tcp_proxy"
    patch:
      operation: MERGE
      value:
        name: envoy.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: tcp
          cluster: sni_cluster
Check the istiod and gateway logs for any errors or warnings. If all went well, your mesh sidecars are now routing
*.wikipedia.org requests to your gateway pod, and the gateway pod forwards them to the exact remote host specified in the application
request.
Try it out
Following other Istio egress examples, we will use the sleep pod as a test source for sending requests.
Assuming automatic sidecar injection is enabled in your default namespace, deploy the test app using the following command:
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/sleep/sleep.yaml
Get your sleep and gateway pods:
$ export SOURCE_POD=$(kubectl get pod -l app=sleep -o jsonpath={.items..metadata.name})
$ export GATEWAY_POD=$(kubectl get pod -n istio-egress -l istio=egressgateway -o jsonpath={.items..metadata.name})
Run the following command to confirm that you are able to connect to the wikipedia.org site:
$ kubectl exec "$SOURCE_POD" -c sleep -- sh -c 'curl -s https://en.wikipedia.org/wiki/Main_Page | grep -o "<title>.*</title>"; curl -s https://de.wikipedia.org/wiki/Wikipedia:Hauptseite | grep -o "<title>.*</title>"'
<title>Wikipedia, the free encyclopedia</title>
<title>Wikipedia – Die freie Enzyklopädie</title>
We could reach both English and German wikipedia.org subdomains, great!
Normally, in a production environment, we would block external requests that are not configured to redirect through the egress gateway, but since we didn’t do that in our test environment, let’s access another external site for comparison:
$ kubectl exec "$SOURCE_POD" -c sleep -- sh -c 'curl -s https://cloud.ibm.com/login | grep -o "<title>.*</title>"'
<title>IBM Cloud</title>
Since we have access logging turned on globally (with the Telemetry CR in the manifest), we can now inspect the logs to see how the above requests were handled by the proxies.
First, check the gateway logs:
$ kubectl logs -n istio-egress $GATEWAY_POD
[...]
[2023-11-24T13:21:52.798Z] "- - -" 0 - - - "-" 813 111152 55 - "-" "-" "-" "-" "185.15.59.224:443" dynamic_forward_proxy_cluster 172.17.5.170:48262 envoy://sni_listener/ envoy://internal_client_address/ en.wikipedia.org -
[2023-11-24T13:21:52.798Z] "- - -" 0 - - - "-" 1531 111950 55 - "-" "-" "-" "-" "envoy://sni_listener/" sni_cluster envoy://internal_client_address/ 172.17.5.170:8443 172.17.34.35:55102 outbound_.443_.wildcard_.egressgateway.istio-egress.svc.cluster.local -
[2023-11-24T13:21:53.000Z] "- - -" 0 - - - "-" 821 92848 49 - "-" "-" "-" "-" "185.15.59.224:443" dynamic_forward_proxy_cluster 172.17.5.170:48278 envoy://sni_listener/ envoy://internal_client_address/ de.wikipedia.org -
[2023-11-24T13:21:53.000Z] "- - -" 0 - - - "-" 1539 93646 50 - "-" "-" "-" "-" "envoy://sni_listener/" sni_cluster envoy://internal_client_address/ 172.17.5.170:8443 172.17.34.35:55108 outbound_.443_.wildcard_.egressgateway.istio-egress.svc.cluster.local -
There are four log entries, representing two of our three curl requests. Each pair shows how a single request flows through the Envoy traffic processing pipeline.
Within each pair the entries are printed in reverse order: the 2nd and 4th lines show that the requests arrived at the gateway service and were passed through the internal sni_cluster target.
The 1st and 3rd lines show that the final target is determined from the inner SNI header, i.e., the target host set by the application.
The request is forwarded to dynamic_forward_proxy_cluster, which finally sends the request from Envoy to the remote target.
Great, but where is the third request to IBM Cloud? Let’s check the sidecar logs:
$ kubectl logs $SOURCE_POD -c istio-proxy
[...]
[2023-11-24T13:21:52.793Z] "- - -" 0 - - - "-" 813 111152 61 - "-" "-" "-" "-" "172.17.5.170:8443" outbound|443|wildcard|egressgateway.istio-egress.svc.cluster.local 172.17.34.35:55102 208.80.153.224:443 172.17.34.35:37020 en.wikipedia.org -
[2023-11-24T13:21:52.994Z] "- - -" 0 - - - "-" 821 92848 55 - "-" "-" "-" "-" "172.17.5.170:8443" outbound|443|wildcard|egressgateway.istio-egress.svc.cluster.local 172.17.34.35:55108 208.80.153.224:443 172.17.34.35:37030 de.wikipedia.org -
[2023-11-24T13:21:55.197Z] "- - -" 0 - - - "-" 805 15199 158 - "-" "-" "-" "-" "104.102.54.251:443" PassthroughCluster 172.17.34.35:45584 104.102.54.251:443 172.17.34.35:45582 cloud.ibm.com -
As you can see, Wikipedia requests were sent through the gateway while the request to IBM Cloud went straight out from the application pod to the internet, as indicated by the PassthroughCluster log.
Conclusion
We implemented controlled routing for egress HTTPS/TLS traffic using egress gateways, supporting arbitrary and wildcard domain names. In a production environment, the example shown in this post
would be extended to support HA requirements (e.g., adding zone-aware gateway Deployments) and to restrict the direct external
network access of your application so that the application can only access the public network through the gateway, which is limited to a predefined set of remote hostnames.
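This post does not prescribe a specific mechanism for that restriction, so treat the following as one assumed building block: Istio's outbound traffic policy. The sketch below, assuming an IstioOperator-based installation, switches sidecars from the default ALLOW_ANY mode to REGISTRY_ONLY, so only hosts known to the mesh registry (for example, through a ServiceEntry) remain reachable. Assuming no ServiceEntry covers it, the direct IBM Cloud request from the test above would then be expected to be rejected by the sidecar instead of leaving through PassthroughCluster.

```yaml
# Sketch only: assumes Istio is installed with istioctl / IstioOperator.
# Merge this into the IstioOperator spec you already install with.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    outboundTrafficPolicy:
      # Default is ALLOW_ANY; REGISTRY_ONLY blocks hosts not in the service registry.
      mode: REGISTRY_ONLY
```

This is enforced by the sidecar, so a workload that bypasses its sidecar is not covered; it is usually paired with Kubernetes NetworkPolicy or other network-level controls that only allow egress from the gateway.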
The solution scales easily. You can include multiple domain names in the configuration, and they will be allow-listed as soon as you roll it out!
There is no need to configure per-domain VirtualServices or other routing details. Be careful, however, as the domain names are listed in multiple places in the config. If you use
tooling for CI/CD (e.g., Kustomize), it’s best to extract the domain name list into a single place from which you can render the required configuration resources.
That’s all! I hope this was helpful.
If you’re an existing user of the previous Nginx-based solution,
you can now migrate to this approach before upgrading to Istio 1.20, which will otherwise disrupt your current setup.
Happy SNI routing!
References

Envoy docs for the SNI forwarder
Previous solution with Nginx as an SNI proxy container in the gateway




Istio at KubeCon North America 2023
Thu, 16 Nov 2023 00:00:00 +0000
The open source and cloud native community gathered from the 6th to the 9th of November in Chicago for the final KubeCon of 2023. The four-day conference, organized by the Cloud Native Computing Foundation, was “twice the fun” for Istio, as we grew from a half-day event in Europe in April to a full-day co-located event. To add to the excitement, Istio Day North America marked our first event as a CNCF graduated project.
With Istio Day NA over, that’s a wrap for our major community events for 2023. In case you missed them, Istio Day Europe was held in April, and alongside our Virtual IstioCon 2023 event, IstioCon China 2023 was held on September 26 in Shanghai, China.

Istio Day kicked off with an opening keynote from the Program Committee chairs, Faseela K and Zack Butcher. The keynote made sure to recognize the day-to-day efforts of our contributors, maintainers, release managers, and users, with some awards for our top contributors and community helpers. Rob Salmond and Andrea Ma were recognized for their selfless efforts in the Istio community, and the top 20 contributors in the last 6 months were also called out.

Top 20 contributors who were in attendance were asked to come onto the stage

The opening keynote also announced the availability of the Istio Certified Associate (ICA) exam for enrollment starting November 6th.

We were also proud to show a short video of many of our contributors, vendors and end users congratulating us on the CNCF graduation!

The keynote was followed by an end-user talk by Kush Trivedi and Khushboo Mittal from DevRev about their usage of Istio. We had a much-awaited session on architecting ambient for scale from John Howard, which stirred some interesting discussions in the community. We also had an interesting talk showcasing the collaboration between Lilt and Intel on scaling AI-powered translation services using Istio.
After this we moved on to another end-user talk from Intuit, where Karim Lakhani described Intuit’s modern SaaS platform, which deploys multiple cloud native projects including Istio. The audience was excited when Mitch Connors and Christian Hernandez did a live demo of upgrading Istio ambient mesh with Argo on a live public site, with a publicly accessible availability monitor.

Jam-packed sessions at Istio Day

The focus shifted to security in subsequent talks, with Jackie Elliot from Microsoft taking a deep dive into Istio identity, followed by a lightning talk from Kush Mansing from Speedscale showing the impacts of running services with arbitrary code on Istio. We also had a lightning talk from Xiangfeng Zhu, a PhD student at the University of Washington, who showcased a tool developed to analyze and predict the performance overhead of Istio.
The talk from the Kiali maintainers Jay Shaughnessy and Nick Fox was very interesting, as it demonstrated many advanced ways of using Kiali to debug Istio use cases. Ekansh Gupta from Zeta and Nirupama Singh from Reskill presented another end-user talk on best practices for upgrading Istio in their production deployments.
Istio multi-cluster is always a hot topic, and Lukonde Mwila and Ovidiu from AWS nailed it in the talk on bridging trust between multi-cluster meshes.
We also had an interactive panel discussion with the Istio TOC members, where a lot of questions came in from the audience; the strong attendance was a testament to the continued popularity of Istio. Istio Day concluded with a brilliant workshop on getting started with ambient mesh from Christian Posta and Jim Barton from Solo.io, a hot topic the whole audience had been looking forward to.
The slides for all the sessions can be found in the Istio Day NA 2023 schedule.

Kush Trivedi and Khushboo Mittal from DevRev on stage

Our presence at the conference did not end with Istio Day. The first-day keynote of KubeCon + CloudNativeCon started with a project update video from Mitch Connors. It was also a proud moment for us when two of our contributors, Lin Sun and Faseela K, took home the prestigious CNCF community “Chop Wood Carry Water” award, presented by Chris Aniszczyk, CTO of the CNCF, at the second-day keynote.

Chop Wood Carry Water winners, Faseela K and Lin Sun (second and third from left)

Several of our maintainers and contributors also made the CNCF Fall 2023 Ambassadors list, including Lin Sun, Mitch Connors, and Faseela K.

The CNCF Ambassador group photo. Many Istio maintainers are in this picture!

The KubeCon maintainer track session for Istio, presented by TOC members John Howard and Louis Ryan, drew a lot of attention as they talked about the current ongoing efforts and future roadmap of Istio. The technologies described in the talk, and the resulting size of the audience, underlined why Istio continues to be the most popular service mesh in the industry.

The Contribfest Hands-on Development and Contribution Workshop by Lin Sun, Eric Van Norman, Steven Landow, and Faseela K was also well received. It was great to see so many people interested in contributing to Istio and pushing their first pull request at the end of the workshop.

A much-awaited panel discussion on Service Mesh Battle Scars: Technology, Timing and Tradeoffs, led by the maintainers from three CNCF Service Mesh projects, had a huge crowd in attendance, and a lot of interesting discussions.

Istio came up as a hot topic of discussion in several other KubeCon talks as well. Here are a few we noticed:

Take It to the Edge: Creating a Globally Distributed Ingress with Istio & K8gb
Under the Hood: Exploring Istio’s Lock Contention and Its Impact on Expedia’s Compute Platform
Untangling Your Service Mesh with Feature Gates
Flavors of Certificates in Service Mesh: The Whys and Hows!

Istio had a kiosk in the project pavilion, with the majority of questions asked being around the schedule for ambient mesh being production ready.

Discussions at the Istio kiosk

We are glad that the main question we heard at the Istio kiosk in Europe, about the schedule for CNCF graduation, has been answered, and we assured everyone that we are working on ambient mesh with the same level of seriousness.
Many of our members and maintainers offered support at our kiosk, helping us answer all the questions from our users.

Members and maintainers at the Istio kiosk

Another highlight of our kiosk was that we had new Istio T-shirts sponsored by Microsoft, Solo.io, Stackgenie and Tetrate for everyone to grab!

A new crop of Istio T-shirts

We would like to express our heartfelt gratitude to our platinum sponsor, Google Cloud, for supporting Istio Day North America! Last but not least, we would like to thank our Istio Day Program Committee members for all their hard work and support!
See you in Paris in March 2024!




Secure Application Communications with Mutual TLS and Istio
Tue, 17 Oct 2023 00:00:00 +0000
One of the biggest reasons users adopt service mesh is to enable secure communication
among applications using mutual TLS (mTLS) based on cryptographically verifiable
identities. In this blog, we’ll discuss the requirements of secure communication
among applications, how mTLS enables and meets all those requirements, along with
simple steps to get you started with enabling mTLS among your applications using Istio.
What do you need to secure the communications among your applications?
Modern cloud native applications are frequently distributed across multiple Kubernetes clusters or virtual machines. New versions are staged frequently, and they can rapidly scale up and down based on user requests. As modern applications gain resource utilization efficiency by not being dependent on co-location, it is paramount to apply access policy to, and secure the communications among, these distributed applications, because their many entry points result in a larger attack surface. To ignore this is to invite massive business risk from data loss, data theft, forged data, or simple mishandling.
The following are the common key requirements for secure communications between applications:
Identities
Identity is a fundamental component of any security architecture. Before your
applications can send their data securely, identities must be established for the
applications. This process of establishing an identity is called identity validation: it
involves some well-known, trusted authority performing one or more
checks on the application workload to establish that it is what it claims to be. Once
the authority is satisfied, it grants the workload an identity.
Consider the act of being issued a passport - you will request one from some authority, that
authority will probably ask you for several different identity validations that prove you are
who you say you are - a birth certificate, current address, medical records, etc. Once you
have satisfied all the identity validations, you will (hopefully) be granted the identity
document. You can give that identity document to someone else as proof that you have satisfied
all the identity validation requirements of the issuing authority, and if they trust the
issuing authority (and the identity document itself), they can trust what it says about you (or they can contact the trusted authority and verify the document).
An identity could take any form, but, as with any form of identity document, the weaker the identity
validations are, the easier it is to forge, and the less useful that identity document is to anyone
using it to make a decision. That’s why, in computing, cryptographically verifiable identities are
so important - they are signed by a verifiable authority, similar to
your passport and driver’s license. Identities based around anything less are a security weakness
that is relatively easy to exploit.
Your system may have identities derived from network properties such as IP addresses with
distributed identity caches that track the mapping between identities and these network properties.
These identities don’t offer the strong guarantees of cryptographically verifiable
identities, because IP addresses could be re-allocated to different workloads and identity caches may
not always be up to date.
Using cryptographically verifiable identities for your applications is preferable, because exchanging
cryptographically verifiable identities for applications during connection establishment is
inherently more reliable and secure than systems dependent on mapping IP addresses to identities.
These systems depend on distributed identity caches with eventual consistency and staleness issues
which could create a structural weakness in Kubernetes, where high rates of automated pod churn are
the norm.
Confidentiality
Encrypting the data transmitted among applications is critical - because in a world where breaches
are common, costly, and effectively trivial, relying entirely on secure internal environments or
other security perimeters has long since ceased to be adequate. To prevent a
man-in-the-middle attack, you need a unique encrypted channel for each source-destination pair, because you want a strong identity uniqueness guarantee to avoid confused deputy problems.
In other words, it is not enough to simply encrypt the channel - it must be encrypted using unique
keys directly derived from the unique source and destination identities so that only the source and
destination can decrypt the data. Further, you may need to customize the encryption, e.g. by
choosing specific ciphers, in accordance with what your security team requires.
Integrity
The encrypted data sent over the network from source to destination can’t be modified by any
identities other than the source and destination once it is sent. In other words, data received is
the same as data sent. If you don’t have data integrity,
someone in the middle could modify some bits or the entire content of the data during the
communication between the source and destination.
Access Policy Enforcement
Application owners need to apply access policies to their applications and have them enforced
properly, consistently, and unambiguously. In order to apply policy for both ends of a communication
channel, we need an application identity for each end. Once we have a cryptographically verifiable
identity with an unambiguous provenance chain for both ends of a potential communication channel, we
can begin to apply policies about who can communicate with what. Standard TLS, the widely used
cryptographic protocol that secures communication between clients (e.g., web browsers) and servers
(e.g., web servers), only really verifies and mandates an identity for one side - the server. But
for comprehensive end-to-end policy enforcement, it is critical to have a reliable, verifiable,
unambiguous identity for both sides - client and server. This is a common requirement for internal
applications - imagine for example a scenario where only a frontend application should call the
GET method for a backend checkout application, but should not be allowed to call the POST or
DELETE method. Or a scenario where only applications that have a JWT token issued by a particular
JWT issuer can call the GET method for a checkout application. By leveraging cryptographic
identities on both ends, we can ensure powerful access policies are enforced correctly, securely,
and reliably, with a validatable audit trail.
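In Istio, the two scenarios above can be expressed as AuthorizationPolicy resources. The sketch below is illustrative only: the shop namespace, the app: checkout label, the frontend service account, the policy names, and the issuer URLs are all hypothetical. The first policy relies on the mTLS peer identity (principals, in the form trust-domain/ns/namespace/sa/service-account); the second relies on a JWT validated by a RequestAuthentication (requestPrincipals, in the form issuer/subject).

```yaml
# Scenario 1: only the frontend's mTLS identity may call GET on checkout.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: checkout-allow-frontend-get   # hypothetical name
  namespace: shop                     # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: checkout                   # hypothetical workload label
  action: ALLOW
  rules:
  - from:
    - source:
        # Identity taken from the peer's certificate, not from its IP address.
        principals: ["cluster.local/ns/shop/sa/frontend"]
    to:
    - operation:
        methods: ["GET"]
---
# Scenario 2: validate JWTs from a (hypothetical) issuer on checkout...
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: checkout-jwt
  namespace: shop
spec:
  selector:
    matchLabels:
      app: checkout
  jwtRules:
  - issuer: "https://issuer.example.com"
    jwksUri: "https://issuer.example.com/.well-known/jwks.json"
---
# ...and only allow GET for requests carrying a valid token from that issuer.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: checkout-allow-jwt-get
  namespace: shop
spec:
  selector:
    matchLabels:
      app: checkout
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["https://issuer.example.com/*"]
    to:
    - operation:
        methods: ["GET"]
```

Because both are ALLOW policies on the same workload, applying them together permits a request that matches either rule, and everything else (such as a POST or DELETE from frontend) is denied. Apply only the scenario you need if you want the stricter behavior.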
FIPS compliance
Federal Information Processing Standards (FIPS)
are standards and guidelines for federal computer systems that are developed by
National Institute of Standards and Technology (NIST). Not everyone
requires FIPS compliance, but FIPS compliance means meeting all the necessary security requirements
established by the U.S. government for protecting sensitive information. It is required when working
with the federal government. To follow the guidelines developed by the U.S. government relating to
cybersecurity, many in the private sector voluntarily use these FIPS standards.
To illustrate the above secure application requirements (identity, confidentiality and integrity),
let’s use the example of the frontend application calling the checkout application. Remember, you can think of ID in the diagram as any kind of identity document, such as a government-issued passport
or photo ID:

Requirements when the frontend calls the checkout application

How does mTLS satisfy the above requirements?
The primary goal of the TLS 1.3 specification (the most recent TLS version at the time of writing) is to
provide a secure channel between two communicating peers.
The TLS secure channel has the following properties:

Authentication: the server side of the channel is always authenticated, the client side is
optionally authenticated. When the client is
also authenticated, the secure channel becomes a mutual TLS channel.
Confidentiality: Data is encrypted and only visible to the client and server.  Data must be
encrypted using keys that are unambiguously cryptographically bound to the source and destination
identity documents in order to reliably protect the application-layer traffic.
Integrity: data sent over the channel can’t be modified without detection. This is guaranteed by
the fact that only source and destination have the key to encrypt and decrypt the data for a given
session.
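In an Istio mesh, the proxies can present workload certificates on both sides of a connection, which turns the channel into mutual TLS. A minimal sketch of requiring mTLS mesh-wide, assuming istio-system is your mesh root namespace (the default), is a PeerAuthentication in STRICT mode:

```yaml
# Sketch: mesh-wide strict mTLS, assuming istio-system is the mesh root namespace.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    # STRICT: workloads only accept mutual TLS connections.
    mode: STRICT
```

The default mode, PERMISSIVE, accepts both mTLS and plaintext, which is useful while migrating; STRICT rejects plaintext connections.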

mTLS internals
We’ve established that cryptographically verifiable identities are key for securing channels and
supporting access policy enforcement, and we’ve established that mTLS is a battle-tested protocol
that mandates some extremely important guarantees for using cryptographically verifiable identities
on both ends of a channel - let’s get into some detail on how the mTLS protocol actually works under
the hood.
Handshake protocol
The handshake protocol authenticates the
communicating peers, negotiates cryptographic modes and parameters, and establishes shared keying
material. In other words, the role of the handshake is to verify the communicating peers’ identities
and negotiate a session key, so that the rest of the connection can be encrypted based on the
session key. When your applications make an mTLS connection, server and client negotiate a cipher
suite, which dictates what encryption algorithm your applications will use for the rest of the
connection and your applications also negotiate the cryptographic session key to use. The whole
handshake is designed to resist tampering - interference by any entities that do not possess the
same unique, cryptographically verifiable identity document as the source and/or destination will be
rejected. For this reason, it is important to check the whole handshake and verify its integrity
before any communicating peer continues with the application data.
The handshake can be thought of as having three phases per the
handshake protocol overview in the TLS 1.3
specification - again, let’s use the example of a frontend application calling a backend
checkout application:

Phase 1: frontend and checkout negotiate the cryptographic parameters and encryption keys
that can be used to protect the rest of the handshake and traffic data.
Phase 2: everything in this phase and after is encrypted. In this phase, frontend and checkout establish other handshake parameters, and whether or not the client is also
authenticated - that is, mTLS.
Phase 3: frontend authenticates checkout via its cryptographically verifiable identity (and, in mTLS, checkout authenticates frontend in the same way).

There are a few major differences from TLS 1.2 in the handshake; refer to the TLS 1.3 specification for more details:

All handshake messages (phase 2 and 3) are encrypted using the encryption keys negotiated in phase 1.
Legacy symmetric encryption algorithms have been pruned.
A zero round-trip time (0-RTT) mode was added, saving a round trip at connection setup.
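If you want the mesh's own mTLS connections to use only the TLS 1.3 handshake described above, Istio exposes a minimum TLS version for workload mTLS in its mesh config. A sketch, assuming an IstioOperator-based installation:

```yaml
# Sketch: require TLS 1.3 for mesh (workload-to-workload) mTLS.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    meshMTLS:
      # Connections that cannot negotiate TLS 1.3 are refused.
      minProtocolVersion: TLSV1_3
```

This governs the mTLS that Istio's proxies establish between workloads; it does not change the TLS versions your applications use for any TLS they terminate themselves.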

Record protocol
Having negotiated the TLS protocol version, session key, and HMAC
during the handshake phase, the peers can now securely exchange encrypted data that is chunked by the record protocol. It is critical (and
required as part of the spec) to use the exact same negotiated parameters from the handshake to
encrypt the traffic to ensure the traffic confidentiality and integrity.
Putting the two protocols from the TLS 1.3 specification together, the flow between the frontend and
checkout applications looks like this:

mTLS flows when the frontend calls the checkout application

Who issues the identity certificates for frontend and checkout? They are commonly issued by a
certificate authority (CA) which either has
its own root certificate or uses an intermediate
certificate from its root CA. A root certificate is basically a public key certificate that
identifies a root CA, which you likely already have in your organization. The root certificate is
distributed to frontend (or checkout) in addition to its own root-signed identity certificate. This is how
everyday, basic Public Key Infrastructure (PKI) works - a CA has responsibility for validating an
entity’s identity document, and then grants it an unforgeable identity document in the form of a
certificate.
You can rely on your CA and intermediate CAs as the source of identity truth in a structural fashion
that maintains high availability and stable, persistently-verifiable identity guarantees in a way
that a massive distributed cache of IP and identity maps simply cannot. When the frontend and
checkout identity certificates are issued by the same root certificate, frontend and checkout
can verify their peer identities consistently and reliably regardless of which cluster or node they
run on, or at what scale.
You have seen how mTLS provides cryptographic identity, confidentiality and integrity, but what
about scalability as you grow to thousands or more applications among multiple clusters? If you
establish a single root certificate across multiple clusters, the system doesn’t need to care when
your application gets a connection request from another cluster as long as it is trusted by the root
certificate - the system knows the identity on the connection is cryptographically verified. As your
application pod changes IP or is redeployed to a different cluster or network, your application (or
component acting on behalf of it) simply originates the traffic with its trusted certificate minted
by the CA to the destination. It can be 500+ network hops or can be direct; your access policies for
your application are enforced in the same fashion regardless of the topology, without needing to
keep track of the identity cache and calculate which IP address maps to which application pod.
What about FIPS compliance? Per the TLS 1.3 specification, TLS-compliant applications must implement the
TLS_AES_128_GCM_SHA256 cipher suite, and are recommended to implement TLS_AES_256_GCM_SHA384, both
of which are also in the guidelines for TLS
by NIST. RSA or ECDSA server certificates are also recommended by both the TLS 1.3 specification and
NIST’s guidelines for TLS. As long as you use mTLS and FIPS 140-2 or 140-3 compliant cryptographic
modules for your mTLS connections, you will be on the right path for
FIPS 140-2 or 140-3 validation.
What could go wrong
It is critical to implement mTLS exactly as the TLS 1.3 specification dictates. If mTLS is not
implemented according to the TLS specification, here are a few things that can go wrong without
detection:
What if someone in the middle of the connection silently captures the encrypted data?
If the connection doesn’t follow the handshake and record protocols exactly as outlined in the TLS
specification (for example, it follows the handshake protocol but does not use the session key and
parameters negotiated in the handshake for the record protocol), the connection’s handshake may be
unrelated to its record protocol, and the identities could differ between the handshake and record
protocols. TLS requires that the handshake and record protocols share the same connection because separating them increases the attack surface for man-in-the-middle attacks.
An mTLS connection has consistent end-to-end security from the start of the handshake to the finish.
The data is encrypted with a session key negotiated during the handshake, and each peer signs the
handshake with the private key matching the public key in its certificate. Only the source and
destination hold the session key, so only they can decrypt the data. Unless an attacker controls the
private key of a certificate, they have no way to impersonate that peer and tamper with the mTLS
connection to successfully execute a man-in-the-middle attack.
What if either source or destination identity is not cryptographically secure?
If the identity is based on network properties such as IP address, which could be re-allocated to
other pods, the identity can’t be validated using cryptographic techniques. Since this type of
identity isn’t based on cryptographic identity, your system likely has an identity cache to track
the mapping between the identity, the pod’s network labels, the corresponding IP address and the
Kubernetes node info where the pod is deployed. With an identity cache, a reused pod IP address can be
mapped to the wrong identity, so policy may not be enforced properly whenever the identity cache gets
out of sync, even for a short period of time. For example, if you don’t have cryptographic
identity on the connection between the peers, your system would have to get the identity from the
identity cache which could be outdated or incomplete.
These identity caches that map identity to workload IPs are not ACID
(Atomicity, Consistency, Isolation, and Durability) and you want your security system to be applied
to something with strong guarantees. Consider the following properties and questions you may want
to ask yourself:

Staleness: How can a peer verify that an entry in the cache is current?
Incompleteness: If there’s a cache miss and the system fails to close the connection, does the
network become unstable when it’s only the cache synchronizer that is failing?
What if something simply doesn't have an IP? For example, an AWS Lambda function doesn't have a
public IP by default.
Non-transactional: If you read the identity twice, will you see the same value? If you are not
careful in your access policy or auditing implementation, this can cause real issues.
Who will guard the guards themselves? Are there established practices to protect
the cache like a CA has? What proof do you have that the cache has not been tampered with? Are you
forced to reason about (and audit) the security of some complex infrastructure that is not your CA?

Some of the above are worse than others. You can apply the fail-closed principle, but that does not solve all of the above.
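To see why failing closed is not a complete answer, here is a hypothetical sketch of an IP-to-identity cache that denies on a cache miss and on any entry older than a staleness bound. The class, its methods, and the 30-second bound are assumptions for illustration, not part of any real system.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    identity: str
    updated_at: float  # seconds, as reported by the (hypothetical) cache synchronizer


class IdentityCache:
    """Hypothetical IP -> identity cache that fails closed."""

    def __init__(self, max_age_s: float = 30.0):
        self.max_age_s = max_age_s
        self.entries: Dict[str, Entry] = {}

    def update(self, ip: str, identity: str, now: float) -> None:
        self.entries[ip] = Entry(identity, now)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        entry = self.entries.get(ip)
        if entry is None:
            return None  # cache miss: fail closed (deny)
        if now - entry.updated_at > self.max_age_s:
            return None  # older than the bound: fail closed (deny)
        # Within the bound, yet nothing proves the mapping is still correct.
        return entry.identity


cache = IdentityCache(max_age_s=30)
cache.update("10.0.0.7", "checkout", now=0)
print(cache.lookup("10.0.0.7", now=10))  # checkout
print(cache.lookup("10.0.0.7", now=60))  # None (denied as stale)
print(cache.lookup("10.0.0.8", now=10))  # None (denied as a miss)
```

If the synchronizer stalls for longer than max_age_s, every lookup is denied, which is the instability the Incompleteness question asks about. Meanwhile, an entry that is fresh by timestamp but already wrong still passes, which is the Staleness problem.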
Identities are also used in enforcing access policies such as authorization policy, and these
access policies are in the request path, where your system has to make fast decisions to allow or
deny access. Whenever identities are mistaken, access policies could be bypassed without
being detected or audited. For example, your identity cache may still associate the IP address
previously allocated to your checkout pod with the checkout identity. If the checkout pod gets
recycled and the same IP address is immediately allocated to one of the frontend pods, that frontend
pod could carry the checkout identity until the cache is updated, which could cause the wrong access
policies to be enforced.
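The checkout/frontend scenario can be sketched as a toy comparison between an IP-keyed cache and a certificate-derived identity. The IP address, namespace, and policy below are hypothetical; the identity strings follow the SPIFFE form Istio uses for workload identities.

```python
# Hypothetical identities in Istio's SPIFFE form:
# spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>
CHECKOUT = "spiffe://cluster.local/ns/shop/sa/checkout"
FRONTEND = "spiffe://cluster.local/ns/shop/sa/frontend"

# Hypothetical authorization policy: only checkout may call the payments service.
ALLOWED_CALLERS = {CHECKOUT}

# The stale cache still maps the recycled IP to checkout, although a
# frontend pod now owns 10.0.0.7.
identity_cache = {"10.0.0.7": CHECKOUT}


def allowed_by_cache(src_ip: str) -> bool:
    """IP-based identity: trust whatever the (possibly stale) cache says."""
    return identity_cache.get(src_ip) in ALLOWED_CALLERS


def allowed_by_mtls(peer_identity: str) -> bool:
    """Cryptographic identity: use the identity the peer proved in the handshake."""
    return peer_identity in ALLOWED_CALLERS


print(allowed_by_cache("10.0.0.7"))  # True: frontend is let in as checkout
# The frontend pod can only present its own certificate.
print(allowed_by_mtls(FRONTEND))     # False: frontend is denied
```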
Let us illustrate the identity cache staleness problem, assuming the following large-scale multi-cluster deployment:

100 clusters where each cluster has 100 nodes with 20 pods per node, for a total of 200,000 pods.
0.25% of pods are being churned at all times (rollout, restarts, recovery, node churn, …), each churn lasting a 10-second window.
Updates for the 500 churned pods must be distributed to 10,000 nodes (caches) every 10 seconds.
If the cache synchronizer stalls, what percentage of the system is stale after 5 minutes? Potentially as high as 7.5%!

The above assumes the cache synchronizer is in a steady state. If the cache synchronizer has a brown-out, it would affect its health checking, which increases the churn rate, leading to cascading instability.
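The 7.5% figure follows from simple arithmetic on the numbers above. A quick sketch, assuming the worst case in which every pod churned during the 5-minute stall is a distinct pod:

```python
clusters, nodes_per_cluster, pods_per_node = 100, 100, 20
total_pods = clusters * nodes_per_cluster * pods_per_node  # 200,000
caches = clusters * nodes_per_cluster                      # one cache per node: 10,000

churn_per_window = total_pods * 25 // 10_000               # 0.25% -> 500 pods per window
window_s, stall_s = 10, 5 * 60
windows = stall_s // window_s                              # 30 windows in 5 minutes

# Updates the synchronizer must fan out every 10 seconds in steady state.
fan_out = churn_per_window * caches                        # 5,000,000

# Worst case: no churned pod repeats, so missed updates never overlap.
stale_pods = churn_per_window * windows                    # 15,000
stale_pct = 100 * stale_pods / total_pods

print(total_pods, caches, churn_per_window, stale_pods, stale_pct)
# 200000 10000 500 15000 7.5
print(fan_out)  # 5000000
```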
A CA could also be compromised
by an attacker who claims to be someone else and tricks the CA into issuing a certificate. The
attacker can then use that certificate to communicate with other peers. This is where
certificate revocation can remediate the situation by revoking the
certificate so it is no longer valid. Otherwise, the attacker can exploit the compromised
certificate until it expires. It is critical to keep the private key for the root certificates in an HSM
that is kept offline and to use intermediate
certificates for signing workload certificates. If the CA has a brown-out or stalls for 5
minutes, you won't be able to obtain new or renewed workload certificates, but the previously issued
and valid certificates continue to provide strong identity guarantees for your workloads. For
more reliable issuance, you can deploy intermediate CAs to different zones and regions.
mTLS in Istio
Enable mTLS
Enabling mTLS in Istio for intra-mesh applications is very simple. All you need to do is add your
applications to the mesh, which can be done by labeling your namespace for either sidecar injection
or ambient mode. In the case of sidecars, a rollout restart is required for the sidecar to be injected
into your application pods.
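As a sketch of what labeling your namespace means, the hypothetical helper below builds a Namespace manifest with the label used for each mode elsewhere on this blog (istio-injection=enabled for sidecars, istio.io/dataplane-mode=ambient for ambient) and reports whether existing pods need a restart. It emits JSON, which kubectl apply also accepts; the namespace name is a placeholder.

```python
import json
from typing import Tuple

# Namespace labels for each data plane mode, as used on this blog.
ENROLLMENT_LABELS = {
    "sidecar": {"istio-injection": "enabled"},
    "ambient": {"istio.io/dataplane-mode": "ambient"},
}


def enroll_namespace(namespace: str, mode: str) -> Tuple[dict, bool]:
    """Hypothetical helper: (Namespace manifest, whether existing pods need a rollout restart)."""
    manifest = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": namespace, "labels": dict(ENROLLMENT_LABELS[mode])},
    }
    # Sidecars are injected when a pod is created, so existing pods must be
    # restarted; ambient mode enrolls running pods without a restart.
    needs_restart = mode == "sidecar"
    return manifest, needs_restart


manifest, restart = enroll_namespace("foo", "ambient")
print(json.dumps(manifest))  # pipe to `kubectl apply -f -`
print(restart)               # False
```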
Cryptographic identity
In a Kubernetes environment, Istio
creates an application's identity based on its service account. An identity certificate is provided to
each application pod in the mesh after you add your application to the mesh.
By default, your pod's identity certificate expires in 24 hours, and Istio rotates the pod identity
certificate every 12 hours. In the event of a compromise (for example, a compromised CA or
a stolen private key for the pod), the compromised certificate therefore only works for a limited
period of time until it expires, which limits the damage it can cause.
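The interplay between the 24-hour lifetime, the 12-hour rotation, and a CA outage can be sketched with a few lines of date arithmetic. This is a simplified model (no clock skew, retries, or jitter), and the function names are hypothetical:

```python
from datetime import datetime, timedelta

# Defaults quoted in the text above.
CERT_TTL = timedelta(hours=24)
ROTATION_INTERVAL = timedelta(hours=12)


def certificate_expiry(issued_at: datetime, ttl: timedelta = CERT_TTL) -> datetime:
    """Without revocation, a stolen certificate stays usable until this time."""
    return issued_at + ttl


def max_tolerable_ca_outage(ttl: timedelta = CERT_TTL,
                            rotation_interval: timedelta = ROTATION_INTERVAL) -> timedelta:
    """Longest CA outage, starting exactly when rotation is due, before the
    workload's current certificate expires (simplified model)."""
    return ttl - rotation_interval


print(max_tolerable_ca_outage())                         # 12:00:00
print(timedelta(minutes=5) < max_tolerable_ca_outage())  # True
print(certificate_expiry(datetime(2023, 9, 1, 8, 0)))    # 2023-09-02 08:00:00
```

Under this model, a 5-minute CA brown-out is well inside the 12-hour margin, while a stolen certificate remains usable for at most 24 hours unless it is revoked.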
Enforce strict mTLS
The default mTLS behavior is to use mTLS whenever possible without strictly enforcing it. To require
your application to accept only mTLS traffic, you can use Istio's
PeerAuthentication policy, mesh-wide or
per namespace or workload. In addition, you can apply Istio's
AuthorizationPolicy to control access to your workloads.
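A strict PeerAuthentication policy has a small shape. The sketch below builds one as a Python dict and prints it as JSON for kubectl apply -f -. The helper is hypothetical; the apiVersion security.istio.io/v1beta1 matches the Istio 1.19 era this post refers to, and newer releases also offer security.istio.io/v1.

```python
import json
from typing import Dict, Optional


def strict_peer_authentication(namespace: str = "istio-system",
                               name: str = "default",
                               workload_labels: Optional[Dict[str, str]] = None) -> dict:
    """Hypothetical helper: a PeerAuthentication that only accepts mTLS traffic.

    In the mesh root namespace (istio-system by default) it applies mesh-wide;
    in any other namespace it applies to that namespace, or only to the pods
    matched by workload_labels when given.
    """
    spec: dict = {"mtls": {"mode": "STRICT"}}
    if workload_labels:
        spec["selector"] = {"matchLabels": workload_labels}
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


# Mesh-wide strict mTLS; pipe the output to `kubectl apply -f -`.
print(json.dumps(strict_peer_authentication(), indent=2))
```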
TLS version
TLS version 1.3 is the default in Istio for intra-mesh application communication, using Envoy's
default cipher suites
(for example, TLS_AES_256_GCM_SHA384 for Istio 1.19.0). If you need an older TLS version, you can
configure a different mesh-wide minimum TLS protocol version for your workloads.
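The mesh-wide minimum is a MeshConfig setting. To our knowledge it lives at meshConfig.meshMTLS.minProtocolVersion with enum values such as TLSV1_2 and TLSV1_3, but treat that field path as an assumption to verify against the Istio documentation for your release. The hypothetical helper below builds an IstioOperator overlay that sets it:

```python
import json


def min_tls_version_operator(min_version: str = "TLSV1_2") -> dict:
    """Hypothetical helper: IstioOperator overlay setting the workload minimum TLS version.

    min_version is an Istio TLS protocol enum value such as "TLSV1_2"
    (assumed field path; check the MeshConfig reference for your release).
    """
    return {
        "apiVersion": "install.istio.io/v1alpha1",
        "kind": "IstioOperator",
        "spec": {"meshConfig": {"meshMTLS": {"minProtocolVersion": min_version}}},
    }


# JSON is valid YAML, so the output can be saved to a file for `istioctl install -f`.
print(json.dumps(min_tls_version_operator(), indent=2))
```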
Wrapping up
The TLS protocol, as established by the Internet Engineering Task Force (IETF), is one of the most
widely reviewed, expert-approved, battle-tested data security protocols in existence. TLS is also
used globally: whenever you visit a secured website, you shop with confidence partly
because the padlock icon indicates that you are securely connected to a trusted site
using TLS. The TLS 1.3 protocol was designed with end-to-end authentication,
confidentiality, and integrity to ensure your application’s identity and communications are not
compromised, and to prevent man-in-the-middle attacks. In order to achieve that (and to be
considered standards-compliant TLS), it is not only important to properly authenticate the
communicating peers but also critical to encrypt the traffic using the keys established from the
handshake. Now that you know mTLS excels at satisfying your secure application communication
requirements (cryptographic identities, confidentiality, integrity and access policy enforcement),
you can simply use Istio to upgrade your intra-mesh application communication with mTLS out of the
box - with very little configuration!
Huge thanks to Louis Ryan, Ben Leggett, John Howard, Christian Posta, and Justin Pettit, who
contributed significant time to reviewing and proposing updates to this blog!



IstioCon China 2023 wrap-up
Fri, 29 Sep 2023 00:00:00 +0000
It's great to be able to safely get together in person again. After two years of only running virtual events, we have filled the calendar for 2023. Istio Day Europe was held in April, and Istio Day North America is coming this November.
IstioCon, the conference dedicated to the industry-leading service mesh, provides a platform to explore insights gained from real-world Istio deployments, engage in interactive hands-on activities, and connect with maintainers across the entire Istio ecosystem.
Alongside our virtual IstioCon 2023 event, IstioCon China 2023 was held on September 26 in Shanghai, China. Part of the KubeCon + CloudNativeCon + Open Source Summit China, the event was arranged and hosted by the Istio maintainers and the CNCF. We were very proud to have a strong program for IstioCon in Shanghai and pleased to bring together members of the Chinese Istio community. The event was a testament to Istio’s immense popularity in the Asia-Pacific ecosystem.

IstioCon China 2023

IstioCon China kicked off with an opening keynote from Program Committee members Jimmy Song and Zhonghu Xu. The event was packed with great content, ranging from new features to end-user talks, with a major focus on the new Istio ambient mesh.

IstioCon China 2023, Welcome

The welcome speech was followed by a sponsored keynote from Google's Justin Pettit, "Istio Ambient Mesh as a Managed Infrastructure", which highlighted the importance and priority of the ambient model in the Istio community, especially for our top supporters like Google Cloud.

IstioCon China 2023, Google Cloud Sponsored Keynote

Perfectly placed after the keynote, Huailong Zhang from Intel and Yuxing Zeng from Alibaba discussed configurations for the co-existence of Ambient and Sidecar: a very relevant topic for existing users who want to experiment with the new ambient model.

IstioCon China 2023, Deep Dive into Istio Network Flows and Configurations for the co-existence of Ambient and Sidecar

Huawei's new Istio data plane based on eBPF aims to implement L4 and L7 capabilities in the kernel, avoiding switches between kernel mode and user mode and reducing the latency of the data plane. This was explained in an interesting talk from Xie SongYang and Zhonghu Xu. Chun Li and Iris Ding from Intel also integrated eBPF with Istio, with their talk "Harnessing eBPF for Traffic Redirection in Istio ambient mode", leading to more interesting discussions. DaoCloud also had a presence at the event, with Kebe Liu sharing Merbridge's innovation in eBPF and Xiaopeng Han presenting MirageDebug for localized Istio development.

The talk from Tetrate’s Jimmy Song, about the perfect union of different GitOps and Observability tools, was also very well received. Chaomeng Zhang from Huawei presented on how cert-manager helps enhance the security and flexibility of Istio’s certificate management system, and Xi Ning Wang and Zehuan Shi from Alibaba Cloud shared the idea of using VK (Virtual Kubelet) to implement serverless mesh.
While Shivanshu Raj Shrivastava gave a perfect introduction to WebAssembly through his talk “Extending and Customizing Istio with Wasm”, Zufar Dhiyaulhaq from GoTo Financial, Indonesia shared the practice of using Coraza Proxy Wasm to extend Envoy and quickly implement custom Web Application Firewalls.
Huabing Zhao from Tetrate shared Aeraki Mesh's Dubbo service governance practices with Qin Shilin from Boss Direct. Multi-tenancy is always a hot topic with Istio, and John Zheng from HP described multi-tenant management in the HP OneCloud Platform in detail.
The slides for all the sessions can be found in the IstioCon China 2023 schedule and all the presentations will be available in the CNCF YouTube Channel soon for the audience in other parts of the world.
On the show floor
Istio had a full-time kiosk in the project pavilion at KubeCon + CloudNativeCon + Open Source Summit China 2023, with most questions asked about ambient mesh. Many of our members and maintainers offered support at the booth, where a lot of interesting discussions happened.

KubeCon + CloudNativeCon + Open Source Summit China 2023, Istio Kiosk

Another highlight was that Istio Steering Committee members Zhonghu Xu and Chaomeng Zhang, authors of the Istio books "Cloud Native Service Mesh Istio" and "Istio: the Definitive Guide", spent time at the Istio booth interacting with our users and contributors.

Meet the Authors

We would like to express our heartfelt gratitude to our diamond sponsor, Google Cloud, for supporting IstioCon 2023!

IstioCon 2023, Our Diamond Sponsor

Last but not least, we would like to thank our IstioCon China Program Committee members for all their hard work and support!

IstioCon China 2023, Program Committee Members (Not Pictured: Iris Ding)

See you all in Chicago in November!



Deep Dive into the Network Traffic Path of the Coexistence of Ambient and Sidecar
Mon, 18 Sep 2023 00:00:00 +0000

Ambient mode now uses in-Pod redirection to redirect traffic between workload pods and ztunnel. The method described in this blog is no longer needed, and this post has been left for historical interest.


Istio has two deployment modes: ambient mode and sidecar mode. The former is still under development, while the latter is the classic one. Therefore, the coexistence of ambient mode and sidecar mode should be a normal deployment pattern, which is why this blog may be helpful for Istio users.
Background
In modern microservice architectures, communication and management among services are critical. To address this challenge, Istio emerged as a service mesh technology. It provides traffic control, security, and superior observability capabilities by utilizing the sidecar. To further improve the adaptability and flexibility of Istio, the Istio community began to explore a new mode: ambient mode. In this mode, Istio no longer relies on explicit sidecar injection, but achieves communication and mesh management among services through ztunnel and waypoint proxies. Ambient also brings a series of improvements, such as lower resource consumption, simpler deployment, and more flexible configuration options. When enabling ambient mode, we don't have to restart pods anymore, which enables Istio to play a better role in various scenarios.
Many blogs, which can be found in the Reference Resources section of this post, introduce and analyze ambient mode; this blog analyzes the network traffic path in Istio ambient and sidecar modes.
To clarify the network traffic paths and make it easier to understand, this blog post explores the following two scenarios with corresponding diagrams:

The network path of services in ambient mode to services in sidecar mode
The network path of services in sidecar mode to services in ambient mode

Information about the analysis
The analysis is based on Istio 1.18.2, where ambient mode uses iptables for redirection.
Ambient mode sleep to sidecar mode httpbin
Deployment and configuration for the first scenario

sleep is deployed in namespace foo
  sleep pod is scheduled to Node A
httpbin is deployed in namespace bar
  httpbin pod is scheduled to Node B
foo namespace enables ambient mode (foo namespace contains label: istio.io/dataplane-mode=ambient)
bar namespace enables sidecar injection (bar namespace contains label: istio-injection=enabled)

With the above description, the deployment and network traffic paths are:

Ambient mode sleep to Sidecar mode httpbin

If ambient mode is enabled, ztunnel is deployed as a DaemonSet in the istio-system namespace, and istio-cni and ztunnel generate iptables rules and routes for both the ztunnel pod and the pods on each node.
All network traffic going into or out of a pod with ambient mode enabled goes through ztunnel, based on the network redirection logic. ztunnel then forwards the traffic to the correct endpoints.
Network traffic path analysis of ambient mode sleep to sidecar mode httpbin
Based on the diagram above, the network traffic path is as follows:
(1) (2) (3) Request traffic from the sleep service is sent out from the veth of the sleep pod, where it is marked and forwarded to the istioout device on the node according to the iptables rules and route rules. The istioout device on Node A is a Geneve tunnel, and the other end of the tunnel is pistioout, which is inside the ztunnel pod on the same node.
(4) (5) When the traffic arrives through the pistioout device, the iptables rules inside the pod intercept it and redirect it through the eth0 interface in the pod to port 15001.
(6) Using the original request information, ztunnel obtains the endpoint list of the target service. It then sends the request to an endpoint, such as one of the httpbin pods. Finally, the request traffic reaches the httpbin pod via the container network.
(7) The request traffic arriving in the httpbin pod is intercepted by its iptables rules and redirected to port 15006 of the sidecar.
(8) The sidecar handles the inbound request traffic coming in via port 15006 and forwards the traffic to the httpbin container in the same pod.
Sidecar mode sleep to ambient mode httpbin and helloworld
Deployment and configuration for the second scenario

sleep is deployed in namespace foo
  sleep pod is scheduled to Node A
httpbin is deployed in namespace bar-1
  httpbin pod is scheduled to Node B
  the waypoint proxy of httpbin is disabled
helloworld is deployed in namespace bar-2
  helloworld pod is scheduled to Node D
  the waypoint proxy of helloworld is enabled
  the waypoint proxy is scheduled to Node C
foo namespace enables sidecar injection (foo namespace contains label: istio-injection=enabled)
bar-1 namespace enables ambient mode (bar-1 namespace contains label: istio.io/dataplane-mode=ambient)

With the above description, the deployment and network traffic paths are:

sleep to httpbin and helloworld

Network traffic path analysis of sidecar mode sleep to ambient mode httpbin
The network traffic path of a request from the sleep pod (sidecar mode) to the httpbin pod (ambient mode) is depicted in the top half of the diagram above.
(1) (2) (3) (4) The sleep container sends a request to httpbin. The request is intercepted by iptables rules and directed to port 15001 on the sidecar in the sleep pod. Then the sidecar handles the request and routes the traffic based on the configuration received from istiod (the control plane), forwarding the traffic to an IP address corresponding to the httpbin pod on Node B.
(5) (6) After the request is sent to the device pair (veth httpbin <-> eth0 inside the httpbin pod), it is intercepted and forwarded by the iptables and route rules to the istioin device on Node B, where the httpbin pod is running. The istioin device on Node B and the pistioin device inside the ztunnel pod on the same node are connected by a Geneve tunnel.
(7) (8) After the request enters the pistioin device of the ztunnel pod, the iptables rules in the ztunnel pod intercept the traffic and redirect it to port 15008 on the ztunnel proxy running inside the pod.
(9) Traffic arriving on port 15008 is treated as an inbound request, and ztunnel then forwards the request to the httpbin pod on the same Node B.
Network traffic path analysis of sidecar mode sleep to ambient mode helloworld via waypoint proxy
Compared with the top part of the diagram, the bottom part inserts a waypoint proxy in the path between the sleep, ztunnel and helloworld pods. The Istio control plane has all the service information and configuration of the service mesh. When the helloworld pod is deployed with a waypoint proxy, the EDS configuration of the helloworld service received by the sleep pod's sidecar changes to the envoy_internal_address type. This causes request traffic going through the sidecar to be forwarded to port 15008 of the waypoint proxy on Node C via the HTTP Based Overlay Network (HBONE) protocol.
The waypoint proxy is an instance of the Envoy proxy and forwards the request to the helloworld pod based on the routing configuration received from the control plane. Once traffic reaches the veth on Node D, it follows the same path as the previous scenario.
Wrapping up
The sidecar mode is what made Istio a great service mesh. However, the sidecar mode can also cause problems, as it requires the app and sidecar containers to run in the same pod. Istio ambient mode implements communication among services through centralized proxies (ztunnel and waypoint). Ambient mode provides greater flexibility and scalability, reduces resource consumption because it doesn't require a sidecar for each pod in the mesh, and allows more precise configuration. Therefore, there's no doubt ambient mode is the next evolution of Istio. The coexistence of sidecar and ambient modes is likely to last a very long time. Although ambient mode is still in the alpha stage and sidecar mode is still the recommended mode of Istio, ambient mode will give users a more lightweight option for running and adopting the Istio service mesh as it moves towards beta and future releases.
Reference Resources

Traffic in ambient mesh: Istio CNI and node configuration
Traffic in ambient mesh: Redirection using iptables and Geneve tunnels
Traffic in ambient mesh: ztunnel, eBPF configuration, and waypoint proxies




Istio Announces Winners of 2023 Steering Committee Election
Wed, 16 Aug 2023 00:00:00 +0000
The Istio Steering Committee is pleased to announce the four winners of the 2023 election for Community Seats. The winners are:

Craig Box, ARMO
Iris Ding, Intel
Lin Sun, Solo.io
Faseela K, Ericsson Software Technology

The winners will serve on the Steering Committee for one year, starting on September 1, 2023. They will be responsible for helping to guide the development and governance of Istio, the world’s most popular service mesh.
The election was held in August 2023, and was open to any member of the Istio community who submitted a pull request or made other significant project contributions. Over 120 eligible voters evaluated the candidates on their contributions to Istio, their experience in open source governance, and their commitment to the project’s mission.
In addition to the four Community Seats, the Istio Steering Committee also consists of nine Contribution Seats, which are awarded proportionally to organizations which made significant contributions to the project. The Contribution Seats for 2023 are held by:

Google
IBM / Red Hat
Huawei

The Steering Committee congratulates the winners of the election, and looks forward to working with them to continue to grow and improve Istio as a successful and sustainable open source project. We encourage everyone to get involved in the Istio community, contribute, vote, and help us shape the future of service mesh.



Kubernetes Native Sidecars in Istio
Tue, 15 Aug 2023 00:00:00 +0000
If you have heard anything about service meshes, it is that they work using the sidecar pattern: a proxy server is deployed alongside your application code.
The sidecar pattern is just that: a pattern.
Up until this point, there has been no formal support for sidecar containers in Kubernetes at all.
This has caused a number of problems: what if you have a job that terminates by design, but a sidecar container that doesn’t?
This exact use case is the most popular ever on the Kubernetes issue tracker.
A formal proposal for adding sidecar support in Kubernetes was raised in 2019. With many stops and starts along the way,
and after a reboot of the project last year, formal support for sidecars is being released to Alpha in Kubernetes 1.28.
Istio has implemented support for this feature, and in this post you can learn how to take advantage of it.
Sidecar woes
Sidecar containers give a lot of power, but come with some issues.
While containers within a pod can share some things, their lifecycles are entirely decoupled.
To Kubernetes, both of these containers are functionally the same.
However, in Istio they are not the same - the Istio container is required for the primary application container to run,
and has no value without the primary application container.
This mismatch in expectation leads to a variety of issues:

If the application container starts faster than Istio’s container, it cannot access the network.
This wins the most +1’s on Istio’s GitHub by a landslide.
If Istio’s container shuts down before the application container, the application container cannot access the network.
If an application container intentionally exits (typically from usage in a Job), Istio’s container will still run and keep the pod running indefinitely.
This is also a top GitHub issue.
InitContainers, which run before Istio’s container starts, cannot access the network.

Countless hours have been spent in the Istio community and beyond to work around these issues, with limited success.
Fixing the root cause
While increasingly-complex workarounds in Istio can help alleviate the pain for Istio users, ideally all of this would just work - and not just for Istio.
Fortunately, the Kubernetes community has been hard at work to address these directly in Kubernetes.
In Kubernetes 1.28, a new feature to add native support for sidecars was merged, closing out over 5 years of ongoing work.
With this merged, all of our issues can be addressed without workarounds!
While we are on the topic of the "GitHub issue hall of fame", these two issues are the #1 and #6 all-time issues in Kubernetes, and they have finally been closed!
A special thanks goes to the huge group of individuals involved in getting this past the finish line.
Trying it out
Although Kubernetes 1.28 was just released, the new SidecarContainers feature is Alpha (and therefore off by default), and support for the feature in Istio has not yet shipped, we can still try it out today. Just don't try this in production!
First, we need to spin up a Kubernetes 1.28 cluster, with the SidecarContainers feature enabled:
$ cat <

Then we can download the latest Istio 1.19 pre-release (as 1.19 is not yet out). I used Linux here.
This is a pre-release of Istio, so again - do not try this in production!
When we install Istio, we will enable the feature flag for native sidecar support and turn on access logs to help demo things later.
$ TAG=1.19.0-beta.0
$ curl -L https://github.com/istio/istio/releases/download/$TAG/istio-$TAG-linux-amd64.tar.gz | tar xz
$ cd istio-$TAG
$ ./bin/istioctl install --set values.pilot.env.ENABLE_NATIVE_SIDECARS=true -y --set meshConfig.accessLogFile=/dev/stdout
And finally we can deploy a workload:
$ kubectl label namespace default istio-injection=enabled
$ kubectl apply -f samples/sleep/sleep.yaml
Let’s look at the pod:
$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
sleep-7656cf8794-8fhdk   2/2     Running   0          51s
Everything looks normal at first glance…
If we look under the hood, we can see the magic, though.
$ kubectl get pod -o "custom-columns="\
"NAME:.metadata.name,"\
"INIT:.spec.initContainers[*].name,"\
"CONTAINERS:.spec.containers[*].name"

NAME                     INIT                     CONTAINERS
sleep-7656cf8794-8fhdk   istio-init,istio-proxy   sleep
Here we can see all the containers and initContainers in the pod.
Surprise! istio-proxy is now an initContainer.
More specifically, it is an initContainer with restartPolicy: Always set (a new field, enabled by the SidecarContainers feature).
This tells Kubernetes to treat it as a sidecar.
This means that later containers in the list of initContainers, and all normal containers, will not start until the proxy container is ready.
Additionally, the pod will terminate even if the proxy container is still running.
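The startup ordering described above can be modeled in a few lines. This is a simplified, hypothetical model of how Kubernetes treats an initContainer with restartPolicy: Always, not kubelet code; it records which sidecars are already running when each container starts, using the container names from this post.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Container:
    name: str
    restart_policy: Optional[str] = None  # "Always" on an initContainer marks a native sidecar


def startup_order(init_containers: List[Container],
                  containers: List[Container]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Simplified model: (container name, sidecars already running) in start order."""
    running_sidecars: List[str] = []
    order = []
    for c in init_containers:
        order.append((c.name, tuple(running_sidecars)))
        if c.restart_policy == "Always":
            # Native sidecar: Kubernetes waits for it to start before moving on,
            # and it keeps running alongside everything that follows.
            running_sidecars.append(c.name)
        # Otherwise a regular init container runs to completion before the next one.
    for c in containers:
        order.append((c.name, tuple(running_sidecars)))
    return order


order = startup_order(
    [Container("istio-init"), Container("istio-proxy", "Always"), Container("check-traffic")],
    [Container("sleep")],
)
for name, sidecars in order:
    print(name, sidecars)
# istio-init ()
# istio-proxy ()
# check-traffic ('istio-proxy',)
# sleep ('istio-proxy',)
```

In this model, check-traffic is the first container that starts with the proxy already running, which is why its request can succeed in the example that follows.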
Init container traffic
To put this to the test, let’s make our pod actually do something.
Here we deploy a simple pod that sends a request in an initContainer.
Normally, this would fail.
apiVersion: v1
kind: Pod
metadata:
  name: sleep
spec:
  initContainers:
  - name: check-traffic
    image: istio/base
    command:
    - curl
    - httpbin.org/get
  containers:
  - name: sleep
    image: istio/base
    command: ["/bin/sleep", "infinity"]
Checking the proxy container, we can see the request both succeeded and went through the Istio sidecar:
$ kubectl logs sleep -c istio-proxy | tail -n1
[2023-07-25T22:00:45.703Z] "GET /get HTTP/1.1" 200 - via_upstream - "-" 0 1193 334 334 "-" "curl/7.81.0" "1854226d-41ec-445c-b542-9e43861b5331" "httpbin.org" ...
If we inspect the pod, we can see our sidecar now runs before the check-traffic initContainer:
$ kubectl get pod -o "custom-columns="\
"NAME:.metadata.name,"\
"INIT:.spec.initContainers[*].name,"\
"CONTAINERS:.spec.containers[*].name"

NAME    INIT                                  CONTAINERS
sleep   istio-init,istio-proxy,check-traffic   sleep
Exiting pods
Earlier, we mentioned that when applications exit (common in Jobs), the pod would live forever.
Fortunately, this is addressed as well!
First we deploy a pod that will exit after one second and doesn’t restart:
apiVersion: v1
kind: Pod
metadata:
  name: sleep
spec:
  restartPolicy: Never
  containers:
  - name: sleep
    image: istio/base
    command: ["/bin/sleep", "1"]
And we can watch its progress:
$ kubectl get pods -w
NAME    READY   STATUS     RESTARTS   AGE
sleep   0/2     Init:1/2   0          2s
sleep   0/2     PodInitializing   0          2s
sleep   1/2     PodInitializing   0          3s
sleep   2/2     Running           0          4s
sleep   1/2     Completed         0          5s
sleep   0/2     Completed         0          12s
Here we can see the application container exited, and shortly after Istio’s sidecar container exits as well.
Previously, the pod would be stuck in Running, while now it can transition to Completed.
No more zombie pods!
What about ambient mode?
Last year, Istio announced ambient mode - a new data plane mode for Istio that doesn’t rely on sidecar containers.
So with ambient mode coming, does any of this even matter?
I would say a resounding “Yes”!
While the impact of sidecars is lessened when ambient mode is used for a workload, I expect that almost all large-scale Kubernetes users have some sort of sidecar in their deployments.
This could be Istio workloads they don't want to migrate to ambient, that they haven't yet migrated, or things unrelated to Istio.
So while there may be fewer scenarios where this matters, it is still a huge improvement for the cases where sidecars are used.
You may wonder about the opposite: if all our sidecar woes are addressed, why do we need ambient mode at all?
Even with these sidecar limitations addressed, ambient mode still brings a variety of benefits.
For example, this blog post goes into details about why decoupling proxies from workloads is advantageous.
Try it out yourself
We encourage adventurous readers to try this out themselves in testing environments!
Feedback for these experimental and alpha features is critical to ensure they are stable and meeting expectations before promoting them.
If you try it out, let us know what you think in the Istio Slack!
In particular, the Kubernetes team is interested in hearing more about:

Handling of shutdown sequence, especially when there are multiple sidecars involved.
Backoff restart handling when sidecar containers are crashing.
Edge cases they have not yet considered.




Using Accelerated Offload Connection Load Balancing in Istio
Tue, 08 Aug 2023 00:00:00 +0000
What is connection load balancing?
Load balancing is a core networking solution used to distribute traffic across multiple servers in a server farm.
Load balancers improve application availability and responsiveness and prevent server overload. Each load balancer
sits between client devices and backend servers, receiving and then distributing incoming requests to any available server capable of fulfilling them.
A typical web server has multiple workers (processes or threads). If many clients connect to
a single worker, this worker becomes busy and causes long-tail latency while other workers sit idle,
affecting the performance of the web server. Connection load balancing, also known as connection
balancing, is the solution for this situation.
What does Istio do for connection load balancing?
Istio uses Envoy as the data plane.
Envoy provides a connection load balancing implementation called Exact connection balancer. As its name says, a lock is held during balancing so that connection counts are nearly exactly balanced between workers. It is “nearly” exact in the sense that a connection might close in parallel thus making the counts incorrect, but this should be rectified on the next accept. This balancer sacrifices accept throughput for accuracy and should be used when there are a small number of connections that rarely cycle, e.g., service mesh gRPC egress.
As a result, it is not suitable for an ingress gateway, since an ingress gateway accepts thousands of connections within a short time, and the cost of the lock causes a big drop in throughput.
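A toy model of the exact balancer's trade-off: every accepted connection takes one lock and goes to the worker with the fewest active connections. The class is hypothetical and not Envoy code; unlike the real balancer described above, it also serializes closes, so its counts are exact rather than nearly exact.

```python
import threading
from typing import List


class ExactConnectionBalancer:
    """Hypothetical toy model of lock-based exact connection balancing."""

    def __init__(self, num_workers: int):
        self.active: List[int] = [0] * num_workers
        self.lock = threading.Lock()

    def on_accept(self) -> int:
        # Every accepting thread serializes here: this lock is the
        # accept-throughput cost that hurts high-connection-rate gateways.
        with self.lock:
            worker = min(range(len(self.active)), key=self.active.__getitem__)
            self.active[worker] += 1
            return worker

    def on_close(self, worker: int) -> None:
        with self.lock:
            self.active[worker] -= 1


balancer = ExactConnectionBalancer(num_workers=4)
threads = [threading.Thread(target=balancer.on_accept) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balancer.active)  # [25, 25, 25, 25]
```

Each accept picks a least-loaded worker, so the spread between workers never exceeds one connection, regardless of thread scheduling.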
Now, Envoy has integrated Intel® Dynamic Load Balancing (Intel® DLB) connection load balancing to accelerate high connection count cases such as ingress gateways.
How Intel® Dynamic Load Balancing accelerates connection load balancing in Envoy
Intel DLB is a hardware-managed system of queues and arbiters connecting producers and consumers. It is a PCI device envisaged to live in the server CPU uncore, where it can interact with software running on cores and potentially with other devices.
Intel DLB implements the following load balancing features:

Offloads queue management from software: useful where there are significant queuing-based costs.
  Especially useful in multi-producer/multi-consumer scenarios and when batching enqueues to multiple destinations.
  In software, accessing shared queues requires locks, which add overhead. Intel DLB implements lock-free access to shared queues.
Dynamic, flow-aware load balancing and reordering.
  Ensures equal distribution of tasks and better CPU core utilization. Can provide flow-based atomicity if required.
  Distributes high-bandwidth flows across many cores without loss of packet order.
  Provides better determinism and avoids excessive queuing latencies.
  Uses a smaller IO memory footprint and saves DDR bandwidth.
Priority queuing (up to 8 levels): allows for QoS.
  Lower latency for latency-sensitive traffic.
  Optional delay measurements in the packets.
Scalability.
  Allows dynamic sizing of applications, with seamless scale up/down.
  Power aware: the application can move workers to a lower power state under lighter load.

There are three types of load balancing queues:

Unordered: For multiple producers and consumers. The order of tasks is not important, and each task is assigned to the processor core with the lowest current load.
Ordered: For multiple producers and consumers where the order of tasks is important. When multiple tasks are processed by multiple processor cores, they must be rearranged in the original order.
Atomic: For multiple producers and consumers, where tasks are grouped according to certain rules. These tasks are processed using the same set of resources and the order of tasks within the same group is important.

An ingress gateway is expected to process as much data as possible as quickly as possible, so Intel DLB connection load balancing uses an unordered queue.
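The difference between the unordered and atomic queue types can be sketched in a few lines. This Python model is an illustration only, not the DLB hardware interface: the function names, flow IDs, and task costs are hypothetical. The unordered policy sends every task to the least-loaded consumer; the atomic policy pins all tasks of a flow to the consumer chosen for that flow's first task.

```python
def dispatch_unordered(tasks, num_consumers):
    """Unordered queue: each task goes to the currently least-loaded consumer."""
    load = [0] * num_consumers
    placement = []
    for _flow, cost in tasks:
        consumer = min(range(num_consumers), key=load.__getitem__)
        load[consumer] += cost
        placement.append(consumer)
    return placement, load


def dispatch_atomic(tasks, num_consumers):
    """Atomic queue: all tasks of a flow go to the same consumer."""
    owner = {}
    load = [0] * num_consumers
    placement = []
    for flow, cost in tasks:
        if flow not in owner:
            # First task of a flow: pin the flow to the least-loaded consumer.
            owner[flow] = min(range(num_consumers), key=load.__getitem__)
        consumer = owner[flow]
        load[consumer] += cost
        placement.append(consumer)
    return placement, load


# (flow id, cost) pairs; flow "a" is a heavy, long-lived flow.
tasks = [("a", 5), ("b", 1), ("a", 5), ("c", 1), ("a", 5)]
print(dispatch_unordered(tasks, 2))  # ([0, 1, 1, 0, 0], [11, 6])
print(dispatch_atomic(tasks, 2))     # ([0, 1, 0, 1, 0], [15, 2])
```

The unordered policy spreads the heavy flow across both consumers and ends up better balanced, which is why it suits an ingress gateway where each accepted connection is an independent task; the atomic policy trades balance for keeping a flow on one core.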
How to use Intel DLB connection load balancing in Istio
With the 1.17 release, Istio officially supports Intel DLB connection load balancing.
The following steps show how to use Intel DLB connection load balancing in an Istio ingress gateway on an SPR (Sapphire Rapids) machine, assuming a Kubernetes cluster is already running.
Step 1: Prepare DLB environment
Install the Intel DLB driver by following the instructions on the Intel DLB driver official site.
Install the Intel DLB device plugin with the following command:
$ kubectl apply -k https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/dlb_plugin?ref=v0.26.0
For more details about the Intel DLB device plugin, refer to the Intel DLB device plugin homepage.
You can check the Intel DLB device resource:
$ kubectl describe nodes | grep dlb.intel.com/pf
  dlb.intel.com/pf:   2
  dlb.intel.com/pf:   2
...
Step 2: Download Istio
In this post we use Istio 1.17.2. Download the installation:
$ curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.17.2 TARGET_ARCH=x86_64 sh -
$ cd istio-1.17.2
$ export PATH=$PWD/bin:$PATH
All of the following commands are run from this directory.
Check that the version is 1.17.2:
$ istioctl version
no running Istio pods in "istio-system"
1.17.2
Step 3: Install Istio
Create an install configuration for Istio. Note that we assign 4 CPUs and 1 DLB device to the ingress gateway and set concurrency to 4, equal to the number of CPUs.
$ cat > config.yaml << EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  components:
    ingressGateways:
    - enabled: true
      name: istio-ingressgateway
      k8s:
        overlays:
          - kind: Deployment
            name: istio-ingressgateway
        podAnnotations:
          proxy.istio.io/config: |
            concurrency: 4
        resources:
          requests:
            cpu: 4000m
            memory: 4096Mi
            dlb.intel.com/pf: '1'
          limits:
            cpu: 4000m
            memory: 4096Mi
            dlb.intel.com/pf: '1'
        hpaSpec:
          maxReplicas: 1
          minReplicas: 1
  values:
    telemetry:
      enabled: false
EOF
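Because concurrency (the number of Envoy worker threads) should match the CPUs assigned to the gateway, a small sanity check can catch the two values drifting apart when the configuration is edited. The helper below is hypothetical and not part of istioctl; it parses only plain-core ("4") and millicore ("4000m") CPU quantities, a subset of the Kubernetes quantity format, and rounds down to whole cores.

```python
from fractions import Fraction


def cpu_quantity_to_cores(quantity: str) -> Fraction:
    """Parse a Kubernetes CPU quantity such as '4000m' or '4' (subset only)."""
    if quantity.endswith("m"):
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def expected_concurrency(cpu_limit: str) -> int:
    # One worker thread per whole CPU core, never fewer than one.
    return max(1, int(cpu_quantity_to_cores(cpu_limit)))


def check_gateway(cpu_limit: str, concurrency: int) -> bool:
    return expected_concurrency(cpu_limit) == concurrency


print(check_gateway("4000m", 4))  # True: matches the configuration above
print(check_gateway("2500m", 4))  # False: only 2 whole cores
```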
Use istioctl to install:
$ istioctl install -f config.yaml --set values.gateways.istio-ingressgateway.runAsRoot=true -y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Installation complete
Making this installation the default for injection and validation.

Thank you for installing Istio 1.17.  Please take a few minutes to tell us about your install/upgrade experience!  https://forms.gle/hMHGiwZHPU7UQRWe9
Step 4: Set Up the Backend Service
Since we want to use DLB connection load balancing in the Istio ingress gateway, we first need to create a backend service.
We’ll use httpbin, a sample provided with Istio, for testing.
$ kubectl apply -f samples/httpbin/httpbin.yaml
$ kubectl apply -f - <

You have now created a virtual service configuration for the httpbin service containing two route rules that allow traffic for paths /status and /delay.
The gateways list specifies that only requests through your httpbin-gateway are allowed. All other external requests will be rejected with a 404 response.
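The routing behavior described above can be modelled in a few lines. This is a plain-Python model of the rules as the text describes them (host httpbin.example.com, prefix matches on /status and /delay, a 404 for anything else), not Envoy's route matcher; the names are hypothetical.

```python
ROUTED_HOST = "httpbin.example.com"
ROUTED_PREFIXES = ("/status", "/delay")


def is_routed(host: str, path: str) -> bool:
    """True if the request matches a route rule and is forwarded to httpbin."""
    # Only requests whose Host header names the exposed host are considered.
    if host != ROUTED_HOST:
        return False
    # Prefix match, as in the /status and /delay route rules.
    return path.startswith(ROUTED_PREFIXES)


def gateway_status(host: str, path: str, upstream_status: int = 200) -> int:
    # In this model, unmatched requests get a 404 from the gateway and
    # matched ones get whatever the upstream (httpbin) returns.
    return upstream_status if is_routed(host, path) else 404


print(gateway_status("httpbin.example.com", "/status/200"))  # 200
print(gateway_status("httpbin.example.com", "/headers"))     # 404
```

These two calls mirror the curl requests in the test step below: /status/200 is forwarded, while /headers is rejected.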
Step 5: Enable DLB Connection Load Balancing
$ kubectl apply -f - <

If you check the logs of the ingress gateway pod istio-ingressgateway-xxxx, you should see entries similar to:
$ export POD="$(kubectl get pods -n istio-system | grep gateway | awk '{print $1}')"
$ kubectl logs -n istio-system ${POD} | grep dlb
2023-05-05T06:16:36.921299Z     warning envoy config external/envoy/contrib/network/connection_balance/dlb/source/connection_balancer_impl.cc:46        dlb device 0 is not found, use dlb device 3 instead     thread=35
Envoy automatically detects and chooses the DLB device.
Step 6: Test
$ export HOST="<YOUR-HOST-IP>"
$ export PORT="$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')"
$ curl -s -I -HHost:httpbin.example.com "http://${HOST}:${PORT}/status/200"
HTTP/1.1 200 OK
server: istio-envoy
...
Note that you use the -H flag to set the Host HTTP header to httpbin.example.com, because there is no DNS binding for that host yet and you are simply sending your request to the ingress IP.
You can also add the DNS binding in /etc/hosts and remove -H flag:
$ echo "$HOST httpbin.example.com" | sudo tee -a /etc/hosts
$ curl -s -I "http://httpbin.example.com:${PORT}/status/200"
HTTP/1.1 200 OK
server: istio-envoy
...
Access any other URL that has not been explicitly exposed. You should see an HTTP 404 error:
$ curl -s -I -HHost:httpbin.example.com "http://${HOST}:${PORT}/headers"
HTTP/1.1 404 Not Found
...
You can set the log level to debug to see more DLB-related logs:
$ istioctl pc log ${POD}.istio-system --level debug
istio-ingressgateway-665fdfbf95-2j8px.istio-system:
active loggers:
  admin: debug
  alternate_protocols_cache: debug
  aws: debug
  assert: debug
  backtrace: debug
...
Send one request with curl and you will see output like the following:
$ kubectl logs -n istio-system ${POD} | grep dlb
2023-05-05T06:16:36.921299Z     warning envoy config external/envoy/contrib/network/connection_balance/dlb/source/connection_balancer_impl.cc:46        dlb device 0 is not found, use dlb device 3 instead     thread=35
2023-05-05T06:37:45.974241Z     debug   envoy connection external/envoy/contrib/network/connection_balance/dlb/source/connection_balancer_impl.cc:269   worker_3 dlb send fd 45 thread=47
2023-05-05T06:37:45.974427Z     debug   envoy connection external/envoy/contrib/network/connection_balance/dlb/source/connection_balancer_impl.cc:286   worker_0 get dlb event 1        thread=46
2023-05-05T06:37:45.974453Z     debug   envoy connection external/envoy/contrib/network/connection_balance/dlb/source/connection_balancer_impl.cc:303   worker_0 dlb recv 45    thread=46
2023-05-05T06:37:45.975215Z     debug   envoy connection external/envoy/contrib/network/connection_balance/dlb/source/connection_balancer_impl.cc:283   worker_0 dlb receive none, skip thread=46
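The debug lines show a connection handed from one worker to another: worker_3 sends fd 45 through DLB and worker_0 receives it. A short script can extract those handoffs from a log dump. The regular expressions are written against the sample lines above only; other Envoy versions may format these messages differently.

```python
import re

SEND = re.compile(r"(worker_\d+) dlb send fd (\d+)")
RECV = re.compile(r"(worker_\d+) dlb recv (\d+)")


def handoffs(log_lines):
    """Pair each 'dlb send fd N' with the later 'dlb recv N' for the same fd."""
    senders = {}
    pairs = []
    for line in log_lines:
        if m := SEND.search(line):
            senders[m.group(2)] = m.group(1)
        elif m := RECV.search(line):
            fd = m.group(2)
            # (sending worker or None if unseen, receiving worker, fd)
            pairs.append((senders.pop(fd, None), m.group(1), int(fd)))
    return pairs


sample = [
    "2023-05-05T06:37:45.974241Z     debug   envoy connection external/envoy/contrib/network/connection_balance/dlb/source/connection_balancer_impl.cc:269   worker_3 dlb send fd 45 thread=47",
    "2023-05-05T06:37:45.974427Z     debug   envoy connection external/envoy/contrib/network/connection_balance/dlb/source/connection_balancer_impl.cc:286   worker_0 get dlb event 1        thread=46",
    "2023-05-05T06:37:45.974453Z     debug   envoy connection external/envoy/contrib/network/connection_balance/dlb/source/connection_balancer_impl.cc:303   worker_0 dlb recv 45    thread=46",
    "2023-05-05T06:37:45.975215Z     debug   envoy connection external/envoy/contrib/network/connection_balance/dlb/source/connection_balancer_impl.cc:283   worker_0 dlb receive none, skip thread=46",
]
print(handoffs(sample))  # [('worker_3', 'worker_0', 45)]
```

Feeding it the output of the kubectl logs command above gives a quick view of how connections are being spread across workers.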
For more details about the Istio ingress gateway, refer to the official Istio Ingress Gateway documentation.



Announcing Istio's graduation within the CNCF
Wed, 12 Jul 2023 00:00:00 +0000
We are delighted to announce that Istio is now a graduated Cloud Native Computing Foundation (CNCF) project.
We would like to thank our TOC sponsors Emily Fox and Nikhita Raghunath, and everyone who has collaborated over the past six years on Istio’s design, development, and deployment.
As before, project work continues uninterrupted. We were excited to bring ambient mesh to Alpha in Istio 1.18 and are continuing to drive it to production readiness. Sidecar deployments remain the recommended method of using Istio, and our 1.19 release will support a new sidecar container feature in Alpha in Kubernetes 1.28.
We have been delighted to welcome Microsoft to our community after their decision to archive the Open Service Mesh project and collaborate on Istio. As the third most active CNCF project in terms of PRs, and with support from over 20 vendors and dozens of contributing companies, there is simply no better choice for a service mesh.
We would like to invite the Istio community to submit a talk to the upcoming virtual IstioCon 2023, its companion full-day in-person event co-located with KubeCon China in Shanghai, or Istio Day co-located with KubeCon NA in Chicago.
Watch a video
In this video for Techstrong TV, I talk about the history of the project, and what graduation means to us.

Words of support from our alumni
When we announced our incubation, we mentioned that the journey began with Istio’s inception in 2016. One of the great things about collaborative open source projects is that people come and go from employers, but their affiliation with a project can remain. Some of our original contributors founded companies based on Istio; some moved to other companies that support it; and some are still working on it at Google or IBM, six years later.
The announcement from the CNCF and blog posts from Intel, Red Hat, Solo.io, Tetrate, VMware and DaoCloud summarize the thoughts and feelings of those working on the project today.
We also reached out to some contributors who have moved on from the project, to share their thoughts.

From the very beginning of Istio, we wished for it to join its big brother Kubernetes as a core part of the CNCF landscape. Seeing all that the Istio project has accomplished since those early days is an amazing gift. I couldn’t be prouder of what the community has accomplished and what this graduation means to the continued success of the project.
Sven Mawson, Istio co-founder and Chief Software Architect, SambaNova Systems



As a co-founder of the Istio service mesh, it is very gratifying to see how far we have come. We started off with a vision for an infrastructure that provided security, observability and programmability out of the box to cloud native and legacy applications. We were humbled by the dramatic adoption across enterprises and grateful for the trust people placed in the Istio team when they deployed critical production workloads on Istio. Graduating from CNCF is a great formal validation and recognition of our vision, our project and the huge community we have built so far.
Shriram Rajagopalan, co-creator of Amalgam8



When we launched Istio six years ago, we knew it would make waves, but we didn’t realize that we had opened the floodgate. It grew beyond any of our wildest imagination, and today Istio marks another milestone. As a founding member and as someone who got to play almost every role on this product over the years, I’m infinitely grateful to have been part of Istio’s incredible journey.
Jasmine Jaksic, original Istio TPM



When we started Istio, before the concept of a service mesh existed, we had a broad idea of what it would be, but the details were murky. It was exciting to see the tech quickly evolve and grow into an invaluable asset for the community. It’s gratifying that all this hard work has led us to this point.
Martin Taillefer, original Istio engineer



When we were building the initial prototypes for what would become Istio, we had hopes that others would see the value in what we were creating, and that it would make a positive impact on the way in which organizations built, managed, and monitored their production services. Graduation from CNCF marks a realization, beyond any reasonable measure, of those initial aspirations. Of course, such a milestone is only achievable with contributions from a large community of passionate, knowledgeable, and dedicated individuals. This achievement is a celebration of the kindness, patience, and expertise they have shared over the years. May the project continue to grow and help its users deliver secure, monitored services for many years to come!
Douglas Reid, original Istio engineer and Founding Engineer, Steamship



During my time as a contributor and leader within the Istio community, Istio repeatedly showed itself to be a powerful platform with the tools organizations need at the center of their security, networking, and observability strategies. I’m especially proud of the optimizations we made in the Product Security and Test and Release work groups to prioritize users’ needs through secure, reliable, and predictable features and releases. Istio’s graduation in CNCF is a huge step forward for the community, validating all of the hard work we’ve contributed. Congratulations to the community. I’m excited to see where Istio goes next.
Brian Avery, former TOC member, Istio Product Security Lead, and Test and Release Lead





Istio Day North America 2023, Twice The Fun!
Fri, 16 Jun 2023 00:00:00 +0000
We all had a blast at Istio Day Europe in April. The event was incredibly well received, but organizers and attendees alike felt that a half-day was not enough to showcase all that Istio has to offer. Due to the overwhelming response, we are glad to share with all of you that Istio Day North America is going to be a full-day event, co-located with KubeCon North America in Chicago.

Submit a talk
We now encourage Istio users, developers, partners, and advocates to submit a session proposal through the CNCF event portal, which is open until August 6th.
We want to see real world examples, case studies, and success stories that can inspire newcomers to use Istio in production. The content will cover introductory to advanced levels, split into four main topic tracks:

New Features: What have you been working on that the community should know about?
Case Studies: How have you built a platform or service on top of Istio?
Istio Recipes: How you can solve a specific business problem using Istio.
Project Updates: The evolution of Istio, and the latest updates from the project maintainers.

You can pick one of these formats to submit a session proposal:

Presentation: 25 minutes, 1 or 2 speaker(s) presenting a topic
Panel Discussion: 35 minutes of discussion among 3 to 5 speakers
Lightning Talk: A brief 5-minute presentation, maximum of 1 speaker

Accepted speakers will receive a complimentary All-Access In-Person ticket for all four days of KubeCon + CloudNativeCon.
Timeline

Thursday, June 15: CFP + Sponsor Prospectus Launch
Sunday, August 6: CFP Closes
Tuesday, August 8 - Monday, August 21: CFP Review Window
Thursday, September 7: Speaker Notifications
Week of September 11: Schedule Launch & Announcement
Wednesday, September 20: Sponsor Sales Close
Monday, November 6: Event Day

Sponsor the event
Do you want to put your product or service in front of the most discerning Cloud Native users: those who demand 25% more conference than the crowd? Check out page 19 of the CNCF events prospectus to learn more. Contact sponsor@cncf.io to secure your sponsorship today! Signed contracts must be received by September 20.
Register to attend
Istio Day is a KubeCon + CloudNativeCon North America CNCF-hosted Co-located Event. In-person KubeCon + CloudNativeCon attendees have the option to buy an All-Access ticket which includes entry to all the CNCF-hosted “day 0” events, as well as the main three days of the conference. You must be attending KubeCon to attend Istio Day, but virtual registration options are available, and the recordings will be posted to YouTube soon after the event.
For those of you who can’t make it, keep your eyes peeled for announcements of IstioCon 2023 (Virtual) and Istio Day China.
Stay tuned to hear more about the event, and we hope you can join us in Chicago for Istio Day!



Istio at KubeCon Europe 2023
Thu, 27 Apr 2023 00:00:00 +0000
The open source and cloud native community gathered from 18th to 21st April in Amsterdam for the first KubeCon of 2023. The four-day conference, organized by the Cloud Native Computing Foundation, was special for Istio, as we evolved from a participant at ServiceMeshCon to hosting our first official project co-located event.

Istio Day Europe 2023, Welcome

Istio Day kicked off with an opening keynote from the Program Committee chairs, Mitch Connors and Faseela K. The event was packed with great content, ranging from new features to end user talks, and the hall was always jam-packed. The opening keynote was an ice-breaker with some Istio fun in the form of a pop quiz, and recognition for the day-to-day efforts of our contributors, maintainers, release managers, and users.

Istio Day Europe 2023, Opening Keynote

This was followed by a 2023 roadmap update session from TOC members Lin Sun and Louis Ryan. We had our much-awaited session on the security posture of Ambient Mesh from Christian Posta and John Howard, which stirred some interesting discussions in the community. After this came our first end user talk, from John Keates of Wehkamp, a local Dutch company, followed by Bloomberg speakers Alexa Griffith and Zhenni Fu on how they secure highly privileged financial information using Istio. Security was a strong focus at Istio Day, and it became even more prominent when Zack Butcher talked about using Istio for Controls Compliance. We also had lightning talks on faster Istio development environments, a guide to Istio resource isolation, and securing hybrid cloud deployments, from Mitch Connors, Zhonghu Xu and Matt Turner respectively.

Istio Day Europe 2023, Jam packed sessions

A number of our ecosystem members had Istio-related announcements at the event. Microsoft announced Istio as a managed add-on for Azure Kubernetes Service, and support for Istio is now generally available in D2iQ Kubernetes Platform.
Tetrate announced Tetrate Service Express, an Istio-based service connectivity, security and resilience automation solution for Amazon EKS, and Solo.io announced Gloo Fabric, with Istio-based application networking capabilities expanded to VM-based, container, and serverless applications across cloud environments.
Istio’s presence at the conference did not end with Istio Day. The second day keynote started with a project update video from Lin Sun. It was also a proud moment for us when our steering committee member Craig Box was recognized as a CNCF mentor in the keynote. The Istio maintainer track session, presented by TOC member Neeraj Poddar, grabbed great attention as he talked about current efforts and the future roadmap of Istio. The talk, and the size of the audience, underlined why Istio continues to be the most popular service mesh in the industry.

KubeCon Europe 2023, Question: How many of you use Istio in production?

The following sessions at KubeCon were based on Istio and almost all of them had a huge crowd in attendance:

Future of Istio - Sidecar, Sidecarless or Both?
Operate Multi Tenancy Istio with ArgoCD in production
Create Istio Filters with Any Programming Language
Automated Cloud-Native Incident Response with Kubernetes and Service Mesh
Autoscaling Elastic Kubernetes Infrastructure for Stateful Applications Using Proxyless gRPC and Istio
Developing a Mental Model of Istio: From Kubernetes to Sidecars to Ambient
Future of ServiceMesh - Sidecar, Sidecarless or Proxyless? - Panel Discussion
The Top 10 List of Istio Security Risks and Mitigation Strategies

Istio had a full-time kiosk in the KubeCon project pavilion, and most of the questions asked were about the status of our CNCF graduation. We are so excited to know that our users are eagerly waiting for news of our graduation, and we promise we are actively working towards it!

KubeCon Europe 2023, Istio Kiosk

Many of our TOC members and maintainers also offered support at the booth, where a lot of interesting discussions happened around Istio Ambient Mesh as well.

KubeCon Europe, More support at Istio Kiosk

Another highlight was Istio TOC and steering members and authors Lin Sun and Christian Posta signing copies of the “Istio Ambient Explained” book.

KubeCon Europe 2023, Ambient Mesh book signing by authors

Last, but not least, we would like to express our heartfelt gratitude to our platinum sponsor, Tetrate, for supporting Istio Day!
2023 is going to be really big for Istio, with more events planned for the coming months! Stay tuned for updates on IstioCon 2023 and Istio’s presence at KubeCon in China and North America.



Comprehensive Network Security at Splunk
Mon, 03 Apr 2023 00:00:00 +0000
With dozens of tools for securing your network available, it is easy to find tutorials and demonstrations illustrating how these individual tools make your network more secure by adding identity, policy, and observability to your traffic. What is often less clear is how these tools interoperate to provide comprehensive security for your network in production. How many tools do you need? When is your network secure enough?
This post will explore the tools and practices leveraged by Splunk to secure their Kubernetes network infrastructure, starting with VPC design and connectivity and going all the way up the stack to HTTP Request based security. Along the way, we’ll see what it takes to provide comprehensive network security for your cloud native stack, how these tools interoperate, and where some of them can improve. Splunk uses a variety of tools to secure their network, including:

AWS Functionality
Kubernetes
Istio
Envoy
Aviatrix

About Splunk’s Use Case
Splunk is a technology company that provides a platform for collecting, analyzing and visualizing data generated by various sources. It is primarily used for searching, monitoring, and analyzing machine-generated big data through a web-style interface. Splunk Cloud is an initiative to move Splunk’s internal infrastructure to a cloud native architecture. Today Splunk Cloud consists of over 35 fully replicated clusters in AWS and GCP in regions around the world.
Securing Layer 3/4: AWS, Aviatrix and Kubernetes
At Splunk Cloud, we use a pattern called “cookie cutter VPCs” where each cluster is provisioned with its own VPC, with identical private subnets for Pod and Node IPs, a public subnet for ingress and egress to and from the public internet, and an internal subnet for traffic between clusters. This keeps Pods and Nodes from separate clusters completely isolated, while allowing traffic outside the cluster to have particular rules enforced in the public and internal subnets. Additionally, this pattern avoids the possibility of RFC 1918 private IP exhaustion when leveraging many clusters.
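The exhaustion argument is simple arithmetic: if every cluster needed its own unique slice of private address space, consumption would grow with the number of clusters, whereas identical per-cluster ranges cost the same regardless of cluster count. The sketch below uses Python's ipaddress module; the /14 of pod and node space per cluster is a hypothetical sizing, since the post does not give Splunk's actual ranges.

```python
import ipaddress

# The three RFC 1918 private ranges.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
PRIVATE_POOL = sum(net.num_addresses for net in RFC1918)  # 17,891,328 addresses


def private_addresses_used(clusters: int, per_cluster_prefix: int, cookie_cutter: bool) -> int:
    """Private addresses consumed by pod/node subnets across all clusters."""
    per_cluster = 2 ** (32 - per_cluster_prefix)
    # Cookie cutter VPCs reuse the same ranges, so the cost does not grow with clusters.
    return per_cluster if cookie_cutter else per_cluster * clusters


# Hypothetical sizing: a /14 of pod and node space per cluster.
for clusters in (35, 100):
    unique = private_addresses_used(clusters, 14, cookie_cutter=False)
    reused = private_addresses_used(clusters, 14, cookie_cutter=True)
    print(clusters, unique, unique <= PRIVATE_POOL, reused)
# 35 9175040 True 262144
# 100 26214400 False 262144
```

Reusing ranges is only safe because Pods and Nodes in different clusters never address each other directly; traffic that leaves a cluster goes through the public and internal subnets, where overlapping VPC access is handled as described next.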
Within each VPC, Network ACLs and Security Groups are set up to restrict connectivity to what is absolutely required. As an example, we restrict public connectivity to our Ingress nodes (that will deploy Envoy ingress gateways). In addition to ordinary east/west and north/south traffic, there are also shared services at Splunk that every cluster needs to access. Aviatrix is used to provide overlapping VPC access, while also enforcing some high level security rules (segmentation per domain).

Splunk Network Security Architecture

The next security layer in Splunk’s stack is Kubernetes itself. Validating Webhooks are used to prevent the deployment of K8S objects that would allow insecure traffic in the cluster (typically around NLBs and services). Splunk also relies on NetworkPolicies for securing and restricting Pod to Pod connectivity.
Securing Layer 7: Istio
Splunk uses Istio to enforce policy on the application layer based on the details of each request. Istio also emits Telemetry data (metrics, logs, traces) that is useful for validating request-level security.
One of the key benefits of Istio’s injection of Envoy sidecars is that Istio can provide in-transit encryption for the entire mesh without requiring any modifications to the applications. The applications send plain text HTTP requests, but the Envoy sidecar intercepts the traffic and implements Mutual TLS encryption to protect against interception or modification.
Istio manages Splunk’s ingress gateways, which receive traffic from public and internal NLBs. The gateways are managed by the platform team and run in the Istio Gateway namespace, allowing users to plug into them, but not modify them. The Gateway service is also provisioned with certificates to enforce TLS by default, and Validating Webhooks ensure that services can only connect to gateways for their own hostnames. Additionally, gateways enforce request authentication at ingress, before traffic is able to impact application pods.
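A validating webhook that only lets services attach to gateways for their own hostnames comes down to an ownership check. This sketch shows only that decision logic; the ownership table, namespace names, and hostnames are hypothetical, and a real webhook would receive an AdmissionReview from the Kubernetes API server and reply with an allowed/denied response rather than return a tuple.

```python
# Hypothetical mapping of namespace -> hostnames that namespace owns.
OWNED_HOSTS = {
    "search": {"search.example.com"},
    "billing": {"billing.example.com", "invoices.example.com"},
}


def validate_hosts(namespace: str, requested_hosts: list[str]) -> tuple[bool, str]:
    """Allow a gateway binding only if every requested host belongs to the namespace."""
    owned = OWNED_HOSTS.get(namespace, set())
    foreign = sorted(set(requested_hosts) - owned)
    if foreign:
        return False, f"namespace {namespace!r} does not own: {', '.join(foreign)}"
    return True, "ok"


print(validate_hosts("billing", ["invoices.example.com"]))  # (True, 'ok')
print(validate_hosts("search", ["billing.example.com"]))
```

Denying the request at admission time means a misconfigured service never gets the chance to capture another team's traffic at the shared gateway.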
Because Istio and related K8S objects are relatively complex to configure, Splunk created an abstraction layer, which is a controller that configures everything for the service, including virtual services, destination rules, gateways, certificates, and more. It sets up DNS that goes directly to the right NLB. It’s a one-click solution for end-to-end network deployment. For more complex use cases, the services teams can still bypass the abstraction and configure these settings directly.

Splunk Application Platform

Pain Points
While Splunk’s architecture meets many of our needs, there are a few pain points worth discussing. Istio operates by creating as many Envoy Sidecars as application pods, which is an inefficient use of resources. In addition, when a particular application has unique needs from its sidecar, such as additional CPU or Memory, it can be difficult to adjust these settings without adjusting them for all sidecars in the mesh. Istio Sidecar injection involves a lot of magic, using a mutating webhook to add a sidecar container to every pod as it is created, which means those pods no longer match their corresponding deployments. Additionally, injection can only happen at pod creation time, which means that any time a sidecar version or parameter is updated, all pods must be restarted before they will get the new settings. Overall, this magic complicates running a service mesh in production, and adds a great deal of operational uncertainty to your application.
The Istio project is aware of these limitations, and believes they will be substantially improved by the new Ambient mode for Istio. In this mode, Layer 4 constructs like identity and encryption will be applied by a Daemon running on the node, but not in the same pod as the application. Layer 7 features will still be handled by Envoy, but Envoy will be run in an adjacent pod as part of its own deployment, rather than relying on the magic of sidecar injection. Application pods will not be modified in any way in ambient mode, which should add a good deal of predictability to service mesh operations. Ambient mode is expected to reach Alpha quality in Istio 1.18.
Conclusion
With all these layers to network security at Splunk Cloud, it is helpful to take a step back and examine the life of a request as it traverses them. When a client sends a request, it first connects to the NLB, and the connection is allowed or blocked by the VPC ACLs. The NLB then proxies the request to one of the ingress nodes, which terminates TLS and inspects the request at Layer 7, choosing to allow or block it. The Envoy gateway then validates the request using ExtAuthZ to ensure it is properly authenticated and meets quota restrictions before it is allowed into the cluster. Next, the Envoy gateway proxies the request upstream, and the Kubernetes network policies take effect again to make sure this proxying is allowed. Finally, the sidecar on the upstream workload terminates the mutual TLS connection, inspects the request at Layer 7, and, if it is allowed, forwards it to the workload in clear text.
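The request path above is a chain in which any layer can refuse the request, and the earliest refusal wins. A toy model makes the ordering explicit. Each check is a hypothetical stand-in reduced to a boolean (a real NLB, ExtAuthZ server, or NetworkPolicy is configured, not coded like this), and the field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical per-layer verdicts for one request.
    vpc_acl_allows: bool = True
    ingress_l7_allows: bool = True
    authenticated: bool = True
    within_quota: bool = True
    network_policy_allows: bool = True
    sidecar_l7_allows: bool = True


# Layers in the order the request meets them.
LAYERS = [
    ("VPC ACLs at the NLB", lambda r: r.vpc_acl_allows),
    ("ingress gateway L7 inspection", lambda r: r.ingress_l7_allows),
    ("ExtAuthZ authentication", lambda r: r.authenticated),
    ("ExtAuthZ quota", lambda r: r.within_quota),
    ("Kubernetes NetworkPolicy", lambda r: r.network_policy_allows),
    ("workload sidecar L7 policy", lambda r: r.sidecar_l7_allows),
]


def trace(request: Request) -> str:
    """Return the first layer that blocks the request, or 'delivered'."""
    for name, allows in LAYERS:
        if not allows(request):
            return f"blocked by {name}"
    return "delivered to workload"


print(trace(Request()))                                                # delivered to workload
print(trace(Request(network_policy_allows=False, authenticated=False)))  # blocked by ExtAuthZ authentication
```

The second call shows the defense-in-depth idea from the conclusion: a request that two layers would reject is stopped by whichever comes first, and the later layer still backs it up if the earlier one is misconfigured.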

Cloud Native Network Security Matrix

Securing Splunk’s Cloud Native Network Stack while meeting the scalability needs of this large enterprise company requires careful security planning at each layer.
While applying identity, observability, and policy principles at every layer in the stack may appear redundant at first glance, each layer is able to make up for the shortcomings of the others, so that together these layers form a tight and effective barrier to unwanted access.
If you are interested in diving deeper into Splunk’s Network Security Stack, you can watch our Cloud Native SecurityCon presentation.



Istio Ambient Waypoint Proxy Made Simple
Fri, 31 Mar 2023 00:00:00 +0000
Ambient splits Istio’s functionality into two distinct layers, a secure overlay layer and a
Layer 7 processing layer. The waypoint proxy is an optional component that is Envoy-based
and handles L7 processing for workloads it manages. Since the initial ambient launch in 2022,
we have made significant changes to simplify waypoint configuration, debuggability and scalability.
Architecture of waypoint proxies
Similar to a sidecar, the waypoint proxy is Envoy-based and is dynamically configured by Istio
to serve your application's configuration. What is unique about the waypoint proxy is that it runs either
per-namespace (default) or per-service account. By running outside of the application pod, a waypoint proxy
can install, upgrade, and scale independently from the application, as well as reduce operational costs.

Waypoint architecture

Waypoint proxies are deployed declaratively using Kubernetes Gateway resources or the helpful istioctl command:
$ istioctl experimental waypoint generate
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: namespace
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
Istiod will monitor these resources and deploy and manage the corresponding waypoint deployment for users automatically.
Shift source proxy configuration to destination proxy
In the existing sidecar architecture, most traffic-shaping policies (for example, request routing, traffic shifting, or fault injection) are implemented by the source (client) proxy, while most security policies are implemented by the destination (server) proxy. This leads to a number of concerns:

Scaling - each source sidecar needs to know information about every other destination in the mesh. This is a quadratic scaling problem: N sidecars each holding configuration for N destinations. Worse, if any destination configuration changes, we need to notify all sidecars at once.
Debugging - because policy enforcement is split between the client and server sidecars, it can be hard to understand the behavior of the system when troubleshooting.
Mixed environments - if we have systems where not all clients are part of the mesh, we get inconsistent behavior. For example, a non-mesh client wouldn’t respect a canary rollout policy, leading to unexpected traffic distribution.
Ownership and attribution - ideally a policy written in one namespace should only affect work done by proxies running in the same namespace. However, in the sidecar model, the policy is distributed to, and enforced by, every sidecar. While Istio has designed around this constraint to make this secure, it is still not optimal.

In ambient, all policies are enforced by the destination waypoint. In many ways, the waypoint acts as a gateway into the namespace (default scope) or service account. Istio enforces that all traffic coming into the namespace goes through the waypoint, which then enforces all policies for that namespace. Because of this, each waypoint only needs to know about configuration for its own namespace.
The scalability problem, in particular, is a nuisance for users running in large clusters. If we visualize it, we can see just how big an improvement the new architecture is.
Consider a simple deployment, where we have 2 namespaces, each with 2 (color-coded) deployments. The Envoy (xDS) configuration required to program the sidecars is shown as circles:

[Figure: Every sidecar has configuration about all other sidecars]

In the sidecar model, we have 4 workloads, each with 4 sets of configuration. If any of those configurations changed, all of them would need to be updated. In total there are 16 configurations distributed.
In the waypoint architecture, however, the configuration is dramatically simplified:

[Figure: Each waypoint only has configuration for its own namespace]

Here, we see a very different story. We have only 2 waypoint proxies, as each one is able to serve the entire namespace, and each one only needs configuration for its own namespace. In total, only 4 configurations are sent instead of 16, or 25% of the sidecar case, even for this simple example.
If we scale each namespace up to 25 deployments with 10 pods each, and each waypoint deployment to 2 pods for high availability, the numbers are even more impressive: the waypoint config distribution requires just 0.8% of the configuration distribution of the sidecars, as the table below illustrates!

Config distribution     Namespace 1                         Namespace 2                         Total
Sidecars                25 configurations * 250 sidecars    25 configurations * 250 sidecars    12500
Waypoints               25 configurations * 2 waypoints     25 configurations * 2 waypoints     100
Waypoints / Sidecars    0.8%                                0.8%                                0.8%

While we use namespace-scoped waypoint proxies to illustrate the simplification above, the simplification is similar
when you apply it to service account waypoint proxies.
This reduced configuration means lower resource usage (CPU, RAM, and network bandwidth) for both the
control plane and data plane. While users today can see similar improvements with careful usage of
exportTo in their Istio networking resources or of the Sidecar API,
in ambient mode this is no longer required, making scaling a breeze.
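The arithmetic behind both comparisons is simple enough to check in a few lines. The following Python sketch assumes, as the simple example and the table do, that each proxy receives its own copy of every configuration it needs; the function name total_configs is illustrative and not part of Istio.

```python
def total_configs(namespaces: int, proxies_per_namespace: int, configs_per_proxy: int) -> int:
    """Total configurations distributed: every proxy gets its own copy of each config."""
    return namespaces * proxies_per_namespace * configs_per_proxy


# Simple example: 4 sidecars that each know all 4 deployments,
# versus 1 waypoint per namespace that knows only its namespace's 2 deployments.
small_sidecar = total_configs(namespaces=2, proxies_per_namespace=2, configs_per_proxy=4)
small_waypoint = total_configs(namespaces=2, proxies_per_namespace=1, configs_per_proxy=2)

# Scaled example from the table: 25 deployments x 10 pods = 250 sidecars per namespace,
# versus 2 waypoint pods per namespace; each proxy receives 25 configurations.
large_sidecar = total_configs(namespaces=2, proxies_per_namespace=250, configs_per_proxy=25)
large_waypoint = total_configs(namespaces=2, proxies_per_namespace=2, configs_per_proxy=25)

print(small_sidecar, small_waypoint, f"{small_waypoint / small_sidecar:.0%}")  # 16 4 25%
print(large_sidecar, large_waypoint, f"{large_waypoint / large_sidecar:.1%}")  # 12500 100 0.8%
```

The waypoint totals grow with the number of namespaces and waypoint replicas, not with the number of application pods, which is why the ratio keeps shrinking as deployments scale up.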
What if my destination doesn’t have a waypoint proxy?
The design of ambient mode centers on the assumption that most configuration is best implemented by the service producer, rather than the service consumer. However, this isn’t always the case: sometimes we need to configure traffic management for destinations we don’t control. A common example is connecting to an external service with improved resilience to handle occasional connection issues (e.g., adding a timeout for calls to example.com).
This is an area under active development in the community, where we design how traffic can be routed to your egress gateway and how you can configure the egress gateway with your desired policies. Look out for future blog posts in this area!
A deep-dive of waypoint configuration
Assuming you have followed the ambient get started guide up to and including the control traffic section, you have deployed a waypoint proxy for the bookinfo-reviews service account to direct 90% of traffic to reviews v1 and 10% of traffic to reviews v2.
Use istioctl to retrieve the listeners for the reviews waypoint proxy:
$ istioctl proxy-config listener deploy/bookinfo-reviews-istio-waypoint --waypoint
LISTENER              CHAIN                                                 MATCH                                         DESTINATION
envoy://connect_originate                                                       ALL                                           Cluster: connect_originate
envoy://main_internal inbound-vip|9080||reviews.default.svc.cluster.local-http  ip=10.96.104.108 -> port=9080                 Inline Route: /*
envoy://main_internal direct-tcp                                            ip=10.244.2.14 -> ANY                         Cluster: encap
envoy://main_internal direct-tcp                                            ip=10.244.1.6 -> ANY                          Cluster: encap
envoy://main_internal direct-tcp                                            ip=10.244.2.11 -> ANY                         Cluster: encap
envoy://main_internal direct-http                                           ip=10.244.2.11 -> application-protocol='h2c'  Cluster: encap
envoy://main_internal direct-http                                           ip=10.244.2.11 -> application-protocol='http/1.1' Cluster: encap
envoy://main_internal direct-http                                           ip=10.244.2.14 -> application-protocol='http/1.1' Cluster: encap
envoy://main_internal direct-http                                           ip=10.244.2.14 -> application-protocol='h2c'  Cluster: encap
envoy://main_internal direct-http                                           ip=10.244.1.6 -> application-protocol='h2c'   Cluster: encap
envoy://main_internal direct-http                                           ip=10.244.1.6 -> application-protocol='http/1.1'  Cluster: encap
envoy://connect_terminate default                                               ALL                                           Inline Route:
For requests arriving on port 15008, which by default is Istio’s inbound HBONE port, the waypoint proxy terminates the HBONE connection and forwards the request to the main_internal listener to enforce any workload policies such as AuthorizationPolicy. If you are not familiar with internal listeners, they are Envoy listeners that accept user-space connections without using the system network API. The --waypoint flag added to the istioctl proxy-config command above instructs it to show the details of the main_internal listener: its filter chains, chain matches, and destinations.
Note that 10.96.104.108 is the reviews service VIP and the 10.244.x.x addresses are the IPs of the reviews v1/v2/v3 pods, which you can view for your cluster using the kubectl get svc,pod -o wide command. Plain-text or HBONE-terminated inbound traffic is matched either on the service VIP and port 9080 for reviews, or on the pod IP address and application protocol (ANY, h2c, or http/1.1).
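To make the matching concrete, here is a simplified Python model of how the main_internal filter chains listed above choose a chain. It is an illustration only, not Envoy’s matcher: the IPs come from the example output, and the rule that a detected HTTP protocol selects direct-http over direct-tcp is an assumption based on Envoy generally preferring the more specific filter chain match.

```python
from typing import Optional

# Values copied from the `istioctl proxy-config listener ... --waypoint` output above.
SERVICE_VIPS = {("10.96.104.108", 9080): "inbound-vip|9080||reviews.default.svc.cluster.local-http"}
POD_IPS = {"10.244.1.6", "10.244.2.11", "10.244.2.14"}
HTTP_PROTOCOLS = {"h2c", "http/1.1"}


def select_chain(dst_ip: str, dst_port: int, app_protocol: Optional[str] = None) -> Optional[str]:
    """Return the filter chain a connection would match (simplified model)."""
    # Traffic addressed to the service VIP and port goes through the service's route.
    chain = SERVICE_VIPS.get((dst_ip, dst_port))
    if chain is not None:
        return chain
    # Traffic addressed directly to a pod IP is split by detected application protocol.
    if dst_ip in POD_IPS:
        return "direct-http" if app_protocol in HTTP_PROTOCOLS else "direct-tcp"
    # Addresses outside this service account have no chain in this waypoint.
    return None


print(select_chain("10.96.104.108", 9080))
print(select_chain("10.244.2.11", 9080, "h2c"))  # direct-http
print(select_chain("10.244.2.14", 9080))         # direct-tcp
```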
Checking out the clusters for the reviews waypoint proxy, you get the main_internal cluster along with a few inbound clusters. Other than the clusters for infrastructure, the only Envoy clusters created are for services and pods running in the same service account. No clusters are created for services or pods running elsewhere.
$ istioctl proxy-config clusters deploy/bookinfo-reviews-istio-waypoint
SERVICE FQDN                         PORT SUBSET  DIRECTION   TYPE         DESTINATION RULE
agent                                -    -       -           STATIC
connect_originate                    -    -       -           ORIGINAL_DST
encap                                -    -       -           STATIC
kubernetes.default.svc.cluster.local 443  tcp     inbound-vip EDS
main_internal                        -    -       -           STATIC
prometheus_stats                     -    -       -           STATIC
reviews.default.svc.cluster.local    9080 http    inbound-vip EDS
reviews.default.svc.cluster.local    9080 http/v1 inbound-vip EDS
reviews.default.svc.cluster.local    9080 http/v2 inbound-vip EDS
reviews.default.svc.cluster.local    9080 http/v3 inbound-vip EDS
sds-grpc                             -    -       -           STATIC
xds-grpc                             -    -       -           STATIC
zipkin                               -    -       -           STRICT_DNS
Note that there are no outbound clusters in the list, which you can confirm using istioctl proxy-config cluster deploy/bookinfo-reviews-istio-waypoint --direction outbound! What’s nice is that you didn’t need to configure exportTo on any other bookinfo services (for example, the productpage or ratings services). In other words, the reviews waypoint is not made aware of any unnecessary clusters, without any extra manual configuration from you.
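If you want to check this in a script, for example in CI, a small Python helper can parse the table printed by istioctl proxy-config clusters and collect the DIRECTION column. This sketch assumes the whitespace-separated layout shown above, where empty fields are printed as -; the helper name cluster_directions and the trimmed SAMPLE are illustrative.

```python
SAMPLE = """\
SERVICE FQDN                         PORT SUBSET  DIRECTION   TYPE         DESTINATION RULE
agent                                -    -       -           STATIC
kubernetes.default.svc.cluster.local 443  tcp     inbound-vip EDS
reviews.default.svc.cluster.local    9080 http/v1 inbound-vip EDS
zipkin                               -    -       -           STRICT_DNS
"""


def cluster_directions(output: str) -> set[str]:
    """Collect the DIRECTION values from `istioctl proxy-config clusters` output."""
    directions = set()
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: SERVICE FQDN, PORT, SUBSET, DIRECTION, TYPE[, DESTINATION RULE]
        if len(fields) >= 5 and fields[3] != "-":
            directions.add(fields[3])
    return directions


assert "outbound" not in cluster_directions(SAMPLE)
print(cluster_directions(SAMPLE))  # {'inbound-vip'}
```

In practice you would pass it the captured output of istioctl proxy-config clusters deploy/bookinfo-reviews-istio-waypoint rather than a hard-coded sample.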
Display the list of routes for the reviews waypoint proxy:
$ istioctl proxy-config routes deploy/bookinfo-reviews-istio-waypoint
NAME                                                    DOMAINS MATCH              VIRTUAL SERVICE
encap                                                   *       /*
inbound-vip|9080|http|reviews.default.svc.cluster.local *       /*                 reviews.default
default
Recall that you didn’t configure any Sidecar resources or exportTo configuration on your Istio networking resources. You did, however, deploy the bookinfo-productpage route to configure an ingress gateway to route to productpage, but the reviews waypoint has not been made aware of any such irrelevant routes.
Displaying the detailed information for the inbound-vip|9080|http|reviews.default.svc.cluster.local route, you’ll see the weight-based routing configuration directing 90% of the traffic to reviews v1 and 10% of the traffic to reviews v2, along with some of Istio’s default retry and timeout configurations. This confirms that the traffic and resiliency policies have shifted from the source proxy to the destination-oriented waypoint, as discussed earlier.
$ istioctl proxy-config routes deploy/bookinfo-reviews-istio-waypoint --name "inbound-vip|9080|http|reviews.default.svc.cluster.local" -o yaml
- name: inbound-vip|9080|http|reviews.default.svc.cluster.local
  validateClusters: false
  virtualHosts:
  - domains:
    - '*'
    name: inbound|http|9080
    routes:
    - decorator:
        operation: reviews:9080/*
      match:
        prefix: /
      metadata:
        filterMetadata:
          istio:
            config: /apis/networking.istio.io/v1alpha3/namespaces/default/virtual-service/reviews
      route:
        maxGrpcTimeout: 0s
        retryPolicy:
          hostSelectionRetryMaxAttempts: "5"
          numRetries: 2
          retriableStatusCodes:
          - 503
          retryHostPredicate:
          - name: envoy.retry_host_predicates.previous_hosts
            typedConfig:
              '@type': type.googleapis.com/envoy.extensions.retry.host.previous_hosts.v3.PreviousHostsPredicate
          retryOn: connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes
        timeout: 0s
        weightedClusters:
          clusters:
          - name: inbound-vip|9080|http/v1|reviews.default.svc.cluster.local
            weight: 90
          - name: inbound-vip|9080|http/v2|reviews.default.svc.cluster.local
            weight: 10
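The weightedClusters block above splits traffic 90/10 between the v1 and v2 subsets. As a rough illustration of what such a split means, here is a generic Python sketch of weighted selection (not Envoy’s actual implementation): each request draws an integer in [0, total weight) and takes the first cluster whose cumulative weight exceeds it. The helper name pick_cluster is hypothetical.

```python
import random
from collections import Counter
from typing import Optional

# Cluster names and weights copied from the weightedClusters section above.
WEIGHTED_CLUSTERS = [
    ("inbound-vip|9080|http/v1|reviews.default.svc.cluster.local", 90),
    ("inbound-vip|9080|http/v2|reviews.default.svc.cluster.local", 10),
]


def pick_cluster(clusters: list[tuple[str, int]], draw: Optional[int] = None) -> str:
    """Pick a cluster in proportion to its weight.

    `draw` may be fixed for deterministic tests; otherwise it is a uniform
    random integer in [0, total weight).
    """
    total = sum(weight for _, weight in clusters)
    if draw is None:
        draw = random.randrange(total)
    if not 0 <= draw < total:
        raise ValueError(f"draw must be in [0, {total})")
    cumulative = 0
    for name, weight in clusters:
        cumulative += weight
        if draw < cumulative:
            return name
    raise AssertionError("unreachable: draw is always below the total weight")


# Simulate 10,000 requests; expect roughly 9,000 to v1 and 1,000 to v2.
counts = Counter(pick_cluster(WEIGHTED_CLUSTERS) for _ in range(10_000))
for name, n in counts.most_common():
    print(name.split("|")[2], n)
```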
Check out the endpoints for the reviews waypoint proxy:
$ istioctl proxy-config endpoints deploy/bookinfo-reviews-istio-waypoint
ENDPOINT                                            STATUS  OUTLIER CHECK CLUSTER
127.0.0.1:15000                                     HEALTHY OK            prometheus_stats
127.0.0.1:15020                                     HEALTHY OK            agent
envoy://connect_originate/                          HEALTHY OK            encap
envoy://connect_originate/10.244.1.6:9080           HEALTHY OK            inbound-vip|9080|http/v2|reviews.default.svc.cluster.local
envoy://connect_originate/10.244.1.6:9080           HEALTHY OK            inbound-vip|9080|http|reviews.default.svc.cluster.local
envoy://connect_originate/10.244.2.11:9080          HEALTHY OK            inbound-vip|9080|http/v1|reviews.default.svc.cluster.local
envoy://connect_originate/10.244.2.11:9080          HEALTHY OK            inbound-vip|9080|http|reviews.default.svc.cluster.local
envoy://connect_originate/10.244.2.14:9080          HEALTHY OK            inbound-vip|9080|http/v3|reviews.default.svc.cluster.local
envoy://connect_originate/10.244.2.14:9080          HEALTHY OK            inbound-vip|9080|http|reviews.default.svc.cluster.local
envoy://main_internal/                              HEALTHY OK            main_internal
unix://./etc/istio/proxy/XDS                        HEALTHY OK            xds-grpc
unix://./var/run/secrets/workload-spiffe-uds/socket HEALTHY OK            sds-grpc
Note that you don’t get any endpoints related to any services other than reviews, even though you have a few other services in the default and istio-system namespaces.
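To see which pod backs each reviews subset, you can group the endpoints output by cluster. The sketch below assumes the whitespace-separated table format shown above and strips the envoy://connect_originate/ prefix to leave the pod address; the helper name endpoints_by_cluster and the trimmed SAMPLE are illustrative.

```python
from collections import defaultdict

SAMPLE = """\
ENDPOINT                                            STATUS  OUTLIER CHECK CLUSTER
127.0.0.1:15000                                     HEALTHY OK            prometheus_stats
envoy://connect_originate/10.244.1.6:9080           HEALTHY OK            inbound-vip|9080|http/v2|reviews.default.svc.cluster.local
envoy://connect_originate/10.244.2.11:9080          HEALTHY OK            inbound-vip|9080|http/v1|reviews.default.svc.cluster.local
envoy://connect_originate/10.244.2.14:9080          HEALTHY OK            inbound-vip|9080|http/v3|reviews.default.svc.cluster.local
"""


def endpoints_by_cluster(output: str) -> dict[str, list[str]]:
    """Map each cluster name to the endpoint addresses behind it."""
    result = defaultdict(list)
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        endpoint, cluster = fields[0], fields[-1]
        # Drop the internal connect_originate listener prefix, keeping IP:port.
        result[cluster].append(endpoint.removeprefix("envoy://connect_originate/"))
    return dict(result)


for cluster, addresses in sorted(endpoints_by_cluster(SAMPLE).items()):
    if "reviews" in cluster:
        print(cluster.split("|")[2], addresses)
# http/v1 ['10.244.2.11:9080']
# http/v2 ['10.244.1.6:9080']
# http/v3 ['10.244.2.14:9080']
```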
Wrapping up
We are very excited about the waypoint simplification, which focuses on destination-oriented waypoint proxies. This is another significant step towards improving Istio’s usability, scalability, and debuggability, which are top priorities on Istio’s roadmap. Follow our getting started guide to try the ambient alpha build today and experience the simplified waypoint proxy!



Using eBPF for traffic redirection in Istio ambient mode
Wed, 29 Mar 2023 00:00:00 +0000

Ambient mode now uses in-Pod redirection to redirect traffic between workload pods and ztunnel. The method described in this blog is no longer needed, and this post has been left for historical interest.


In Istio’s new ambient mode, the istio-cni component running on each Kubernetes worker node is responsible for redirecting application traffic to the zero-trust tunnel (ztunnel) on that node. By default it relies on iptables and
Generic Network Virtualization Encapsulation (Geneve) overlay tunnels to achieve this redirection. We have now added support for an eBPF-based method of traffic redirection.
Why eBPF
Although performance is an essential consideration in implementing Istio ambient mode redirection, it’s also important to consider ease of programmability, so that versatile and customized requirements can be implemented. With eBPF, you can leverage additional context in the kernel to bypass complex routing and simply send packets to their final destination.
Furthermore, eBPF enables deeper visibility and additional context for packets in the kernel, allowing for more efficient and flexible management of data flow compared with iptables.
How it works
An eBPF program, attached to the traffic control ingress and egress hooks, has been compiled into the Istio CNI component. istio-cni watches pod events and attaches the eBPF program to, or detaches it from, the related network interfaces when a pod is moved into or out of ambient mode.
Using an eBPF program instead of iptables eliminates the need for Geneve encapsulation, allowing routing to be customized in kernel space instead. This yields both performance gains and additional routing flexibility.

[Figure: ambient eBPF architecture]

All traffic to/from the application pod will be intercepted by eBPF and redirected to the corresponding ztunnel pod. On the ztunnel side, proper redirection will be performed based on connection lookup results within the eBPF program. This provides more efficient control of the network traffic between the application and ztunnel.
How to enable eBPF redirection in Istio ambient mode
Follow the instructions in Getting Started with Ambient Mesh to set up your cluster, with a small change: when you install Istio, set the values.cni.ambient.redirectMode configuration parameter to ebpf.
$ istioctl install --set profile=ambient --set values.cni.ambient.redirectMode="ebpf"
Check the istio-cni logs to confirm eBPF redirection is on:
ambient Writing ambient config: {"ztunnelReady":true,"redirectMode":"eBPF"}
Performance gains
The latency and throughput (QPS) with eBPF redirection are somewhat better than with iptables. The following tests were run in a kind cluster with
a Fortio client sending requests to a Fortio server, both running in ambient mode (with eBPF debug logging disabled) and on the same Kubernetes worker node.
$ fortio load -uniform -t 60s -qps 0 -c  http://:8080

[Figure: Max QPS, with varying number of connections]

$ fortio load -uniform -t 60s -qps 8000 -c  http://:8080

[Figure: P75 latency (ms) at 8000 QPS, with varying number of connections]

Wrapping up
Both eBPF and iptables have their own advantages and disadvantages when it comes to traffic redirection. eBPF is a modern, flexible, and powerful alternative that allows for more customization in rule creation and offers better performance. However, it does require a modern kernel version (4.20 or later for the redirection case), which may not be available on some systems. On the other hand, iptables is widely used and compatible with most Linux distributions, even those with older kernels, but it lacks the flexibility and extensibility of eBPF and may have lower performance.
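Since eBPF redirection needs kernel 4.20 or later, it can be worth checking a node before choosing the ebpf redirect mode. Here is a minimal Python sketch that compares a kernel release string, as returned by platform.release() (for example 5.15.0-91-generic), against that minimum. The parsing assumes the usual major.minor prefix of Linux release strings, and the helper names are hypothetical; run it on the node itself or in a regular container there, which shares the node’s kernel.

```python
import platform
import re

MIN_KERNEL = (4, 20)  # minimum for eBPF redirection, per the text above


def kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a Linux release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def supports_ebpf_redirect(release: str) -> bool:
    """True if the kernel meets the minimum; tuples compare element by element."""
    return kernel_version(release) >= MIN_KERNEL


if __name__ == "__main__":
    release = platform.release()
    print(release, "ok" if supports_ebpf_redirect(release) else "too old for eBPF redirection")
```

Comparing integer tuples rather than strings avoids the classic pitfall where "4.9" sorts after "4.20" lexically.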
Ultimately, the choice between eBPF and iptables for traffic redirection will depend on the specific needs and requirements of the system, as well as the user’s level of expertise in using each tool. Some users may prefer the simplicity and compatibility of iptables, while others may require the flexibility and performance of eBPF.
There is still plenty of work to be done, including integration with various CNI plugins, and contributions to improve the ease of use would be greatly welcomed. Please join us in #ambient on the Istio Slack.



Support for Dual Stack Kubernetes Clusters
Fri, 10 Mar 2023 00:00:00 +0000
Over the past year, both Intel and F5 have collaborated on an effort to bring support for
Kubernetes Dual-Stack networking to Istio.
Background
The journey has taken us longer than anticipated and we continue to have work to do. The team initially started with a design based
on a reference implementation from F5. The design led to an RFC that caused us to re-examine our approach. Notably, the community had memory and performance concerns that it wanted addressed
before implementation. The original design had to duplicate Envoy configuration for listeners, clusters, routes and endpoints. Given that many people already experience Envoy memory and CPU consumption issues, early feedback asked us to completely re-evaluate this approach. Many proxies transparently handle outbound dual-stack traffic regardless of how the traffic was originated. Much of the earliest feedback was to implement the same behavior in Istio and Envoy.
Redefining Dual Stack Support
Much of the feedback provided by the community for the original RFC was to update Envoy to better support dual-stack use cases
internally instead of supporting this within Istio. This led us to a new, simplified design that applies both the lessons we learned and the feedback we received.
Support for Dual Stack in Istio 1.17
We have worked with the Envoy community to address numerous concerns, which is one reason dual-stack enablement has
taken us a while to implement. We have implemented matched IP family for outbound listeners
and support for multiple addresses per listener. Alex Xu has also
been working fervently to resolve long-outstanding issues, including giving Envoy a
smarter way to pick endpoints for dual-stack. Some of these improvements
to Envoy, such as the ability to enable socket options on multiple addresses,
have landed in the Istio 1.17 release (e.g. extra source addresses on inbound clusters).
The Envoy API changes made by the team can be found on the Envoy site under Listener addresses and bind config. Proper support for both downstream and upstream connections in Envoy is important for realizing
dual-stack support.
In total, the team has submitted over a dozen PRs to Envoy and is working on at least half a dozen more to make Envoy adoption of
dual stack easier for Istio.
Meanwhile, on the Istio side you can track the progress in Issue #40394.
Progress has slowed down a bit lately as we continue working with Envoy on various issues; however, we are happy to
announce experimental support for dual stack in Istio 1.17!
A Quick Experiment using Dual Stack

If you want to use KinD for your test, you can set up a dual stack cluster with the following command:
$ kind create cluster --name istio-ds --config - <





Enable dual stack experimental support on Istio 1.17.0+ with the following:
$ istioctl install -y -f - <



Create three namespaces:

dual-stack: tcp-echo will listen on both an IPv4 and IPv6 address.
ipv4: tcp-echo will listen on only an IPv4 address.
ipv6: tcp-echo will listen on only an IPv6 address.

$ kubectl create namespace dual-stack
$ kubectl create namespace ipv4
$ kubectl create namespace ipv6


Enable sidecar injection on all of those namespaces as well as the default namespace:
$ kubectl label --overwrite namespace default istio-injection=enabled
$ kubectl label --overwrite namespace dual-stack istio-injection=enabled
$ kubectl label --overwrite namespace ipv4 istio-injection=enabled
$ kubectl label --overwrite namespace ipv6 istio-injection=enabled


Create tcp-echo deployments in the namespaces:
$ kubectl apply --namespace dual-stack -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/tcp-echo/tcp-echo-dual-stack.yaml
$ kubectl apply --namespace ipv4 -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/tcp-echo/tcp-echo-ipv4.yaml
$ kubectl apply --namespace ipv6 -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/tcp-echo/tcp-echo-ipv6.yaml


Create a sleep deployment in the default namespace:
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/sleep/sleep.yaml


Verify the traffic:
$ kubectl exec -it "$(kubectl get pod -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- sh -c "echo dualstack | nc tcp-echo.dual-stack 9000"
hello dualstack
$ kubectl exec -it "$(kubectl get pod -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- sh -c "echo ipv4 | nc tcp-echo.ipv4 9000"
hello ipv4
$ kubectl exec -it "$(kubectl get pod -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- sh -c "echo ipv6 | nc tcp-echo.ipv6 9000"
hello ipv6


Now you can experiment with dual-stack services in your environment!
Important Changes to Listeners and Endpoints
For the above experiment, you’ll notice changes to listeners and endpoints:
$ istioctl proxy-config listeners "$(kubectl get pod -n dual-stack -l app=tcp-echo -o jsonpath='{.items[0].metadata.name}')" -n dual-stack --port 9000
You will see listeners are now bound to multiple addresses, but only for dual stack services. Other services will only
be listening on a single IP address.
"name": "fd00:10:96::f9fc_9000",
"address": {
    "socketAddress": {
        "address": "fd00:10:96::f9fc",
        "portValue": 9000
    }
},
"additionalAddresses": [
    {
        "address": {
            "socketAddress": {
                "address": "10.96.106.11",
                "portValue": 9000
            }
        }
    }
],
Virtual inbound addresses are now also configured to listen on both 0.0.0.0 and [::].
"name": "virtualInbound",
"address": {
    "socketAddress": {
        "address": "0.0.0.0",
        "portValue": 15006
    }
},
"additionalAddresses": [
    {
        "address": {
            "socketAddress": {
                "address": "::",
                "portValue": 15006
            }
        }
    }
],
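Both fragments share the same shape: a primary address plus an additionalAddresses list. The following Python sketch lists every address a listener is bound to, together with its IP version, which can help when scanning the JSON from istioctl proxy-config listeners with -o json for dual-stack listeners. The field names are taken from the output above; the helper name bound_addresses is hypothetical.

```python
import ipaddress


def bound_addresses(listener: dict) -> list[tuple[str, int, int]]:
    """Return (address, port, IP version) for the primary and any additional addresses."""
    sockets = [listener["address"]["socketAddress"]]
    sockets += [extra["address"]["socketAddress"] for extra in listener.get("additionalAddresses", [])]
    return [
        (s["address"], s["portValue"], ipaddress.ip_address(s["address"]).version)
        for s in sockets
    ]


# The virtualInbound fragment shown above, as a Python dict.
virtual_inbound = {
    "name": "virtualInbound",
    "address": {"socketAddress": {"address": "0.0.0.0", "portValue": 15006}},
    "additionalAddresses": [
        {"address": {"socketAddress": {"address": "::", "portValue": 15006}}}
    ],
}

print(bound_addresses(virtual_inbound))
# [('0.0.0.0', 15006, 4), ('::', 15006, 6)]
```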
Envoy’s endpoints are now configured to route to both IPv4 and IPv6:
$ istioctl proxy-config endpoints "$(kubectl get pod -l app=sleep -o jsonpath='{.items[0].metadata.name}')" --port 9000
ENDPOINT                 STATUS      OUTLIER CHECK     CLUSTER
10.244.0.19:9000         HEALTHY     OK                outbound|9000||tcp-echo.ipv4.svc.cluster.local
10.244.0.26:9000         HEALTHY     OK                outbound|9000||tcp-echo.dual-stack.svc.cluster.local
fd00:10:244::1a:9000     HEALTHY     OK                outbound|9000||tcp-echo.dual-stack.svc.cluster.local
fd00:10:244::18:9000     HEALTHY     OK                outbound|9000||tcp-echo.ipv6.svc.cluster.local
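To confirm which services have endpoints in both IP families, the ENDPOINT column can be split into host and port and classified with Python’s standard ipaddress module. This sketch assumes the table format shown above; because IPv6 endpoints are printed without brackets, the port is taken from the last colon. The helper name ip_families is hypothetical.

```python
import ipaddress
from collections import defaultdict

SAMPLE = """\
ENDPOINT                 STATUS      OUTLIER CHECK     CLUSTER
10.244.0.19:9000         HEALTHY     OK                outbound|9000||tcp-echo.ipv4.svc.cluster.local
10.244.0.26:9000         HEALTHY     OK                outbound|9000||tcp-echo.dual-stack.svc.cluster.local
fd00:10:244::1a:9000     HEALTHY     OK                outbound|9000||tcp-echo.dual-stack.svc.cluster.local
fd00:10:244::18:9000     HEALTHY     OK                outbound|9000||tcp-echo.ipv6.svc.cluster.local
"""


def ip_families(output: str) -> dict[str, set[int]]:
    """Map each cluster to the set of IP versions (4 and/or 6) among its endpoints."""
    families = defaultdict(set)
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        host, _, _port = fields[0].rpartition(":")  # IPv6 hosts contain colons too
        families[fields[-1]].add(ipaddress.ip_address(host).version)
    return dict(families)


for cluster, versions in sorted(ip_families(SAMPLE).items()):
    print(cluster.split("|")[-1], sorted(versions))
# tcp-echo.dual-stack.svc.cluster.local [4, 6]
# tcp-echo.ipv4.svc.cluster.local [4]
# tcp-echo.ipv6.svc.cluster.local [6]
```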
Get Involved
Plenty of work remains, and you are welcome to help us with the remaining tasks needed for dual stack support to get to
Alpha here.
For instance, Iris Ding (Intel) and Li Chun (Intel) are already working with the community on redirection of
network traffic for ambient, and we are hoping to have ambient support dual stack in its upcoming alpha release in
Istio 1.18.
We would love your feedback, and if you are eager to work with us, please stop by our Slack channel, #dual-stack, within
the Istio Slack.
Thank you to the team that has worked on Istio dual-stack!

Intel: Steve Zhang, Alex Xu, Iris Ding
F5: Jacob Delgado
Yingchun Cai (formerly of F5)




Istio Ambient Service Mesh Merged to Istio’s Main Branch
Tue, 28 Feb 2023 00:00:00 +0000
Istio ambient service mesh was launched in Sept 2022 in an experimental branch, introducing a new data plane mode for Istio without sidecars. Through collaboration with the Istio community, across Google, Solo.io, Microsoft, Intel, Aviatrix, Huawei, IBM and others, we are excited to announce that Istio ambient mesh has graduated from the experimental branch and merged to Istio’s main branch! This is a significant milestone for ambient mesh, paving the way for releasing ambient in Istio 1.18 and installing it by default in Istio’s future releases.
Major Changes from the Initial Launch
Ambient mesh is designed for simplified operations, broader application compatibility, and reduced infrastructure cost. The ultimate goal of ambient is to be transparent to your applications, and we have made a few changes to make the ztunnel and waypoint components simpler and more lightweight.

The ztunnel component has been rewritten from the ground up to be fast, secure, and lightweight. Refer to Introducing Rust-Based Ztunnel for Istio Ambient Service Mesh for more information.
We made significant changes to simplify waypoint proxy’s configuration to improve its debuggability and performance. Refer to Istio Ambient Waypoint Proxy Made Simple for more information.
Added the istioctl x waypoint command to help you conveniently deploy waypoint proxies, along with istioctl pc workload to help you view workload information.
We gave users the ability to explicitly bind Istio policies such as AuthorizationPolicy to waypoint proxies, instead of selecting the destination workload.

Get involved
Follow our getting started guide to try the ambient pre-alpha build today. We’d love to hear from you! To learn more about ambient:

Join us in the #ambient and #ambient-dev channels in Istio’s Slack.
Attend the weekly ambient contributor meeting on Wednesdays.
Check out the Istio and ztunnel repositories, submit issues or PRs!




Introducing Rust-Based Ztunnel for Istio Ambient Service Mesh
Tue, 28 Feb 2023 00:00:00 +0000
The ztunnel (zero trust tunnel) component is a purpose-built per-node proxy for Istio ambient mesh. It is responsible for securely connecting and authenticating workloads within ambient mesh. Ztunnel is designed to focus on a small set of features for your workloads in ambient mesh such as mTLS, authentication, L4 authorization and telemetry, without terminating workload HTTP traffic or parsing workload HTTP headers. The ztunnel ensures traffic is efficiently and securely transported to the waypoint proxies, where the full suite of Istio’s functionality, such as HTTP telemetry and load balancing, is implemented.
Because ztunnel is designed to run on all of your Kubernetes worker nodes, it is critical to keep its resource footprint small. Ztunnel is designed to be an invisible (or “ambient”) part of your service mesh with minimal impact on your workloads.
Ztunnel architecture
Similar to sidecars, ztunnel also serves as an xDS client and CA client:

During startup, it securely connects to the Istiod control plane using its
service account token. Once the connection from ztunnel to Istiod is established
securely using TLS, it starts to fetch xDS configuration as an xDS client. This
works similarly to sidecars or gateways or waypoint proxies, except that Istiod
recognizes the request from ztunnel and sends the purpose-built xDS configuration
for ztunnel, which you will learn more about soon.
It also serves as a CA client to manage and provision mTLS certificates on behalf of all co-located workloads it manages.
As traffic comes in or goes out, it serves as a core proxy that handles the inbound and outbound traffic (either out-of-mesh plain text or in-mesh HBONE) for all co-located workloads it manages.
It provides L4 telemetry (metrics and logs) along with an admin server with debugging information to help you debug ztunnel if needed.

[Figure: Ztunnel architecture]

Why not reuse Envoy?
When Istio ambient service mesh was announced on Sept 7, 2022, the ztunnel was implemented using an Envoy proxy. Given that we use Envoy for the rest of Istio - sidecars, gateways, and waypoint proxies - it was natural for us to start implementing ztunnel using Envoy.
However, we found that while Envoy was a great fit for other use cases, it was challenging to implement ztunnel in Envoy, as many of the tradeoffs, requirements, and use cases are dramatically different from those of a sidecar proxy or ingress gateway. In addition, most of the things that make Envoy such a great fit for those other use cases, such as its rich L7 feature set and extensibility, went to waste in ztunnel, which didn’t need those features.
A purpose-built ztunnel
After having trouble bending Envoy to our needs, we started investigating making a purpose-built implementation of the ztunnel. Our hypothesis was that by designing with a single focused use case in mind from the beginning, we could develop a solution that was simpler and more performant than molding a general purpose project to our bespoke use cases. The explicit decision to make ztunnel simple was key to this hypothesis; similar logic wouldn’t hold up to rewriting the gateway, for example, which has a huge list of supported features and integrations.
This purpose-built ztunnel involved two key areas:

The configuration protocol between ztunnel and Istiod
The runtime implementation of ztunnel

Configuration protocol
Envoy proxies use the xDS Protocol for configuration. This is a key part of what makes Istio work well, offering rich and dynamic configuration updates. However, as we tread off the beaten path, the config becomes more and more bespoke, which means it’s much larger and more expensive to generate. In a sidecar, a single Service with one pod generates roughly 350 lines of xDS (in YAML), which has already been challenging to scale. The Envoy-based ztunnel was far worse, and in some areas had N^2 scaling attributes.
To keep the ztunnel configuration as small as possible, we investigated using a purpose built configuration protocol, that contains precisely the information we need (and nothing more), in an efficient format. For example, a single pod could be represented concisely:
name: helloworld-v1-55446d46d8-ntdbk
namespace: default
serviceAccount: helloworld
node: ambient-worker2
protocol: TCP
status: Healthy
waypointAddresses: []
workloadIp: 10.244.2.8
canonicalName: helloworld
canonicalRevision: v1
workloadName: helloworld-v1
workloadType: deployment
This information is transported over the xDS transport API, but uses a custom ambient-specific type. Refer to the workload xDS configuration section to learn more about the configuration details.
By having a purpose-built API, we can push logic into the proxy instead of into Envoy configuration. For example, to configure mTLS in Envoy, we need to add the same large block of configuration, tuning the precise TLS settings, for each service; with ztunnel, we need only a single enum to declare whether mTLS should be used or not. The rest of the complex logic is embedded directly into ztunnel code.
With this efficient API between Istiod and ztunnel, we found we could configure ztunnels with information about large meshes (such as those with 100,000 pods) with orders of magnitude less configuration, which means lower CPU, memory, and network costs.
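As a rough illustration of pushing logic into the proxy, the hypothetical Python model below mirrors a few fields of the workload record shown earlier. The class and function names are our own, not ztunnel’s (ztunnel is written in Rust and receives this data over xDS); the point is only that a single protocol value is enough for the proxy to decide whether to apply mTLS, with everything else living in code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Protocol(Enum):
    """Hypothetical mirror of the per-workload protocol field."""

    TCP = "TCP"      # plain TCP: workload is not (yet) in the ambient mesh
    HBONE = "HBONE"  # traffic is tunneled over mTLS (HBONE)


@dataclass
class Workload:
    """A few of the fields from the concise workload record."""

    name: str
    namespace: str
    service_account: str
    workload_ip: str
    protocol: Protocol
    waypoint_addresses: list[str] = field(default_factory=list)


def uses_mtls(w: Workload) -> bool:
    # The config carries only this switch; certificate handling, TLS versions
    # and peer verification would all be implemented in the proxy itself.
    return w.protocol is Protocol.HBONE


def spiffe_id(w: Workload, trust_domain: str = "cluster.local") -> str:
    # Same identity format as the peer IDs shown later in this post,
    # e.g. spiffe://cluster.local/ns/default/sa/sleep.
    return f"spiffe://{trust_domain}/ns/{w.namespace}/sa/{w.service_account}"


# The pod from the YAML record above, before it joins the ambient mesh.
pod = Workload(
    name="helloworld-v1-55446d46d8-ntdbk",
    namespace="default",
    service_account="helloworld",
    workload_ip="10.244.2.8",
    protocol=Protocol("TCP"),
)
```

Here uses_mtls(pod) is False until the record’s protocol becomes HBONE; no TLS tuning appears in the data at all.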
Runtime implementation
As the name suggests, ztunnel uses an HTTPS tunnel to carry users’ requests. While Envoy supports this tunneling, we found the configuration model limiting for our needs. Roughly speaking, Envoy operates by sending requests through a series of “filters”, starting with accepting a request and ending with sending a request. With our requirements, which have multiple layers of requests (the tunnel itself and the users’ requests), as well as a need to apply per-pod policy after load balancing, we found we would need to loop through these filters four times per connection when implementing our prior Envoy-based ztunnel. While Envoy has some optimizations for essentially “sending a request to itself” in memory, this was still very complex and expensive.
By building out our own implementation, we could design around these constraints from the ground up. In addition, we have more flexibility in all aspects of the design. For example, we could choose to share connections across threads or implement more bespoke requirements around isolation between service accounts. After establishing that a purpose-built proxy was viable, we set out to choose the implementation details.
A Rust-based ztunnel
With the goal of making ztunnel fast, secure, and lightweight, Rust was an obvious choice. However, it wasn’t our first. Given Istio’s current extensive usage of Go, we had hoped we could make a Go-based implementation meet these goals. In initial prototypes, we built simple versions of both a Go-based implementation and a Rust-based one. From our tests, we found that the Go-based version didn’t meet our performance and footprint requirements. While it’s likely we could have optimized it further, we felt that a Rust-based proxy would give us the best implementation in the long term.
A C++ implementation (likely reusing parts of Envoy) was also considered. However, this option was not pursued due to its lack of memory safety, developer experience concerns, and a general industry trend towards Rust.
This process of elimination left us with Rust, which was a perfect fit. Rust has a strong history of success in high-performance, low-resource-utilization applications, especially network applications (including service mesh). We chose to build on top of the Tokio and Hyper libraries, two de facto standards in the ecosystem that are extensively battle-tested and make it easy to write highly performant asynchronous code.
A quick tour of the Rust-based ztunnel
Workload xDS configuration
The workload xDS configurations are very easy to understand and debug. You can view them by sending a request to localhost:15000/config_dump from one of your ztunnel pods, or by using the convenient istioctl pc workload command. There are two key workload xDS configurations: workloads and policies.
Before your workloads are included in your ambient mesh, you will still be able to see them in ztunnel’s config dump, as ztunnel is aware of all workloads regardless of whether they are ambient-enabled. For example, below is a sample workload configuration for a newly deployed helloworld v1 pod that is outside the mesh, as indicated by protocol: TCP:
{
  "workloads": {
    "10.244.2.8": {
      "workloadIp": "10.244.2.8",
      "protocol": "TCP",
      "name": "helloworld-v1-cross-node-55446d46d8-ntdbk",
      "namespace": "default",
      "serviceAccount": "helloworld",
      "workloadName": "helloworld-v1-cross-node",
      "workloadType": "deployment",
      "canonicalName": "helloworld",
      "canonicalRevision": "v1",
      "node": "ambient-worker2",
      "authorizationPolicies": [],
      "status": "Healthy"
    }
  }
}
After the pod is included in ambient (by labeling the default namespace with istio.io/dataplane-mode=ambient), the protocol value is replaced with HBONE, instructing ztunnel to upgrade all incoming and outgoing communication of the helloworld-v1 pod to HBONE.
{
  "workloads": {
    "10.244.2.8": {
      "workloadIp": "10.244.2.8",
      "protocol": "HBONE",
      ...
    }
  }
}
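Because the config dump is plain JSON, it is also easy to inspect programmatically. The sketch below assumes the dump has the top-level workloads map keyed by IP shown above (other ztunnel versions may lay the dump out differently) and lists which workloads ztunnel already treats as HBONE. fetch_config_dump is a hypothetical helper that assumes localhost:15000 is reachable, for example from inside a ztunnel pod or through kubectl port-forward.

```python
import json
import urllib.request


def fetch_config_dump(url: str = "http://localhost:15000/config_dump") -> dict:
    """Fetch and decode a ztunnel config dump (assumes the port is reachable)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def workload_protocols(dump: dict) -> dict[str, str]:
    """Map each workload name (falling back to its IP) to its protocol."""
    return {
        w.get("name", ip): w.get("protocol", "unknown")
        for ip, w in dump.get("workloads", {}).items()
    }


def ambient_workloads(dump: dict) -> list[str]:
    """Names of workloads whose traffic ztunnel upgrades to HBONE."""
    return sorted(n for n, p in workload_protocols(dump).items() if p == "HBONE")
```

Running ambient_workloads(fetch_config_dump()) before and after labeling the namespace should show helloworld-v1 move from TCP to HBONE.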
After you deploy any workload-level authorization policy, the policy configuration will be pushed as xDS configuration from Istiod to ztunnel and shown under policies:
{
  "policies": {
    "default/hw-viewer": {
      "name": "hw-viewer",
      "namespace": "default",
      "scope": "WorkloadSelector",
      "action": "Allow",
      "groups": [[[{
        "principals": [{"Exact": "cluster.local/ns/default/sa/sleep"}]
      }]]]
    }
  }
  ...
}
You’ll also notice that the workload’s configuration is updated with a reference to the authorization policy.
{
  "workloads": {
    "10.244.2.8": {
      "workloadIp": "10.244.2.8",
      ...
      "authorizationPolicies": [
        "default/hw-viewer"
      ]
    }
  }
  ...
}
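To see how the two sections relate, this sketch follows a workload’s authorizationPolicies references into the policies map and collects every Exact principal from ALLOW policies. It is deliberately simplified: it ignores how ztunnel combines the nested groups lists and any match type other than Exact, so treat it as an aid to reading the dump rather than a policy evaluator. The function names are hypothetical.

```python
from typing import Any


def walk_exact_principals(node: Any) -> set[str]:
    """Recursively collect every {"Exact": ...} principal under a policy's groups."""
    found: set[str] = set()
    if isinstance(node, list):
        for item in node:
            found |= walk_exact_principals(item)
    elif isinstance(node, dict):
        for key, value in node.items():
            if key == "principals":
                found |= {p["Exact"] for p in value if "Exact" in p}
            else:
                found |= walk_exact_principals(value)
    return found


def allowed_principals(dump: dict, workload_ip: str) -> set[str]:
    """Exact principals named by the ALLOW policies attached to a workload."""
    workload = dump["workloads"][workload_ip]
    result: set[str] = set()
    for ref in workload.get("authorizationPolicies", []):
        policy = dump.get("policies", {}).get(ref, {})
        if policy.get("action") == "Allow":
            result |= walk_exact_principals(policy.get("groups", []))
    return result
```

For the dump above, allowed_principals(dump, "10.244.2.8") yields the sleep service account’s principal.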
L4 telemetry provided by ztunnel
You may be pleasantly surprised that the ztunnel logs are easy to understand. For example, you’ll see the HTTP CONNECT request on the destination ztunnel, which indicates the source pod IP (peer_ip) and the destination pod IP.
2023-02-15T20:40:48.628251Z  INFO inbound{id=4399fa68cf25b8ebccd472d320ba733f peer_ip=10.244.2.5 peer_id=spiffe://cluster.local/ns/default/sa/sleep}: ztunnel::proxy::inbound: got CONNECT request to 10.244.2.8:5000
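If you want to pull these fields out of many log lines, a small regular expression is enough. This sketch assumes the exact layout of the single line above; ztunnel’s log format may differ between versions, and parse_connect_log is a hypothetical helper.

```python
import re
from typing import Optional

# Matches the CONNECT log line shown above (layout assumed from that one example).
CONNECT_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"(?P<direction>\w+)\{(?P<fields>[^}]*)\}:\s+"
    r"(?P<module>\S+):\s+got CONNECT request to (?P<dest>\S+)$"
)


def parse_connect_log(line: str) -> Optional[dict[str, str]]:
    """Return the source and destination details of a CONNECT log line, or None."""
    m = CONNECT_RE.match(line.strip())
    if m is None:
        return None
    # Span fields are space-separated key=value pairs.
    fields = dict(kv.split("=", 1) for kv in m.group("fields").split() if "=" in kv)
    dest_ip, _, dest_port = m.group("dest").rpartition(":")
    return {
        "timestamp": m.group("timestamp"),
        "level": m.group("level"),
        "direction": m.group("direction"),
        "module": m.group("module"),
        "peer_ip": fields.get("peer_ip", ""),
        "peer_id": fields.get("peer_id", ""),
        "dest_ip": dest_ip,
        "dest_port": dest_port,
    }
```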
You can view L4 metrics of your workloads by accessing the localhost:15020/metrics API, which provides the full set of standard TCP metrics, with the same labels that sidecars expose. For example:
istio_tcp_connections_opened_total{
  reporter="source",
  source_workload="sleep",
  source_workload_namespace="default",
  source_principal="spiffe://cluster.local/ns/default/sa/sleep",
  destination_workload="helloworld-v1",
  destination_workload_namespace="default",
  destination_principal="spiffe://cluster.local/ns/default/sa/helloworld",
  request_protocol="tcp",
  connection_security_policy="mutual_tls"
  ...
} 1
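The endpoint serves the Prometheus text format, normally one sample per line (the sample above is wrapped for readability). For a quick look without a Prometheus server, the sketch below parses that format and totals istio_tcp_connections_opened_total per destination workload. It is a deliberately small, hypothetical parser: it does not unescape quoted label values or handle every corner of the exposition format.

```python
import re
from collections import defaultdict

SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)"
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')  # values are left escaped


def parse_samples(text: str):
    """Yield (metric name, labels, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        m = SAMPLE_RE.match(line)
        if m:
            labels = dict(LABEL_RE.findall(m.group("labels") or ""))
            yield m.group("name"), labels, float(m.group("value"))


def connections_by_destination(text: str, reporter: str = "source") -> dict[str, float]:
    """Sum opened TCP connections per destination_workload for one reporter."""
    totals: dict[str, float] = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "istio_tcp_connections_opened_total" and labels.get("reporter") == reporter:
            totals[labels.get("destination_workload", "unknown")] += value
    return dict(totals)
```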
If you install Prometheus and Kiali, you can view these metrics easily from Kiali’s UI.

Kiali dashboard: L4 telemetry provided by ztunnel

Wrapping up
We are super excited that the new Rust-based ztunnel is drastically simpler, lighter, and more performant than the prior Envoy-based ztunnel. With the purpose-built workload xDS for the Rust-based ztunnel, you’ll not only be able to understand the xDS configuration much more easily, but also benefit from drastically reduced network traffic and cost between the Istiod control plane and ztunnels. With Istio ambient now merged to upstream master, you can try the new Rust-based ztunnel by following our getting started guide.



Announcing the Contribution Seat holders for 2023
Mon, 06 Feb 2023 00:00:00 +0000
The Istio Steering Committee consists of 9 Contribution Seats, proportionally allocated based on corporate contributions to the project, and 4 elected Community Seats.
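Proportionally allocating a fixed number of seats is a classic apportionment problem. Purely as an illustration, the sketch below uses the largest remainder (Hamilton) method with an optional per-company cap; both the method and the cap are assumptions for this example, and the Steering charter’s actual procedure may differ. The company names and contribution figures in the usage are invented.

```python
from typing import Optional


def allocate_seats(
    contributions: dict[str, float], seats: int = 9, cap: Optional[int] = None
) -> dict[str, int]:
    """Apportion `seats` in proportion to `contributions` (largest remainder method).

    If `cap` is set, a company whose allocation would exceed it is fixed at
    `cap` seats and the remaining seats are re-apportioned among the others.
    """
    result: dict[str, int] = {}
    remaining = dict(contributions)
    free = seats
    while free > 0 and remaining:
        total = sum(remaining.values())
        if total <= 0:
            break
        quotas = {c: v * free / total for c, v in remaining.items()}
        alloc = {c: int(q) for c, q in quotas.items()}
        leftover = free - sum(alloc.values())
        # Give the leftover seats to the largest fractional remainders.
        by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
        for c in by_remainder[:leftover]:
            alloc[c] += 1
        over = [c for c, n in alloc.items() if cap is not None and n > cap]
        if not over:
            result.update(alloc)
            break
        for c in over:
            result[c] = cap
            free -= cap
            del remaining[c]
    return result


# Invented shares: a 90/5/5 split yields 8/1/0 seats, or 5/2/2 with a cap of 5.
example = allocate_seats({"A": 90, "B": 5, "C": 5}, cap=5)
```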
Last year, we elected four members to the Community Seats. It’s now time to announce the Contribution Seat members, selected by the companies that fuel our growth. As per the Steering charter, every February we look at which companies have made the most contributions to Istio based on an annually agreed metric.
According to our seat allocation process, this year Google will be allocated five seats and IBM/Red Hat will be allocated two. We are pleased to announce that Huawei, as the third-largest contributor to Istio in the last 12 months, has earned two Contribution Seats.
Based on this, here is the complete list of Istio Steering Committee members, including both the Contribution and Community Seats:

Ameer Abbas (Google)
Craig Box (ARMO)
Rob Cernich (Red Hat)
Iris Ding (Intel)
Cameron Etezadi (Google)
Jianpeng He (Huawei)
John Howard (Google)
Faseela K (Ericsson Software Technology)
April Kyle Nassi (Google)
Justin Pettit (Google)
Christian Posta (Solo.io)
Cale Rath (IBM)
Zhonghu Xu (Huawei)

Our sincerest thanks to Louis Ryan, Srihari Angaluri, Kebe Liu and Jason McGee, all long-time contributors to the Istio project, whose terms have come to an end.



Istio publishes results of 2022 security audit
Mon, 30 Jan 2023 00:00:00 +0000
Istio is a project that platform engineers trust to enforce security policy in their production Kubernetes environments. We take great care with security in our code, and maintain a robust vulnerability program. To validate our work, we periodically invite external review of the project, and we are pleased to publish the results of our second security audit.
The auditors’ assessment was that “Istio is a well-maintained project that has a strong and sustainable approach to security”. No critical issues were found; the highlight of the report was the discovery of a vulnerability in the Go programming language.
We would like to thank the Cloud Native Computing Foundation for funding this work, as a benefit offered to us after we joined the CNCF in August. It was arranged by OSTIF, and performed by ADA Logics.
Scope and overall findings
Istio received its first security assessment in 2020, with its data plane, the Envoy proxy, having been independently assessed in 2018 and 2021. The Istio Product Security Working Group and ADA Logics therefore decided on the following scope:

Produce a formal threat model, to guide this and future security audits
Carry out a manual code audit for security issues
Review the fixes for the issues found in the 2020 audit
Review and improve Istio’s fuzzing suite
Perform a SLSA review of Istio

Once again, no Critical issues were found in the review. The assessment found 11 security issues: two High, four Medium, four Low and one Informational. All the reported issues have been fixed.

“Istio is a very well-maintained and secure project with a sound code base, well-established security practices and a responsive product security team.” - ADA Logics


Aside from the observations above, the auditors note that Istio meets a high industry standard in dealing with security. In particular, they highlight that:

The Istio Product Security Working Group responds swiftly to security disclosures
The documentation on the project’s security is comprehensive, well-written and up to date
Security vulnerability disclosures follow industry standards and security advisories are clear and detailed
Security fixes include regression tests

Resolution and learnings
Request smuggling vulnerability in Go
The auditors uncovered a situation where Istio could accept traffic using HTTP/2 Over Cleartext (h2c), a method of making an unencrypted connection with HTTP/1.1 and then upgrading to HTTP/2. The Go library for h2c connections reads the entire request into memory, and notes that if you wish to avoid this, the request should be wrapped in a MaxBytesHandler.
In fixing this bug, Istio TOC member John Howard noticed that the recommended fix introduces a request smuggling vulnerability. The Go team thus published CVE-2022-41721, the only vulnerability discovered by this audit!
Istio has since been changed to disable h2c upgrade support throughout.
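For readers unfamiliar with h2c: an upgrade attempt is an ordinary HTTP/1.1 request carrying an Upgrade: h2c header (alongside an HTTP2-Settings header). The sketch below shows one generic way a server could refuse such requests; it is not Istio’s change, the handle function is hypothetical, and it does not cover clients that start HTTP/2 directly with prior knowledge.

```python
def is_h2c_upgrade(headers: dict[str, str]) -> bool:
    """Return True if an HTTP/1.1 request asks to upgrade to cleartext HTTP/2."""
    for name, value in headers.items():
        # Header names are case-insensitive, and Upgrade may list several protocols.
        if name.lower() == "upgrade":
            tokens = {t.strip().lower() for t in value.split(",")}
            if "h2c" in tokens:
                return True
    return False


def handle(headers: dict[str, str]) -> tuple[int, str]:
    """Hypothetical request entry point that refuses h2c upgrades outright."""
    if is_h2c_upgrade(headers):
        return 400, "h2c upgrade not supported"
    return 200, "ok"
```

A server could equally ignore the header and keep answering over HTTP/1.1; either way, no code path ever switches the connection to h2c.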
Improvements to file fetching
The most common class of issues found related to Istio fetching files over a network (for example, the Istio Operator installing Helm charts, or the WebAssembly module downloader):

A crafted Helm chart could exhaust disk space (#1) or overwrite other files in the Operator’s pod (#2)
File handles were not closed in the case of an error, and could be exhausted (#3)
Crafted files could exhaust memory (#4 and #5)
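The issues above map onto well-known defensive patterns. This sketch shows two of them in generic form rather than Istio’s actual fixes: read_limited caps how much of a download is buffered (relevant to #1, #4 and #5), and safe_join rejects archive entries such as ../ that would write outside the extraction directory (relevant to #2). Both names are hypothetical; opening files and connections in with blocks covers the handle leaks in #3.

```python
import os
from typing import BinaryIO


class TooLarge(Exception):
    """Raised when a payload exceeds the configured size limit."""


def read_limited(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a stream, failing as soon as it exceeds max_bytes instead of buffering it all."""
    buf = bytearray()
    while True:
        # Never ask for more than one byte past the limit.
        chunk = stream.read(min(chunk_size, max_bytes + 1 - len(buf)))
        if not chunk:
            return bytes(buf)
        buf += chunk
        if len(buf) > max_bytes:
            raise TooLarge(f"payload exceeds {max_bytes} bytes")


def safe_join(base_dir: str, member_name: str) -> str:
    """Resolve an archive member path, refusing anything that escapes base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, member_name))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"unsafe path in archive: {member_name!r}")
    return target
```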

To execute these code paths, an attacker would need enough privilege to specify a URL for either a Helm chart or a WebAssembly module. With such access, they would not need an exploit: they could already cause an arbitrary chart to be installed to the cluster or an arbitrary WebAssembly module to be loaded into memory on the proxy servers.
The auditors and maintainers both note that the Operator is not recommended as a method of installation, as this requires a high-privilege controller to run in the cluster.
Other issues
The remaining issues found were:

In some testing code, or where a control plane component connects to another component over localhost, minimum TLS settings were not enforced (#6)
Operations that failed may not return error codes (#7)
A deprecated library was being used (#8)
Time-of-check to time-of-use (TOCTOU) race conditions in a library used for copying files (#9)
A user could exhaust the memory of the Security Token Service if running in Debug mode (#11)

Please refer to the full report for details.
Reviewing the 2020 report
All 18 issues reported in Istio’s first security assessment were found to have been fixed.
Fuzzing
The OSS-Fuzz project helps open source projects perform free fuzz testing. Istio is integrated into OSS-Fuzz with 63 fuzzers running continuously: this support was built by ADA Logics and the Istio team in late 2021.

"[We] started the fuzzing assessment by prioritizing security-critical parts of Istio. We found that many of these had impressive test coverage with little to no room for improvement." - ADA Logics


The assessment notes that “Istio benefits largely from having a substantial fuzz test suite that runs continuously on OSS-Fuzz”, and identified a few APIs in security-critical code that would benefit from further fuzzing. Six new fuzzers were contributed as a result of this work; by the end of the audit, the new tests had run over 3 billion times.
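To show the basic idea behind a fuzzer, here is a naive random-input harness in Python. It is much simpler than the coverage-guided engines OSS-Fuzz runs, and both parse_host_port and fuzz are hypothetical: the harness treats ValueError as a clean rejection of malformed input and lets any other exception propagate as a bug.

```python
import random
import string


def parse_host_port(s: str) -> tuple[str, int]:
    """Hypothetical fuzz target: parse "host:port", e.g. "10.244.2.8:5000"."""
    host, sep, port = s.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"bad address: {s!r}")
    n = int(port)  # may itself raise ValueError, e.g. for "²", which isdigit() accepts
    if not 0 < n < 65536:
        raise ValueError(f"port out of range: {n}")
    return host, n


def fuzz(target, iterations: int = 10_000, seed: int = 0) -> int:
    """Feed random strings to `target`.

    ValueError counts as a clean rejection; any other exception propagates,
    which is how this harness reports a bug.
    """
    rng = random.Random(seed)
    # Printable ASCII plus non-ASCII characters that str.isdigit() accepts.
    alphabet = string.printable + "\u00b2\u0660"
    for _ in range(iterations):
        data = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 24)))
        try:
            target(data)
        except ValueError:
            pass
    return iterations
```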
SLSA
Supply chain Levels for Software Artifacts (SLSA) is a check-list of standards and controls to prevent tampering, improve integrity, and secure software packages and infrastructure. It is organized into a series of levels that provide increasing integrity guarantees.
Istio does not currently generate provenance artifacts, so it does not meet the requirements for any SLSA level. Work on reaching SLSA Level 1 is currently underway. If you would like to get involved, please join the Istio Slack and reach out to our Test and Release working group.
Get involved
If you want to get involved with Istio product security, or become a maintainer, we’d love to have you! Join our public meetings to raise issues or learn about what we are doing to keep Istio secure.



Join us for Istio Day at KubeCon Europe 2023!
Fri, 27 Jan 2023 00:00:00 +0000
Istio is sailing up the canals this April! We are delighted to announce Istio Day Europe 2023, a “Day 0” event co-located with KubeCon + CloudNativeCon Europe 2023.


Istio Day is the perfect opportunity to meet the Istio maintainers and contributors in person, and hear from users why Istio is consistently ranked the #1 service mesh in production.
Submit a talk
We now encourage Istio users, developers, partners, and advocates to submit a session proposal through the CNCF event portal, which is open until February 12.
We want to see real world examples, case studies, and success stories that can inspire newcomers to use Istio in production. The content will cover introductory to advanced levels, split into four main topic tracks:

New Features: What have you been working on that the community should know about?
Case Studies: How have you built a platform or service on top of Istio?
Istio Recipes: How you can solve a specific business problem using Istio.
Project Updates: The evolution of Istio, and the latest updates from the project maintainers.

You can pick one of these formats to submit a session proposal:

Presentation: 25 minutes, 1 or 2 speaker(s) presenting a topic
Panel Discussion: 35 minutes of discussion among 3 to 5 speakers
Lightning Talk: A brief 5-minute presentation, maximum of 1 speaker

Accepted speakers will receive a complimentary All-Access In-Person ticket for all four days of KubeCon + CloudNativeCon.
Sponsor the event
Do you want to put your product or service in front of the most discerning Cloud Native users: those who demand 25% more conference than the crowd? Check out page 11 of the CNCF events prospectus to learn more.
Register to attend
Istio Day is a CNCF-hosted co-located event on 18 April 2023. KubeCon + CloudNativeCon Europe in-person attendees now have the option to buy an All-Access ticket, which includes entry to all the Day 0 events, as well as the main three days of the conference. You must be attending KubeCon to attend Istio Day, but virtual registration options are available, and the recordings will be posted to YouTube soon after the event.
For those of you who can’t make it, keep your eyes peeled for announcements of IstioCon 2023 and Istio Day North America later this year.
Stay tuned to hear more about the event, and we hope you can join us at Istio Day Europe!



Getting started with the Kubernetes Gateway API
Wed, 14 Dec 2022 00:00:00 +0000
Whether you’re running your Kubernetes application services using Istio, or any service mesh for that matter,
or simply using ordinary services in a Kubernetes cluster, you need to provide access to your application services
for clients outside of the cluster. If you’re using plain Kubernetes clusters, you’re probably using
Kubernetes Ingress resources
to configure the incoming traffic. If you’re using Istio, you are more likely
to be using Istio’s recommended configuration resources,
Gateway and VirtualService,
to do the job.
The Kubernetes Ingress resource has for some time been known to have significant shortcomings, especially
when using it to configure ingress traffic for large applications and when working with protocols other
than HTTP. One problem is that it configures both
the client-side L4-L6 properties (e.g., ports, TLS, etc.) and service-side L7 routing in a single resource,
configurations that for large applications should be managed by different teams and in different namespaces.
Also, by trying to draw a common denominator across
different HTTP proxies, Ingress is only able to support the most basic HTTP routing and ends up pushing
every other feature of modern proxies into non-portable annotations.
To overcome Ingress’ shortcomings, Istio introduced its own configuration API for ingress traffic management.
With Istio’s API, the client-side representation is defined using an Istio Gateway resource, with L7 traffic
moved to a VirtualService, not coincidentally the same configuration resource used for routing traffic between
services inside the mesh. Although the Istio API provides a good solution for ingress traffic management
for large-scale applications, it is unfortunately an Istio-only API. If you are using a different service
mesh implementation, or no service mesh at all, you’re out of luck.
Enter Gateway API
There’s a lot of excitement surrounding a new Kubernetes traffic management API,
dubbed Gateway API, which has recently been
promoted to Beta.
Gateway API provides a set of Kubernetes configuration resources for ingress traffic control
that, like Istio’s API, overcomes the shortcomings of Ingress, but unlike Istio’s, is a standard Kubernetes
API with broad industry agreement. There are several implementations
of the API in the works, including a Beta implementation
in Istio, so now may be a good time to start thinking about how you can start moving your ingress
traffic configuration from Kubernetes Ingress or Istio Gateway/VirtualService to the new Gateway API.
Whether or not you use, or plan to use, Istio to manage your service mesh, the Istio implementation of the
Gateway API can easily be used to get started with your cluster ingress control.
Even though it’s still a Beta feature in Istio, mostly driven by the fact that the Gateway API is itself
still a Beta level API, Istio’s implementation is quite robust because under the covers it uses Istio’s
same tried-and-proven internal resources to implement the configuration.
Gateway API quick-start
To get started using the Gateway API, you need to first download the CRDs, which don’t come installed by default
on most Kubernetes clusters, at least not yet:
$ kubectl get crd gateways.gateway.networking.k8s.io &> /dev/null || \
  { kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd?ref=v1.4.0" | kubectl apply -f -; }
Once the CRDs are installed, you can use them to create Gateway API resources to configure ingress traffic,
but in order for the resources to work, the cluster needs to have a gateway controller running.
You can enable Istio’s gateway controller implementation by simply installing Istio with the minimal profile:
$ curl -L https://istio.io/downloadIstio | sh -
$ cd istio-1.29.1
$ ./bin/istioctl install --set profile=minimal -y
Your cluster will now have a fully-functional implementation of the Gateway API,
via Istio’s gateway controller named istio.io/gateway-controller,
ready to use.
Deploy a Kubernetes target service
To try out the Gateway API, we’ll use the Istio helloworld sample
as an ingress target, but only running as a simple Kubernetes service
without sidecar injection enabled. Because we’re only going to use the Gateway API to control ingress traffic
into the “Kubernetes cluster”, it makes no difference if the target service is running inside or
outside of a mesh.
We’ll use the following command to deploy the helloworld service:
$ kubectl create ns sample
$ kubectl apply -f samples/helloworld/helloworld.yaml -n sample
The helloworld service includes two backing deployments, corresponding to different versions (v1 and v2).
We can confirm they are both running using the following command:
$ kubectl get pod -n sample
NAME                             READY   STATUS    RESTARTS   AGE
helloworld-v1-776f57d5f6-s7zfc   1/1     Running   0          10s
helloworld-v2-54df5f84b-9hxgww   1/1     Running   0          10s
Configure the helloworld ingress traffic
With the helloworld service up and running, we can now use the Gateway API to configure ingress traffic for it.
The ingress entry point is defined using a
Gateway resource:
$ kubectl create namespace sample-ingress
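$ kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: sample-gateway
  namespace: sample-ingress
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    hostname: "*.sample.com"
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
EOF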

The controller that will implement a Gateway is selected by referencing a
GatewayClass.
There must be at least one GatewayClass defined in the cluster to have functional Gateways.
In our case, we’re selecting Istio’s gateway controller, istio.io/gateway-controller, by referencing its
associated GatewayClass (named istio) with the gatewayClassName: istio setting in the Gateway.
Notice that unlike Ingress, a Kubernetes Gateway doesn’t include any references to the target service,
helloworld. With the Gateway API, routes to services are defined in separate configuration resources
that get attached to the Gateway to direct subsets of traffic to specific services,
like helloworld in our example. This separation allows us to define the Gateway and routes in
different namespaces, presumably managed by different teams. Here, while acting in the role of cluster
operator, we’re applying the Gateway in the sample-ingress namespace. We’ll add the route,
below, in the sample namespace, next to the helloworld service itself, on behalf of the application developer.
Because the Gateway resource is owned by a cluster operator, it can very well be used to provide ingress
for more than one team’s services, in our case more than just the helloworld service.
To emphasize this point, we’ve set hostname to *.sample.com in the Gateway,
allowing routes for multiple subdomains to be attached.
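To make the wildcard behavior concrete, the sketch below approximates how a listener hostname such as *.sample.com is matched against a request’s Host header. It assumes the suffix-match semantics we understand the Gateway API specification to define (a wildcard matches one or more leading labels, but not the bare domain); listener_matches is a hypothetical helper, not part of Istio or any Gateway API library.

```python
def listener_matches(listener_hostname: str, request_host: str) -> bool:
    """Approximate Gateway API listener hostname matching.

    "*.sample.com" acts as a suffix match: it matches helloworld.sample.com
    and a.b.sample.com, but not sample.com itself.
    """
    listener = listener_hostname.lower()
    # Drop any port; this simple split does not handle IPv6 literals.
    host = request_host.lower().split(":", 1)[0]
    if listener.startswith("*."):
        suffix = listener[1:]  # ".sample.com"
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == listener
```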
After applying the Gateway resource, we need to wait for it to be ready before retrieving its external address:
$ kubectl wait -n sample-ingress --for=condition=programmed gateway sample-gateway
$ export INGRESS_HOST=$(kubectl get -n sample-ingress gateway sample-gateway -o jsonpath='{.status.addresses[0].value}')
Next, we attach an HTTPRoute
to the sample-gateway (i.e., using the parentRefs field) to expose and route traffic to the helloworld service:
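$ kubectl apply -n sample -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: helloworld
spec:
  parentRefs:
  - name: sample-gateway
    namespace: sample-ingress
  hostnames: ["helloworld.sample.com"]
  rules:
  - matches:
    - path:
        type: Exact
        value: /hello
    backendRefs:
    - name: helloworld
      port: 5000
EOF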

Here we’ve exposed the /hello path of the helloworld service to clients outside of the cluster,
specifically via host helloworld.sample.com.
You can confirm the helloworld sample is accessible using curl:
$ for run in {1..10}; do curl -HHost:helloworld.sample.com http://$INGRESS_HOST/hello; done
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v2, instance: helloworld-v2-54dddc5567-2lm7b
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v2, instance: helloworld-v2-54dddc5567-2lm7b
Hello version: v2, instance: helloworld-v2-54dddc5567-2lm7b
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v2, instance: helloworld-v2-54dddc5567-2lm7b
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v2, instance: helloworld-v2-54dddc5567-2lm7b
Since no version routing has been configured in the route rule, you should see a roughly even split of traffic,
with about half handled by helloworld-v1 and the other half handled by helloworld-v2.
Configure weight-based version routing
Among other “traffic shaping” features, you can use Gateway API to send all of the traffic to one of the versions
or split the traffic based on request percentages. For example, you can use the following rule to distribute the
helloworld traffic 90% to v1, 10% to v2:
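$ kubectl apply -n sample -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: helloworld
spec:
  parentRefs:
  - name: sample-gateway
    namespace: sample-ingress
  hostnames: ["helloworld.sample.com"]
  rules:
  - matches:
    - path:
        type: Exact
        value: /hello
    backendRefs:
    - name: helloworld-v1
      port: 5000
      weight: 90
    - name: helloworld-v2
      port: 5000
      weight: 10
EOF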

Gateway API relies on version-specific backend service definitions for the route targets,
helloworld-v1 and helloworld-v2 in this example.
The helloworld sample already includes service definitions for the helloworld versions v1 and v2,
so we just need to run the following command to define them:
$ kubectl apply -n sample -f samples/helloworld/gateway-api/helloworld-versions.yaml
Now, we can run the previous curl commands again:
$ for run in {1..10}; do curl -HHost:helloworld.sample.com http://$INGRESS_HOST/hello; done
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
Hello version: v2, instance: helloworld-v2-54dddc5567-2lm7b
Hello version: v1, instance: helloworld-v1-78b9f5c87f-2sskj
This time we see that about 9 out of 10 requests are now handled by helloworld-v1 and only about 1 in 10 are handled by helloworld-v2.
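The 90/10 result is statistical, so ten requests will not always split exactly 9 to 1. This sketch models weight-proportional selection with Python’s random.choices to show how the observed ratio settles near the configured weights as the number of requests grows. It models only the proportions, not how Istio or Envoy actually picks an endpoint; pick_backend and simulate are hypothetical names.

```python
import random
from collections import Counter


def pick_backend(backends: list[tuple[str, int]], rng: random.Random) -> str:
    """Choose a backend with probability proportional to its weight."""
    names = [name for name, _ in backends]
    weights = [weight for _, weight in backends]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(backends: list[tuple[str, int]], requests: int, seed: int = 1) -> Counter:
    """Count how many of `requests` simulated requests each backend receives."""
    rng = random.Random(seed)
    return Counter(pick_backend(backends, rng) for _ in range(requests))


counts = simulate([("helloworld-v1", 90), ("helloworld-v2", 10)], 10_000)
```

With 10,000 simulated requests, helloworld-v1 receives close to 90% of them; with only ten, swings such as 8/2 or 10/0 are normal.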
Gateway API for internal mesh traffic
You may have noticed that we’ve been talking about the Gateway API only as an ingress configuration API,
often referred to as north-south traffic management, and not an API for service-to-service (aka, east-west)
traffic management within a cluster.
If you are using a service mesh, it would be highly desirable to use the same API
resources to configure both ingress traffic routing and internal traffic, similar to the way Istio uses
VirtualService to configure route rules for both. Fortunately, the Kubernetes Gateway API is working to
add this support.
Although it is not yet as mature as the Gateway API for ingress traffic, the
Gateway API for Mesh Management and Administration (GAMMA)
initiative is underway to make this a reality, and Istio intends to make Gateway API the default API for all
of its traffic management in the future.
The first significant Gateway Enhancement Proposal (GEP)
has recently been accepted and is, in fact, already available to use in Istio.
To try it out, you’ll need to use the
experimental version
of the Gateway API CRDs, instead of the standard Beta version we installed above, but otherwise, you’re ready to go.
Check out the Istio request routing task
to get started.
Summary
In this article, we’ve seen how a lightweight minimal install of Istio can be used to provide a Beta-quality implementation
of the new Kubernetes Gateway API for cluster ingress traffic control. For Istio users, the Istio implementation also lets
you start trying out the experimental Gateway API support for east-west traffic management within the mesh.
Much of Istio’s documentation, including all of the ingress tasks
and several mesh-internal traffic management tasks, already includes parallel instructions for
configuring traffic using either the Gateway API or the Istio configuration API.
Check out the Gateway API task for more information about the
Gateway API implementation in Istio.



2022 Istio Steering Committee Election Results
Fri, 04 Nov 2022 00:00:00 +0000
The Istio Steering Committee consists of 9 proportionally-allocated Contribution Seats, and 4 elected Community Seats. Our third annual election for our Community Seats has concluded, and we are pleased to announce the choice of our members:

Craig Box (ARMO)
Iris Ding (Intel)
Faseela K (Ericsson Software Technology)
Christian Posta (Solo.io)

We would like to extend our heartfelt thanks to Zack Butcher, Lin Sun and Zhonghu Xu, whose terms have now ended.
With Contribution Seat holders from Google, IBM, Red Hat and DaoCloud, we have representation from 8 organizations on the Steering Committee, reflecting the breadth of our worldwide contributor ecosystem.
Thank you to everyone who participated in the election process, with special thanks to our election officers Josh Berkus, Cameron Etezadi and Ram Vennam.



Announcing Istio's acceptance as a CNCF project
Wed, 28 Sep 2022 00:00:00 +0000
We are pleased to share that Istio is now an official incubating CNCF project.
In April, Istio applied to become a CNCF project. Today, the TOC announced they have voted to accept our application.
This journey began with Istio’s inception in 2016. We are grateful for all who have collaborated over the last six years on Istio’s design, development, and deployment.
We especially appreciate the efforts of TOC sponsor Dave Zolotusky, TAG Network, and the engineering teams at Airbnb, Intuit, Splunk, and WP Engine for sharing their feedback as end users.
With Istio’s acceptance, we will now begin the process of transferring trademarks and build infrastructure to CNCF ownership, while project work continues uninterrupted. We are hard at work on our upcoming 1.16 release, and continue to collect feedback on ambient mesh as we drive it to production readiness. Our project members are currently electing our community representatives to the Steering Committee for the next year.
As a CNCF project, we will now be much more visible at KubeCon NA in October. Come to the maintainer session, find us in the project pavilion, or grab an Istio t-shirt at the CNCF Store. Watch our Twitter throughout the conference for more exciting updates!



Ambient Mode Security Deep Dive
Wed, 07 Sep 2022 09:00:00 -0600
We recently announced Istio’s new ambient mode, which is a sidecar-less data plane for Istio and the reference implementation of the ambient mesh pattern. As stated in the announcement blog, the top concerns we address with ambient mesh are simplified operations, broader application compatibility, reduced infrastructure costs and improved performance. When designing the ambient data plane, we wanted to carefully balance the concerns around operations, cost, and performance while not sacrificing security or functionality. As the components of ambient mesh run outside of the application pods, the security boundaries have changed – we believe for the better. In this blog, we go into some detail about these changes and how they compare to a sidecar deployment.

Figure: Layering of ambient mesh data plane

To recap, Istio’s ambient mode introduces a layered mesh data plane: a secure overlay responsible for transport security and routing, with the option to add L7 capabilities for namespaces that need them.
To understand more, please see the announcement blog and the getting started blog.
The secure overlay consists of a node-shared component, the ztunnel, which is deployed as a DaemonSet and is responsible for mTLS and L4 telemetry.
The L7 layer of the mesh is provided by waypoint proxies, full L7 Envoy proxies that are deployed per identity/workload type.
Some of the core implications of this design include:

Separation of application from data plane
Components of the secure overlay layer resemble those of a CNI
Simplicity of operations is better for security
Avoiding multi-tenant L7 proxies
Sidecars are still a first-class supported deployment

Separation of application and data plane
Although the primary goal of ambient mesh is simplifying operations of the service mesh, it does serve to improve security as well. Complexity breeds vulnerabilities, and enterprise applications (and their transitive dependencies, libraries, and frameworks) are exceedingly complex and prone to vulnerabilities. From handling complex business logic to leveraging OSS libraries or buggy internal shared libraries, a user’s application code is a prime target for attackers (internal or external). If an application is compromised, credentials, secrets, and keys are exposed to the attacker, including those mounted into the pod or stored in memory. In the sidecar model, an application compromise includes takeover of the sidecar and any associated identity and key material. In Istio’s ambient mode, no data plane components run in the same pod as the application, so an application compromise does not give the attacker access to the data plane’s secrets.
What about Envoy Proxy as a potential target for vulnerabilities? Envoy is an extremely hardened piece of infrastructure that is under intense scrutiny and runs at scale in critical environments (e.g., it is used in production to front Google’s network). However, like any software, Envoy is not immune to vulnerabilities. When vulnerabilities do arise, Envoy has a robust CVE process for identifying them, fixing them quickly, and rolling the fixes out to customers before they have the chance for wide impact.
Circling back to the earlier comment that “complexity breeds vulnerabilities”, the most complex part of Envoy Proxy is its L7 processing, and indeed historically the majority of Envoy’s vulnerabilities have been in its L7 processing stack. But what if you just use Istio for mTLS? Why take the risk of deploying a full-blown L7 proxy, which has a higher chance of CVEs, when you don’t use that functionality? This is where separating L4 and L7 mesh capabilities comes into play. In sidecar deployments you adopt all of the proxy even if you use only a fraction of its functionality; in ambient mode we can limit the exposure by providing a secure overlay and layering in L7 only as needed. Additionally, the L7 components run completely separately from the applications, so a compromised application does not give an attacker direct access to them.
Pushing L4 down into the CNI
The L4 components of the ambient mode data plane run as a DaemonSet, that is, one instance per node. This means they are shared infrastructure for all of the pods running on a particular node. This component is particularly sensitive and should be treated at the same level as any other shared component on the node, such as CNI agents, kube-proxy, the kubelet, or even the Linux kernel. Traffic from workloads is redirected to the ztunnel, which then identifies the workload and selects the right certificate to represent that workload in an mTLS connection.
The ztunnel uses a distinct credential for every pod, which is only issued if the pod is currently running on the node. This limits the blast radius of a compromised ztunnel: only credentials for pods currently scheduled on that node could be stolen. Other well-implemented shared node infrastructure, including other secure CNI implementations, has a similar property. The ztunnel does not use cluster-wide or per-node credentials which, if stolen, could immediately compromise all application traffic in the cluster unless a complex secondary authorization mechanism were also implemented.
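The blast-radius argument can be modeled in a few lines of shell. The sketch below is an illustration, not how the ztunnel actually tracks pods: given one line per pod (name, node, namespace, service account), it prints the distinct SPIFFE identities whose credentials a ztunnel on a given node could hold. The function name `identities_on_node`, the input format and the `cluster.local` trust domain are assumptions, and the model does not check whether a pod is actually enrolled in the mesh, so it gives an upper bound.

```shell
#!/usr/bin/env bash
# Hypothetical model of the ztunnel blast radius: print the distinct SPIFFE
# identities of pods scheduled on NODE ($1). Input on stdin, one pod per line:
#   <pod> <node> <namespace> <serviceaccount>
# Assumes the default trust domain "cluster.local". Pods are not filtered by
# mesh enrollment, so the result is an upper bound.
identities_on_node() {
  awk -v node="$1" '$2 == node { print "spiffe://cluster.local/ns/" $3 "/sa/" $4 }' | sort -u
}
```

In a real cluster, the input could come from `kubectl get pods -A --no-headers -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName,NS:.metadata.namespace,SA:.spec.serviceAccountName`, piped into `identities_on_node <node-name>`. Pods on other nodes never appear in the result, which is the point of per-pod, node-scoped credentials.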
If we compare this to the sidecar model, we notice that the ztunnel is shared, and its compromise could result in exfiltration of the identities of the applications running on the node. However, the likelihood of a CVE in this component is lower than in an Istio sidecar, since the attack surface is greatly reduced: the ztunnel handles only L4 and does not do any L7 processing. In addition, a CVE in a sidecar (which has a larger attack surface because of its L7 processing) is not truly contained to the particular workload that is compromised. Any serious CVE in a sidecar is likely repeatable against any of the workloads in the mesh as well.
Simplicity of operations is better for security
Ultimately, Istio is a critical piece of infrastructure that must be maintained. Istio is trusted to implement some of the tenets of zero-trust network security on behalf of applications, so rolling out patches on a schedule or on demand is paramount. Platform teams often have predictable patching or maintenance cycles, which are quite different from those of applications. Applications are typically updated when new capabilities are required, usually as part of a project. This approach to application changes, upgrades, and framework and library patches is highly unpredictable, can let a lot of time pass between updates, and does not lend itself to safe security practices. Therefore, keeping these security features part of the platform and separate from the applications is likely to lead to a better security posture.
As we identified in the announcement blog, operating sidecars can be more complex because they are invasive (injecting the sidecar or changing the deployment descriptors, restarting the applications, race conditions between containers, etc.). Upgrades to workloads with sidecars require more planning, and rolling restarts may need to be coordinated so as not to bring down the application. With ambient mode, upgrades to the ztunnel can coincide with normal node patching or upgrades, while the waypoint proxies are part of the network and can be upgraded as needed, completely transparently to the applications.
Avoiding multi-tenant L7 proxies
Supporting L7 protocols such as HTTP/1.x, HTTP/2, HTTP/3 and gRPC, parsing headers, implementing retries, and allowing customizations with Wasm and/or Lua in the data plane is significantly more complex than supporting L4. There is a lot more code to implement these behaviors (including user-custom code for things like Lua and Wasm), and this complexity increases the potential for vulnerabilities. As a result, CVEs are more likely to be discovered in these areas of L7 functionality.

Figure: Each namespace/identity has its own L7 proxies; no multi-tenant proxies

In ambient mode, we do not share L7 processing in a proxy across multiple identities. Each identity (service account in Kubernetes) has its own dedicated L7 proxy (waypoint proxy) which is very similar to the model we use with sidecars. Trying to co-locate multiple identities and their distinct complex policies and customizations adds a lot of variability to a shared resource which leads to unfair cost attribution at best and total proxy compromise at worst.
Sidecars are still a first-class supported deployment
We understand that some folks are comfortable with the sidecar model and its known security boundaries, and wish to stay on that model. With Istio, sidecars are a first-class citizen of the mesh, and platform owners have the choice to continue using them. If a platform owner wants to support both sidecar and ambient modes, they can. A workload using the ambient data plane can natively communicate with workloads that have a sidecar deployed. As folks better understand the security posture of ambient mode, we are confident that it will be the preferred data plane mode of Istio, with sidecars used for specific optimizations.



Get Started with Istio Ambient Mesh
Wed, 07 Sep 2022 08:00:00 -0600

Note: Refer to the latest getting started with ambient mesh doc for updated instructions.


Ambient mesh is a new data plane mode for Istio introduced today. Following this getting started guide, you can experience how ambient mesh can simplify your application onboarding, help with ongoing operations, and reduce service mesh infrastructure resource usage.
Install Istio with Ambient Mode

Download the preview version of Istio with support for ambient mesh.
Check out the supported environments. We recommend using a Kubernetes cluster running version 1.21 or newer with two or more nodes. If you don’t have a Kubernetes cluster, you can set one up locally (e.g. using kind, as below) or deploy one on Google Cloud or AWS:

$ kind create cluster --config=- <
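kind reads the cluster configuration from standard input here. A layout that fits the two-or-more-node recommendation could look like the following sketch; the node roles are an assumption, chosen because one control-plane node plus two workers gives three nodes, which is consistent with the three istio-cni-node and ztunnel pods in the pod listing later in this guide.

```yaml
# Sketch of a kind cluster config (node layout is an assumption):
# one control-plane node and two worker nodes.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
```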

The ambient profile is designed to help you get started with ambient mesh.
Install Istio with the ambient profile on your Kubernetes cluster, using the istioctl downloaded above:
$ istioctl install --set profile=ambient
After running the above command, you’ll get the following output, indicating that these four components were installed successfully:
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ CNI installed
✔ Installation complete
By default, the ambient profile has the Istio core, Istiod, ingress gateway, zero-trust tunnel agent (ztunnel) and CNI plugin enabled.
The Istio CNI plugin is responsible for detecting which application pods are part of the ambient mesh and configuring traffic redirection between those pods and the ztunnels.
You’ll notice the following pods are installed in the istio-system namespace with the default ambient profile:
$ kubectl get pod -n istio-system
NAME                                    READY   STATUS    RESTARTS   AGE
istio-cni-node-97p9l                    1/1     Running   0          29s
istio-cni-node-rtnvr                    1/1     Running   0          29s
istio-cni-node-vkqzv                    1/1     Running   0          29s
istio-ingressgateway-5dc9759c74-xlp2j   1/1     Running   0          29s
istiod-64f6d7db7c-dq8lt                 1/1     Running   0          47s
ztunnel-bq6w2                           1/1     Running   0          47s
ztunnel-tcn4m                           1/1     Running   0          47s
ztunnel-tm9zl                           1/1     Running   0          47s
The istio-cni and ztunnel components are deployed as Kubernetes DaemonSets which run on every node.
Each Istio CNI pod checks all pods co-located on the same node to see if these pods are part of the ambient mesh.
For those pods, the CNI plugin configures traffic redirection so that all incoming and outgoing traffic for the pods is redirected to the co-located ztunnel first.
As pods are deployed to or removed from the node, the CNI plugin continues to monitor them and update the redirection logic accordingly.
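Because both components are DaemonSets, a quick health check is that each one reports as many ready pods as it wants to schedule. The function below is a hypothetical helper (the name `unready_daemonsets` is ours) that reads `kubectl get daemonset -n istio-system --no-headers` output, where the second column is DESIRED and the fourth is READY, prints any DaemonSet whose counts differ, and exits non-zero in that case. For a single DaemonSet, `kubectl rollout status daemonset/ztunnel -n istio-system` is a built-in alternative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get daemonset --no-headers` output on stdin
# (columns: NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE-SELECTOR AGE)
# and report any DaemonSet whose READY count differs from DESIRED.
# Exit status is 1 if at least one DaemonSet is not fully ready, else 0.
unready_daemonsets() {
  awk '$2 != $4 { print $1, $4 "/" $2; bad = 1 } END { exit bad }'
}
```

Usage: `kubectl get daemonset -n istio-system --no-headers | unready_daemonsets && echo "istio-cni-node and ztunnel are ready on every node"`.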
Deploy Your Applications
You’ll use the sample bookinfo application, which is part of your Istio download from previous steps.
In ambient mode, you deploy applications to your Kubernetes cluster exactly the same way you would without Istio.
This means you can have your applications running in your Kubernetes cluster before you enable ambient mesh, and have them join the mesh without needing to restart or reconfigure them.
$ kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
$ kubectl apply -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/sleep.yaml
$ kubectl apply -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/notsleep.yaml

Figure: Applications not in the ambient mesh with plain text traffic

Note: sleep and notsleep are two simple applications that can serve as curl clients.
Connect productpage to the Istio ingress gateway so you can access the bookinfo app from outside of the cluster:
$ kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml
Test your bookinfo application; it should work with or without the gateway. Note: you can replace istio-ingressgateway.istio-system below with its load balancer IP (or hostname) if it has one:
$ kubectl exec deploy/sleep -- curl -s http://istio-ingressgateway.istio-system/productpage | head -n1
$ kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
$ kubectl exec deploy/notsleep -- curl -s http://productpage:9080/ | head -n1
Adding your application to the ambient mesh
You can enable all pods in a given namespace to be part of the ambient mesh by simply labeling the namespace:
$ kubectl label namespace default istio.io/dataplane-mode=ambient
Congratulations! You have successfully added all pods in the default namespace to the ambient mesh. The best part is that there is no need to restart or redeploy anything!
Send some test traffic:
$ kubectl exec deploy/sleep -- curl -s http://istio-ingressgateway.istio-system/productpage | head -n1
$ kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
$ kubectl exec deploy/notsleep -- curl -s http://productpage:9080/ | head -n1
You’ll immediately gain mTLS communication among the applications in the ambient mesh.

Figure: Inbound requests from sleep to `productpage` and from `productpage` to reviews with secure overlay layer

If you are curious about the X.509 certificate for each identity, you can learn more about it by stepping through a certificate:
$ istioctl pc secret ds/ztunnel -n istio-system -o json | jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' | base64 --decode | openssl x509 -noout -text -in /dev/stdin
For example, the output below shows the certificate for the sleep principal, which is valid for about 24 hours and was issued by the local Kubernetes cluster.
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 307564724378612391645160879542592778778 (0xe762cfae32a3b8e3e50cb9abad32b21a)
    Signature Algorithm: SHA256-RSA
        Issuer: O=cluster.local
        Validity
            Not Before: Aug 29 21:00:14 2022 UTC
            Not After : Aug 30 21:02:14 2022 UTC
        Subject:
        Subject Public Key Info:
            Public Key Algorithm: RSA
                Public-Key: (2048 bit)
                Modulus:
                    ac:db:1a:77:72:8a:99:28:4a:0c:7e:43:fa:ff:35:
                    75:aa:88:4b:80:4f:86:ca:69:59:1c:b5:16:7b:71:
                    dd:74:57:e2:bc:cf:ed:29:7d:7b:fa:a2:c9:06:e6:
                    d6:41:43:2a:3c:2c:18:8e:e8:17:f6:82:7a:64:5f:
                    c4:8a:a4:cd:f1:4a:9c:3f:e0:cc:c5:d5:79:49:37:
                    30:10:1b:97:94:2c:b7:1b:ed:a2:62:d9:3b:cd:3b:
                    12:c9:b2:6c:3c:2c:ac:54:5b:a7:79:97:fb:55:89:
                    ca:08:0e:2e:2a:b8:d2:e0:3b:df:b2:21:99:06:1b:
                    60:0d:e8:9d:91:dc:93:2f:7c:27:af:3e:fc:42:99:
                    69:03:9c:05:0b:c2:11:25:1f:71:f0:8a:b1:da:4a:
                    da:11:7c:b4:14:df:6e:75:38:55:29:53:63:f5:56:
                    15:d9:6f:e6:eb:be:61:e4:ce:4b:2a:f9:cb:a6:7f:
                    84:b7:4c:e4:39:c1:4b:1b:d4:4c:70:ac:98:95:fe:
                    3e:ea:5a:2c:6c:12:7d:4e:24:ab:dc:0e:8f:bc:88:
                    02:f2:66:c9:12:f0:f7:9e:23:c9:e2:4d:87:75:b8:
                    17:97:3c:96:83:84:3f:d1:02:6d:1c:17:1a:43:ce:
                    68:e2:f3:d7:dd:9e:a6:7d:d3:12:aa:f5:62:91:d9:
                    8d
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                Server Authentication, Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Authority Key Identifier:
                keyid:93:49:C1:B8:AB:BF:0F:7D:44:69:5A:C3:2A:7A:3C:79:19:BE:6A:B7
            X509v3 Subject Alternative Name: critical
                URI:spiffe://cluster.local/ns/default/sa/sleep
Note: If you don’t get any output, it may mean ds/ztunnel has selected a node that doesn’t manage any certificates. You can instead specify a ztunnel pod that manages one of the sample application pods (e.g. istioctl pc secret ztunnel-tcn4m -n istio-system).
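If you only need the identity, a small filter can pull the SPIFFE URI out of the decoded certificate text and split it into trust domain, namespace and service account. This is a sketch: the function names are hypothetical, and it assumes the SAN is printed on the line after the `X509v3 Subject Alternative Name` header (as in the output above) and that the ID follows Istio's `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>` pattern.

```shell
#!/usr/bin/env bash
# Read certificate text (e.g. from `openssl x509 -noout -text`) on stdin and
# print the SAN line with its leading "URI:" removed. Assumes the SAN value is
# on the line after the header and holds a single URI.
spiffe_from_cert_text() {
  awk '/X509v3 Subject Alternative Name/ {
         getline
         sub(/^[ \t]*URI:/, "")
         print
         exit
       }'
}

# Split an Istio-style SPIFFE ID into "<trust-domain> <namespace> <service-account>".
parse_spiffe_id() {
  local rest="${1#spiffe://}"        # cluster.local/ns/default/sa/sleep
  local trust_domain="${rest%%/*}"   # cluster.local
  local ns="${rest#*/ns/}"           # default/sa/sleep
  ns="${ns%%/*}"                     # default
  local sa="${rest##*/sa/}"          # sleep
  printf '%s %s %s\n' "$trust_domain" "$ns" "$sa"
}
```

Usage: replace the final `openssl x509 -noout -text -in /dev/stdin` stage of the command above with `openssl x509 -noout -text -in /dev/stdin | spiffe_from_cert_text`, then pass the result to `parse_spiffe_id`.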
Secure application access
After you have added your application to ambient mesh, you can secure application access using L4 authorization policies.
This lets you control access to and from a service based on client workload identities, but not at the L7 level, such as HTTP methods like GET and POST.
L4 Authorization Policies
Explicitly allow the sleep and istio-ingressgateway service accounts to call the productpage service:
$ kubectl apply -f - <
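An L4 policy matching this description could look like the sketch below, passed to `kubectl apply -f -` as in the command above. It matches only on source principals, which the ztunnel can enforce without parsing HTTP. The policy name `productpage-viewer` is hypothetical, and `istio-ingressgateway-service-account` is assumed to be the service account of the default ingress gateway installed by istioctl.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: productpage-viewer   # hypothetical name
  namespace: default
spec:
  selector:
    matchLabels:
      app: productpage       # label carried by the Bookinfo productpage pods
  action: ALLOW
  rules:
  - from:
    - source:
        # Principals are SPIFFE IDs without the "spiffe://" prefix.
        principals:
        - cluster.local/ns/default/sa/sleep
        - cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account  # assumed SA name
```

Because an ALLOW policy now applies to productpage, requests that match none of its rules, such as those from notsleep, are denied.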

Confirm the above authorization policy is working:
$ # this should succeed
$ kubectl exec deploy/sleep -- curl -s http://istio-ingressgateway.istio-system/productpage | head -n1
$ # this should succeed
$ kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
$ # this should fail with an empty reply
$ kubectl exec deploy/notsleep -- curl -s http://productpage:9080/ | head -n1
Layer 7 Authorization Policies
Using the Kubernetes Gateway API, you can deploy a waypoint proxy for the productpage service that uses the bookinfo-productpage service account. Any traffic going to the productpage service will be mediated, enforced and observed by the Layer 7 (L7) proxy.
$ kubectl apply -f - <

Note the gatewayClassName has to be istio-mesh for the waypoint proxy.
View the productpage waypoint proxy status; you should see the details of the gateway resource with Ready status:
$ kubectl get gateway productpage -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2022-09-06T20:24:41Z"
    message: Deployed waypoint proxy to "default" namespace for "bookinfo-productpage"
      service account
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
Update our AuthorizationPolicy to explicitly allow the sleep and istio-ingressgateway service accounts to GET the productpage service, but perform no other operations:
$ kubectl apply -f - <
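The essential change from an L4-only policy is an HTTP-level condition, which only the waypoint proxy can evaluate. A sketch of the corresponding `rules` stanza follows; the ingress gateway service account name `istio-ingressgateway-service-account` is an assumption.

```yaml
rules:
- from:
  - source:
      principals:
      - cluster.local/ns/default/sa/sleep
      - cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account  # assumed SA name
  to:
  - operation:
      methods: ["GET"]   # L7 condition: requests using other methods match no rule and are denied
```

Metadata and the selector are omitted: how a policy targets the waypoint rather than the workload differs between Istio versions, so check the ambient documentation for the version you run.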

Confirm the above authorization policy is working:
$ # this should fail with an RBAC error because it is not a GET operation
$ kubectl exec deploy/sleep -- curl -s http://productpage:9080/ -X DELETE | head -n1
$ # this should fail with an RBAC error because the identity is not allowed
$ kubectl exec deploy/notsleep -- curl -s http://productpage:9080/  | head -n1
$ # this should continue to work
$ kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1

Figure: Inbound requests from sleep to `productpage` and from `productpage` to reviews with secure overlay and L7 processing layers

With the productpage waypoint proxy deployed, you’ll also automatically get L7 metrics for all requests to the productpage service:
$ kubectl exec deploy/bookinfo-productpage-waypoint-proxy -- curl -s http://localhost:15020/stats/prometheus | grep istio_requests_total
You’ll notice a metric with response_code=403 and some metrics with response_code=200, like below:
istio_requests_total{
  response_code="403",
  source_workload="notsleep",
  source_workload_namespace="default",
  source_principal="spiffe://cluster.local/ns/default/sa/notsleep",
  destination_workload="productpage-v1",
  destination_principal="spiffe://cluster.local/ns/default/sa/bookinfo-productpage",
  connection_security_policy="mutual_tls",
  ...
}
The metric shows two 403 responses when the source workload (notsleep) calls the destination workload (productpage-v1) over a mutual TLS connection, along with the source and destination principals.
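The raw Prometheus output has one sample per label combination, so a small filter helps to see totals per response code. The function below is a sketch with a hypothetical name; it assumes the standard text exposition format, where each sample is `name{labels} value` on a single line with no trailing timestamp (rather than the wrapped layout shown above), and sums `istio_requests_total` values by `response_code`.

```shell
#!/usr/bin/env bash
# Sum istio_requests_total samples by response_code from Prometheus text
# format on stdin; prints "<code> <total>" lines sorted by code.
# Assumes one sample per line and no trailing timestamp.
requests_by_code() {
  awk '/^istio_requests_total[{]/ {
         if (match($0, /response_code="[0-9]+"/)) {
           # Skip the 15-character prefix response_code=" and the closing quote.
           code = substr($0, RSTART + 15, RLENGTH - 16)
           total[code] += $NF          # the sample value is the last field
         }
       }
       END { for (c in total) print c, total[c] }' | sort
}
```

Usage: append `| requests_by_code` to the `kubectl exec deploy/bookinfo-productpage-waypoint-proxy -- curl -s http://localhost:15020/stats/prometheus` command, in place of the `grep`.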
Control Traffic
Deploy a waypoint proxy for the reviews service, using the bookinfo-reviews service account, so that any traffic going to the reviews service will be mediated by the waypoint proxy.
$ kubectl apply -f - <

Apply the reviews virtual service and destination rule to send 90% of traffic to reviews v1 and 10% to reviews v2:
$ kubectl apply -f samples/bookinfo/networking/virtual-service-reviews-90-10.yaml
$ kubectl apply -f samples/bookinfo/networking/destination-rule-reviews.yaml
Confirm that roughly 10% of the 100 requests go to reviews-v2:
$ kubectl exec -it deploy/sleep -- sh -c 'for i in $(seq 1 100); do curl -s http://istio-ingressgateway.istio-system/productpage | grep reviews-v.-; done'
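Rather than counting matches by eye, you can pipe the loop's output through a tally. The helper below is hypothetical; it assumes each matching line contains a reviews pod name such as `reviews-v2-<suffix>`, which is what the `grep reviews-v.-` filter selects, and prints how many responses came from each version.

```shell
#!/usr/bin/env bash
# Count responses per reviews version from lines containing pod names like
# "reviews-v1-<hash>"; prints "<reviews-vN> <count>" lines.
tally_reviews() {
  grep -o 'reviews-v[0-9][0-9]*' | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Append `| tally_reviews` to the loop command above. With a 90/10 split you would expect counts near 90 and 10, though the exact numbers vary from run to run.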
Wrapping up
Existing Istio resources continue to work, regardless of whether you choose the sidecar or the ambient data plane mode.
Take a look at the short video to watch Lin run through the Istio ambient mesh demo in 5 minutes:

What’s next
We are super excited about the new Istio ambient data plane with its simple “ambient” architecture. Onboarding your applications onto a service mesh with ambient mode is now as easy as labeling a namespace. Your applications will gain instant benefits such as mTLS with cryptographic identity for mesh traffic and L4 observability. If you need to control access or routing, increase resiliency, or gain L7 metrics among your applications in ambient mesh, you can apply waypoint proxies to your applications as needed. We’re big fans of paying for only what we need, as it not only saves resources but also saves the operational cost of constantly updating many proxies! We invite you to try the new Istio ambient data plane architecture to experience how simple it is. We look forward to your feedback in the Istio community!



Introducing Ambient Mesh
Wed, 07 Sep 2022 07:00:00 -0600

Note: Ambient mode is now generally available!


Today, we are excited to introduce “ambient mesh”, and its reference implementation: a new Istio data plane mode that’s designed for simplified operations, broader application compatibility, and reduced infrastructure cost. Ambient mesh gives users the option to forgo sidecar proxies in favor of a data plane that’s integrated into their infrastructure, all while maintaining Istio’s core features of zero-trust security, telemetry, and traffic management. We are sharing a preview of ambient mesh with the Istio community that we are working to bring to production readiness in the coming months.
Istio and sidecars
Since its inception, a defining feature of Istio’s architecture has been the use of sidecars: programmable proxies deployed alongside application containers. Sidecars allow operators to reap Istio’s benefits without requiring applications to undergo major surgery and its associated costs.

Figure: Istio’s traditional model deploys Envoy proxies as sidecars within the workloads’ pods

Although sidecars have significant advantages over refactoring applications, they do not provide a perfect separation between applications and the Istio data plane. This results in a few limitations:

Invasiveness - Sidecars must be “injected” into applications by modifying their Kubernetes pod spec and redirecting traffic within the pod. As a result, installing or upgrading sidecars requires restarting the application pod, which can be disruptive for workloads.
Underutilization of resources - Since the sidecar proxy is dedicated to its associated workload, the CPU and memory resources must be provisioned for worst case usage of each individual pod. This adds up to large reservations that can lead to underutilization of resources across the cluster.
Traffic breaking - Traffic capture and HTTP processing, as typically done by Istio’s sidecars, is computationally expensive and can break some applications with non-conformant HTTP implementations.

While sidecars have their place — more on that later — we think there is a need for a less invasive and easier option that will be a better fit for many service mesh users.
Slicing the layers
Traditionally, Istio implements all data plane functionality, from basic encryption through advanced L7 policy, in a single architectural component: the sidecar.
In practice, this makes sidecars an all-or-nothing proposition.
Even if a workload just needs simple transport security, administrators still need to pay the operational cost of deploying and maintaining a sidecar.
Sidecars have a fixed operational cost per workload that does not scale to fit the complexity of the use case.
The ambient data plane takes a different approach.
It splits Istio’s functionality into two distinct layers.
At the base, there’s a secure overlay that handles routing and zero trust security for traffic.
Above that, when needed, users can enable L7 processing to get access to the full range of Istio features.
The L7 processing mode, while heavier than the secure overlay, still runs as an ambient component of the infrastructure, requiring no modifications to application pods.

Figure: Layers of the ambient mesh

This layered approach allows users to adopt Istio in a more incremental fashion, smoothly transitioning from no mesh, to the secure overlay, to full L7 processing — on a per-namespace basis, as needed.  Furthermore, workloads running in different ambient layers, or with sidecars, interoperate seamlessly, allowing users to mix and match capabilities based on the particular needs as they change over time.
Building an ambient mesh
Istio’s ambient data plane mode uses a shared agent, running on each node in the Kubernetes cluster. This agent is a zero-trust tunnel (or ztunnel), and its primary responsibility is to securely connect and authenticate elements within the mesh. The networking stack on the node redirects all traffic of participating workloads through the local ztunnel agent. This fully separates the concerns of Istio’s data plane from those of the application, ultimately allowing operators to enable, disable, scale, and upgrade the data plane without disturbing applications. The ztunnel performs no L7 processing on workload traffic, making it significantly leaner than sidecars. This large reduction in complexity and associated resource costs makes it amenable to delivery as shared infrastructure.
Ztunnels enable the core functionality of a service mesh: zero trust.  A secure overlay is created when ambient mode is enabled for a namespace.  It provides workloads with mTLS, telemetry, authentication, and L4 authorization, without terminating or parsing HTTP.

Figure: Ambient mesh uses a shared, per-node ztunnel to provide a zero-trust secure overlay

After ambient mode is enabled and a secure overlay is created, a namespace can be configured to utilize L7 features.
This allows a namespace to implement the full set of Istio capabilities, including the Virtual Service API, L7 telemetry, and L7 authorization policies.
Namespaces operating in this mode use one or more Envoy-based waypoint proxies to handle L7 processing for workloads in that namespace.
Istio’s control plane configures the ztunnels in the cluster to pass all traffic that requires L7 processing through the waypoint proxy.
Importantly, from a Kubernetes perspective, waypoint proxies are just regular pods that can be auto-scaled like any other Kubernetes deployment.
We expect this to yield significant resource savings for users, as the waypoint proxies can be auto-scaled to fit the real-time traffic demand of the namespaces they serve, not the maximum worst-case load operators expect.

Figure: When additional features are needed, ambient mesh deploys waypoint proxies, which ztunnels connect through for policy enforcement

Ambient mesh uses HTTP CONNECT over mTLS to implement its secure tunnels and insert waypoint proxies in the path, a pattern we call HBONE (HTTP-Based Overlay Network Environment). HBONE provides cleaner encapsulation of traffic than TLS on its own while enabling interoperability with common load-balancer infrastructure. FIPS builds are used by default to meet compliance needs. More details on HBONE, its standards-based approach, and plans for UDP and other non-TCP protocols will be provided in a future blog.
Mixing sidecar and ambient modes in a single mesh does not introduce limitations on the capabilities or security properties of the system. The Istio control plane ensures that policies are properly enforced regardless of the deployment model chosen. Ambient mode simply introduces an option that has better ergonomics and more flexibility.
Why no L7 processing on the local node?
Ambient mode uses a shared ztunnel agent on the node, which handles the zero trust aspects of the mesh, while L7 processing happens in the waypoint proxy in separately scheduled pods. Why bother with the indirection, and not just use a shared full L7 proxy on the node?  There are several reasons for this:

Envoy is not inherently multi-tenant. As a result, we have security concerns with commingling complex processing rules for L7 traffic from multiple unconstrained tenants in a shared instance. By strictly limiting to L4 processing, we reduce the vulnerability surface area significantly.
The mTLS and L4 features provided by the ztunnel need a much smaller CPU and memory footprint than the L7 processing required in the waypoint proxy. By running waypoint proxies as a shared namespace resource, we can scale them independently based on the needs of that namespace, and their costs are not unfairly distributed across unrelated tenants.
By reducing the ztunnel’s scope, we allow it to be replaced by other secure tunnel implementations that can meet a well-defined interoperability contract.

But what about those extra hops?
With ambient mode, a waypoint isn’t necessarily guaranteed to be on the same node as the workloads it serves. While at first glance this may appear to be a performance concern, we’re confident that latency will ultimately be in-line with Istio’s current sidecar implementation. We’ll discuss more in a dedicated performance blog post, but for now we’ll summarize with two points:

The majority of Istio’s network latency does not, in fact, come from the network (modern cloud providers have extremely fast networks). Instead, the biggest culprit is the intensive L7 processing Istio needs to implement its sophisticated feature set. Unlike sidecars, which implement two L7 processing steps for each connection (one for each sidecar), ambient mode collapses these two steps into one. In most cases, we expect this reduced processing cost to compensate for an additional network hop.
Users often deploy a mesh to enable a zero-trust security posture as a first-step and then selectively enable L7 capabilities as needed.  Ambient mode allows those users to bypass the cost of L7 processing entirely when it’s not needed.

Resource overhead
Overall, we expect Istio’s ambient mode to have lower and more predictable resource requirements for most users. The ztunnel’s limited responsibilities allow it to be deployed as a shared resource on each node, which will substantially reduce the per-workload reservations required for most users. Furthermore, since the waypoint proxies are normal Kubernetes pods, they can be dynamically deployed and scaled based on the real-time traffic demands of the workloads they serve.
Sidecars, on the other hand, need to reserve memory and CPU for the worst case for each workload. Making these calculations is complicated, so in practice administrators tend to over-provision. This leads to underutilized nodes, because high reservations prevent other workloads from being scheduled. Ambient mode’s lower fixed per-node overhead and dynamically scaled waypoint proxies will require far fewer resource reservations in aggregate, leading to more efficient use of a cluster.
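To make the reservation argument concrete, here is a minimal Python sketch that totals CPU reservations (in millicores) for both models. The function names and every number in the example are hypothetical; the sketch ignores memory and assumes each ztunnel and each waypoint replica reserves a fixed amount.

```python
from typing import Sequence


def sidecar_reservation_millicores(workloads: int, per_sidecar: int) -> int:
    # Every workload carries its own sidecar, each reserved for its worst case.
    return workloads * per_sidecar


def ambient_reservation_millicores(
    nodes: int,
    per_ztunnel: int,
    waypoint_replicas: Sequence[int],
    per_waypoint: int,
) -> int:
    # One shared ztunnel per node, plus the waypoint replicas each namespace
    # is currently scaled to (waypoint_replicas has one entry per namespace).
    return nodes * per_ztunnel + sum(waypoint_replicas) * per_waypoint


if __name__ == "__main__":
    # Hypothetical cluster: 200 workloads on 10 nodes, 3 namespaces with waypoints.
    print(sidecar_reservation_millicores(200, 100))                 # 20000
    print(ambient_reservation_millicores(10, 200, [2, 1, 3], 500))  # 5000
```

The point is the shape of the formulas: the sidecar total grows with the number of workloads, while the ambient total grows with the number of nodes plus however many waypoint replicas the traffic actually needs.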
What about security?
With a radically new architecture naturally come questions around security. The ambient mode security blog post does a deep dive, but we’ll summarize here.
Sidecars are co-located with the workloads they serve, and as a result, a vulnerability in one compromises the other. In the ambient mesh model, even if an application is compromised, the ztunnels and waypoint proxies can still enforce strict security policy on the compromised application’s traffic. Furthermore, given that Envoy is a mature, battle-tested piece of software used by the world’s largest network operators, it is likely less vulnerable than the applications it runs alongside.
While the ztunnel is a shared resource, it only has access to the keys of the workloads currently running on its node. Thus, its blast radius is no worse than that of any other encrypted CNI that relies on per-node keys for encryption. Also, given the ztunnel’s limited, L4-only attack surface and Envoy’s aforementioned security properties, we feel this risk is limited and acceptable.
Finally, while the waypoint proxies are a shared resource, they can be limited to serving just one service account. This makes them no worse than sidecars are today: if one waypoint proxy is compromised, the credential associated with that waypoint is lost, and nothing else.
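The blast-radius argument can be illustrated with a small Python sketch that maps each node to the workload identities its ztunnel would hold keys for. It assumes Istio’s SPIFFE-style identity format, spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>, with the default cluster.local trust domain; the function names and the pod placement are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def spiffe_id(namespace: str, service_account: str,
              trust_domain: str = "cluster.local") -> str:
    # One workload identity per Kubernetes service account.
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


def ztunnel_exposure(
    pods: Iterable[Tuple[str, str, str]],
) -> Dict[str, Set[str]]:
    """Map each node to the identities its ztunnel holds keys for.

    `pods` is an iterable of (node, namespace, service_account) tuples for
    the workloads currently scheduled in the cluster.
    """
    exposure: Dict[str, Set[str]] = defaultdict(set)
    for node, namespace, service_account in pods:
        exposure[node].add(spiffe_id(namespace, service_account))
    return dict(exposure)


if __name__ == "__main__":
    pods = [
        ("node-a", "shop", "frontend"),
        ("node-a", "shop", "cart"),
        ("node-b", "shop", "cart"),
        ("node-b", "billing", "payments"),
    ]
    print(sorted(ztunnel_exposure(pods)["node-a"]))
    # ['spiffe://cluster.local/ns/shop/sa/cart', 'spiffe://cluster.local/ns/shop/sa/frontend']
```

In this example, a compromised ztunnel on node-a could only impersonate the two identities scheduled there, never billing/payments on node-b, while a waypoint restricted to one service account holds exactly one such identity.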
Is this the end of the road for the sidecar?
Definitely not.
While we believe ambient mesh will be the best option for many mesh users going forward, sidecars continue to be a good choice for those that need dedicated data plane resources, such as for compliance or performance tuning.
Istio will continue to support sidecars, and importantly, allow them to interoperate seamlessly with ambient mode.
In fact, the ambient mode code we’re releasing today already supports interoperation with sidecar-based Istio.
Learn more
Take a look at a short video to watch Christian run through the Istio ambient mode components and demo some capabilities:

Get involved
What we have released today is an early version of ambient mode in Istio, and it is very much still under active development. We are excited to share it with the broader community and look forward to getting more people involved in shaping it as we move to production readiness in 2023.
We would love your feedback to help shape the solution.
A build of Istio which supports ambient mode is available to download and try in the Istio Experimental repo.
A list of missing features and work items is available in the README.
Please try it out and let us know what you think!
Thank you to the team that contributed to the launch of ambient mesh!

Google: Craig Box, John Howard, Ethan J. Jackson, Abhi Joglekar, Steven Landow, Oliver Liu, Justin Pettit, Doug Reid, Louis Ryan, Kuat Yessenov, Francis Zhou
Solo.io: Aaron Birkland, Kevin Dorosh, Greg Hanson, Daniel Hawton, Denis Jannot, Yuval Kohavi, Idit Levine, Yossi Mesika, Neeraj Poddar, Nina Polshakova, Christian Posta, Lin Sun, Eitan Yarmush




Extending Gateway API support in Istio
Wed, 13 Jul 2022 00:00:00 +0000
Today we want to congratulate the Kubernetes SIG Network community on the beta release of the Gateway API specification. Alongside this milestone, we are pleased to announce that support for using the Gateway API in Istio ingress is being promoted to Beta, and our intention for the Gateway API to become the default API for all Istio traffic management in the future. We are also excited to welcome our friends from the Service Mesh Interface (SMI) community, who are joining us in a new effort to standardize service mesh use cases using the Gateway API.
The history of Istio’s traffic management APIs
API design is more of an art than a science, and Istio is often used as an API to configure the serving of other APIs! In the case of traffic routing alone, we must consider producer vs. consumer, routing vs. post-routing, and how to express a complex feature set with the correct number of objects, factoring in that these must be owned by different teams.
When we launched Istio in 2017, we brought many years of experience from Google’s production API serving infrastructure and IBM’s Amalgam8 project, and mapped it onto Kubernetes. We soon came up against the limitations of Kubernetes’ Ingress API. A desire to support all proxy implementations meant that Ingress only supported the most basic of HTTP routing features, with other features often implemented as vendor-specific annotations. The Ingress API was shared between infrastructure admins (“create and configure a load balancer”), cluster operators (“manage a TLS certificate for my entire domain”) and application users (“use it to route /foo to the foo service”).
We rewrote our traffic APIs in early 2018 to respond to user feedback and to address these concerns more adequately.
A primary feature of Istio’s new model was having separate APIs that describe infrastructure (the load balancer, represented by the Gateway), and application (routing and post-routing, represented by the VirtualService and DestinationRule).
Ingress worked well as a lowest common denominator between different implementations, but its shortcomings led SIG Network to investigate the design of a “version 2”. A user survey in 2018 was followed by a proposal for new APIs in 2019, based in large part on Istio’s traffic APIs. That effort came to be known as the “Gateway API”.
The Gateway API was built to be able to model many more use cases, with extension points to enable functionality that differs between implementations. Furthermore, adopting the Gateway API opens a service mesh up to compatibility with the whole ecosystem of software that is written to support it. You don’t have to ask your vendor to support Istio routing directly: all they need to do is create Gateway API objects, and Istio will do what it needs to do, out of the box.
Support for the Gateway API in Istio
Istio added support for the Gateway API in November 2020, with that support marked Alpha to match the maturity of the API itself. With the Beta release of the API spec, we are pleased to announce that support for ingress use in Istio is being promoted to Beta. We also encourage early adopters to start experimenting with the Gateway API for mesh (service-to-service) use, and we will move that support to Beta when SIG Network has standardized the required semantics.
Around the time of the v1 release of the API, we intend to make the Gateway API the default method for configuring all traffic routing in Istio, for both ingress (north-south) and service-to-service (east-west) traffic. At that time, we will change our documentation and examples to reflect the recommendation.
Just like Kubernetes intends to support the Ingress API for many years after the Gateway API goes stable, the Istio APIs (Gateway, VirtualService and DestinationRule) will remain supported for the foreseeable future.
Not only that, but you can continue to use the existing Istio traffic APIs alongside the Gateway API, for example, using an HTTPRoute with an Istio VirtualService.
The similarity between the APIs means that we will be able to offer a tool to easily convert Istio API objects to Gateway API objects, and we will release this alongside the v1 version of the API.
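To illustrate how closely the two APIs line up, here is a hypothetical Python sketch that converts a small subset of a VirtualService into an HTTPRoute. It is not the conversion tool mentioned above: the function name is ours, it handles only URI prefix matches and routes to plain Service names (with optional port and weight), and it rejects anything else rather than silently dropping it. The field names follow the VirtualService and HTTPRoute schemas, and v1beta1 is the HTTPRoute version that came with the Gateway API beta release.

```python
from typing import Any, Dict, List

SUPPORTED_HTTP_KEYS = {"match", "route"}
SUPPORTED_ROUTE_KEYS = {"destination", "weight"}


def virtual_service_to_http_route(vs: Dict[str, Any], gateway: str) -> Dict[str, Any]:
    """Convert a *simplified* VirtualService into an HTTPRoute (sketch only)."""
    rules: List[Dict[str, Any]] = []
    for http in vs["spec"].get("http", []):
        if not set(http) <= SUPPORTED_HTTP_KEYS:
            raise ValueError(f"unsupported http fields: {set(http) - SUPPORTED_HTTP_KEYS}")
        matches = []
        for match in http.get("match", []):
            uri = match.get("uri", {})
            # Only a lone URI prefix match is translated.
            if set(match) != {"uri"} or set(uri) != {"prefix"}:
                raise ValueError(f"unsupported match: {match}")
            matches.append({"path": {"type": "PathPrefix", "value": uri["prefix"]}})
        backend_refs = []
        for route in http.get("route", []):
            if not set(route) <= SUPPORTED_ROUTE_KEYS:
                raise ValueError(f"unsupported route fields: {route}")
            dest = route["destination"]
            ref: Dict[str, Any] = {"name": dest["host"]}  # assumes a short Service name
            if "port" in dest:
                ref["port"] = dest["port"]["number"]
            if "weight" in route:
                ref["weight"] = route["weight"]
            backend_refs.append(ref)
        rule: Dict[str, Any] = {"backendRefs": backend_refs}
        if matches:  # no matches means "match everything" in both APIs
            rule["matches"] = matches
        rules.append(rule)
    metadata = {k: vs["metadata"][k] for k in ("name", "namespace") if k in vs["metadata"]}
    return {
        "apiVersion": "gateway.networking.k8s.io/v1beta1",
        "kind": "HTTPRoute",
        "metadata": metadata,
        "spec": {
            "parentRefs": [{"name": gateway}],  # the Gateway this route attaches to
            "hostnames": list(vs["spec"].get("hosts", [])),
            "rules": rules,
        },
    }


if __name__ == "__main__":
    vs = {
        "metadata": {"name": "foo", "namespace": "default"},
        "spec": {
            "hosts": ["example.com"],
            "http": [{
                "match": [{"uri": {"prefix": "/foo"}}],
                "route": [{"destination": {"host": "foo", "port": {"number": 8000}}}],
            }],
        },
    }
    print(virtual_service_to_http_route(vs, gateway="my-gateway")["spec"]["rules"])
    # [{'backendRefs': [{'name': 'foo', 'port': 8000}], 'matches': [{'path': {'type': 'PathPrefix', 'value': '/foo'}}]}]
```

One caveat a real converter has to handle: Gateway API’s PathPrefix matches whole path elements (/foo matches /foo/bar but not /foobar), whereas an Istio uri prefix is a plain string prefix, so the two are not exactly equivalent.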
Other parts of Istio functionality, including policy and telemetry, will continue to be configured using Istio-specific APIs while we work with SIG Network on standardization of these use cases.
Welcoming the SMI community to the Gateway API project
Throughout the Gateway API’s design and implementation, members of the Istio team have been working with members of SIG Network to make sure the API is suitable for mesh use cases.
We are delighted to be formally joined in this effort by members of the Service Mesh Interface (SMI) community, including leaders from Linkerd, Consul and Open Service Mesh, who have collectively decided to standardize their API efforts on the Gateway API. To that end, we have set up a Gateway API Mesh Management and Administration (GAMMA) workstream within the Gateway API project. John Howard, a member of the Istio Technical Oversight Committee and a lead of our Networking WG, will be a lead of this group.
Our combined next steps are to provide enhancement proposals to the Gateway API project to support mesh use cases. We have started looking at API semantics for mesh traffic management, and will work with vendors and communities implementing Gateway API in their projects to build on a standard implementation. After that, we intend to build a representation for authorization and authentication policy.
With SIG Network as a vendor neutral forum for ensuring the service mesh community implements the Gateway API using the same semantics, we look forward to having a standard API which works with all projects, regardless of their technology stack or proxy.



CryptoMB - TLS handshake acceleration for Istio
Wed, 15 Jun 2022 00:00:00 +0000
Cryptographic operations are among the most compute-intensive and critical operations when it comes to secured connections. Istio uses Envoy as its gateway and sidecar proxy to handle secure connections and intercept traffic.
Depending on the use case, the load on Envoy increases when an ingress gateway must handle a large number of incoming TLS connections, or when sidecar proxies must handle many secured service-to-service connections. The achievable performance depends on many factors, such as the size of the cpuset on which Envoy is running, incoming traffic patterns, and key size. These factors affect how well Envoy can serve many new incoming TLS requests. To improve performance and accelerate handshakes, a new feature was introduced in Envoy 1.20 and Istio 1.14. It relies on 3rd Gen Intel® Xeon® Scalable processors, the Intel® Integrated Performance Primitives (Intel® IPP) crypto library, CryptoMB Private Key Provider Method support in Envoy, and Private Key Provider configuration in Istio using ProxyConfig.
CryptoMB
The Intel IPP crypto library supports multi-buffer crypto operations. Briefly, multi-buffer cryptography is implemented with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions using a SIMD (single instruction, multiple data) mechanism. Up to eight RSA or ECDSA operations are gathered into a buffer and processed at the same time, potentially improving performance. Intel AVX-512 instructions are available on the recently launched 3rd Generation Intel Xeon Scalable processors (Ice Lake server).
The idea of Envoy’s CryptoMB private key provider is that incoming TLS handshakes’ RSA operations are accelerated using Intel AVX-512 multi-buffer instructions.
Accelerate Envoy with Intel AVX-512 instructions
Envoy uses BoringSSL as the default TLS library. BoringSSL supports setting private key methods for offloading asynchronous private key operations, and Envoy implements a private key provider framework that allows the creation of Envoy extensions which handle the private key operations (signing and decryption) of TLS handshakes using the BoringSSL hooks.
CryptoMB private key provider is an Envoy extension which handles BoringSSL TLS RSA operations using Intel AVX-512 multi-buffer acceleration. When a new handshake happens, BoringSSL invokes the private key provider to request the cryptographic operation, and then the control returns to Envoy. The RSA requests are gathered in a buffer. When the buffer is full or the timer expires, the private key provider invokes Intel AVX-512 processing of the buffer. When processing is done, Envoy is notified that the cryptographic operation is done and that it may continue with the handshakes.

Figure: Envoy <-> BoringSSL <-> PrivateKeyProvider

Each Envoy worker thread has a buffer that holds up to eight RSA requests. When the first RSA request is stored in the buffer, a timer is started (the timer duration is set by the poll_delay field in the CryptoMB configuration).

Figure: Buffer timer started

When the buffer is full or when the timer expires, the crypto operations are performed for all RSA requests simultaneously. The SIMD (single instruction, multiple data) processing gives the potential performance benefit compared to the non-accelerated case.

Figure: Buffer timer expired
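The buffering policy described above (collect up to eight requests, then flush when the buffer is full or when poll_delay expires) can be modeled in a few lines. This Python sketch is a toy model, not Envoy’s C++ implementation: the class and method names are hypothetical, process_batch stands in for a single AVX-512 multi-buffer call, and tick stands in for the timer callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MultiBufferQueue:
    """Toy model of CryptoMB-style request batching (not Envoy's code)."""

    poll_delay: float                           # seconds; cf. poll_delay / pollDelay
    process_batch: Callable[[List[str]], None]  # stands in for one AVX-512 call
    capacity: int = 8                           # up to eight RSA requests per buffer
    _pending: List[str] = field(default_factory=list, init=False)
    _deadline: Optional[float] = field(default=None, init=False)

    def submit(self, request: str, now: float) -> None:
        if not self._pending:
            # The first request into an empty buffer starts the timer.
            self._deadline = now + self.poll_delay
        self._pending.append(request)
        if len(self._pending) == self.capacity:
            self._flush()  # buffer full: process all requests together

    def tick(self, now: float) -> None:
        # Stand-in for the timer: flush a partially filled buffer on expiry.
        if self._pending and self._deadline is not None and now >= self._deadline:
            self._flush()

    def _flush(self) -> None:
        batch, self._pending, self._deadline = self._pending, [], None
        self.process_batch(batch)


if __name__ == "__main__":
    batches: List[List[str]] = []
    q = MultiBufferQueue(poll_delay=0.010, process_batch=batches.append)
    for i in range(8):
        q.submit(f"rsa-{i}", now=0.0)  # the eighth request fills the buffer
    q.submit("rsa-8", now=1.0)         # starts a new timer
    q.tick(now=1.005)                  # too early: nothing happens
    q.tick(now=1.02)                   # timer expired: partial batch flushed
    print([len(b) for b in batches])   # [8, 1]
```

The model also shows the trade-off behind poll_delay: a longer delay gives the buffer more time to fill under load, but a lone handshake can wait up to poll_delay before it is processed.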

Envoy CryptoMB Private Key Provider configuration
A regular TLS configuration only uses a private key. When a private key provider is used, the private_key field is replaced with a private_key_provider field. It contains two fields, provider_name and typed_config. The typed_config is a CryptoMbPrivateKeyMethodConfig, which specifies the private key and the poll delay.
TLS configuration with just a private key.
tls_certificates:
- certificate_chain: { "filename": "/path/cert.pem" }
  private_key: { "filename": "/path/key.pem" }
TLS configuration with CryptoMB private key provider.
tls_certificates:
- certificate_chain: { "filename": "/path/cert.pem" }
  private_key_provider:
    provider_name: cryptomb
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.private_key_providers.cryptomb.v3alpha.CryptoMbPrivateKeyMethodConfig
      private_key: { "filename": "/path/key.pem" }
      poll_delay: 10ms
Istio CryptoMB Private Key Provider configuration
In Istio, the CryptoMB private key provider can be configured mesh-wide, for gateways only, or for specific pods using pod annotations. The user provides a privateKeyProvider in the ProxyConfig, along with a pollDelay value. When set in the mesh-wide default ProxyConfig, the configuration applies to all gateways and sidecars.

Figure: Sample mesh-wide configuration

Istio Mesh wide Configuration
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: example-istiocontrolplane
spec:
  profile: demo
  components:
    egressGateways:
    - name: istio-egressgateway
      enabled: true
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
  meshConfig:
    defaultConfig:
      privateKeyProvider:
        cryptomb:
          pollDelay: 10ms
Istio Gateways Configuration
To apply the private key provider configuration to the ingress gateway only, use the following sample configuration.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: example-istiocontrolplane
spec:
  profile: demo
  components:
    egressGateways:
    - name: istio-egressgateway
      enabled: true
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        podAnnotations:
          proxy.istio.io/config: |
            privateKeyProvider:
              cryptomb:
                pollDelay: 10ms
Istio Sidecar Configuration using pod annotations
To apply the private key provider configuration to specific application pods, configure them using pod annotations, as in the following sample.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: httpbin
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  labels:
    app: httpbin
    service: httpbin
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 80
  selector:
    app: httpbin
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
      version: v1
  template:
    metadata:
      labels:
        app: httpbin
        version: v1
      annotations:
        proxy.istio.io/config: |
          privateKeyProvider:
            cryptomb:
              pollDelay: 10ms
    spec:
      serviceAccountName: httpbin
      containers:
      - image: docker.io/kennethreitz/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        ports:
        - containerPort: 80
Performance
The potential performance benefit depends on many factors, for example the size of the cpuset Envoy is running on, the incoming traffic pattern, the encryption type (RSA or ECDSA), and the key size.
Below, we show performance based on the total latency between k6, the gateway and the Fortio server. These results show the relative performance improvement from using the CryptoMB provider, and are in no way representative of Istio’s general performance or benchmark results. Our measurements use different client tools (k6 and fortio), a different setup (client, gateway and server running on separate nodes), and create a new TLS handshake with every HTTP request.
We have published a white paper with general cryptographic performance numbers.

Figure: Istio ingress gateway TLS handshake performance comparison. Tested using 1.14-dev on May 10th 2022

Configuration used in the above comparison:
Azure AKS Kubernetes cluster: v1.21; three-node cluster; each node is a Standard_D4ds_v5 (3rd Generation Intel® Xeon® Platinum 8370C (Ice Lake), 4 vCPU, 16 GB memory)
Istio: 1.14-dev; Istio ingress gateway pod with resources.request.cpu: 2, resources.request.memory: 4 GB, resources.limits.cpu: 2, resources.limits.memory: 4 GB
K6: loadimpact/k6:latest
Fortio: fortio/fortio:1.27.0
The K6 client, Envoy and Fortio pods are forced to run on separate nodes via Kubernetes anti-affinity and node selectors.
In the above picture:
Istio: Istio installed with the above configuration
Istio with CryptoMB (AVX-512): the above configuration plus the settings below

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - enabled: true
      name: istio-ingressgateway
      k8s:
        # this controls the SDS service which configures ingress gateway
        podAnnotations:
          proxy.istio.io/config: |
            privateKeyProvider:
              cryptomb:
                pollDelay: 1ms
  values:
    # Annotate pods with
    #     inject.istio.io/templates: sidecar, cryptomb
    sidecarInjectorWebhook:
      templates:
        cryptomb: |
          spec:
            containers:
            - name: istio-proxy



Istio has applied to become a CNCF project
Mon, 25 Apr 2022 00:00:00 +0000
The Istio project is pleased to announce its intention to join the Cloud Native Computing Foundation (CNCF). With the support of the Istio Steering Committee, Google has submitted an application proposal for Istio to join the CNCF, the home of its companion projects Kubernetes and Envoy.
It has been almost five years since Google, IBM and Lyft launched Istio 0.1 in May 2017. That first version set the standard for what a service mesh should be: traffic management, policy enforcement, and observability, powered by sidecars next to workloads. We’re proud to be the most popular service mesh according to a recent CNCF survey, and look forward to working closer with the CNCF communities around networking and service mesh.
As we deepen our integration with Kubernetes through the Gateway API and gRPC with proxyless mesh — not to mention Envoy, which has grown up beside Istio — we think it’s time to unite the premier Cloud Native stack under a single umbrella.
What’s next?
Today is just the start of a journey. The CNCF Technical Oversight Committee will carefully consider our application, and perform due diligence. After that, they’ll open up for a vote, and if successful, the project will be transferred.
The work we did in establishing guidelines for the Istio trademark through the Open Usage Commons (OUC) will ensure the whole ecosystem can continue to use the Istio trademarks in a free and fair fashion. The trademarks will move to the Linux Foundation but continue to be managed under OUC’s trademark guidelines.
Google currently funds and manages Istio’s build/test infrastructure. The company has committed to continue sponsoring this infrastructure as it moves to management by the CNCF, and it will be supported with credits from Google and other contributors after the transition is complete.
Nothing about our current open governance model has to change as a result of this transfer. We will continue to reward corporate contribution, community influence and long-term maintainership through our Steering Committee and Technical Oversight Committee model. Istio is key to the future of Google Cloud and Google intends to continue investing heavily in the project.
We want to thank the ecosystem of Istio users, integrated projects, and professional services vendors. Please send us a PR if you want to be listed on our site!
Istio is the building block for products by over 20 different vendors. No other service mesh has a comparable footprint. We want to thank all the clouds, technology enterprises, startups and everyone else who has built a product based on Istio, or who makes Istio available with their hosted Kubernetes service. We look forward to our continued collaboration.
Finally, we want to thank Google for their stewardship of the Istio community to date, their immeasurable contributions to Istio, and for their continued support during this transition.


See also
For more perspectives on today’s news, please read blog posts from Google, IBM, Tetrate, VMware, Solo.io, Aspen Mesh and Red Hat.



Configuring istioctl for a remote cluster
Fri, 25 Mar 2022 00:00:00 +0000
When using the istioctl CLI on a remote cluster of an external control plane or a multicluster Istio deployment, some of the commands will not work by default. For example, istioctl proxy-status requires access to the istiod service to retrieve the status and configuration of the proxies it’s managing. If you try running it on a remote cluster, you’ll get an error message like this:
$ istioctl proxy-status
Error: unable to find any Istiod instances
Notice that the error message doesn’t just say that it’s unable to access the istiod service; it specifically mentions its inability to find istiod instances. This is because the istioctl proxy-status implementation needs to retrieve the sync status of not just any single istiod instance, but rather all of them. When there is more than one istiod instance (replica) running, each instance is only connected to a subset of the service proxies running in the mesh. The istioctl command needs to return the status for the entire mesh, not just the subset managed by one of the instances.
In an ordinary Istio installation where the istiod service is running locally on the cluster (i.e., a primary cluster), the command is implemented by simply finding all of the running istiod pods, calling each one in turn, and then aggregating the results before returning them to the user.

Figure: CLI with local access to istiod pods

When using a remote cluster, on the other hand, this is not possible, since the istiod instances are running outside of the mesh cluster and are not accessible to the mesh user. The instances may not even be deployed using pods on a Kubernetes cluster.
Fortunately, istioctl provides a configuration option to address this issue.
You can configure istioctl with the address of an external proxy service that will have access to the istiod instances. Unlike an ordinary load-balancer service, which would delegate incoming requests to one of the instances, this proxy service must instead delegate to all of the istiod instances, aggregate the responses, and then return the combined result.
If the external proxy service is, in fact, running on another Kubernetes cluster, the proxy implementation code can be very similar to the implementation code that istioctl runs in the primary cluster case, i.e., find all of the running istiod pods, call each one in turn, and then aggregate the result.

Figure: CLI without local access to istiod pods
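In both cases described above, whether istioctl talks to the istiod pods directly or through an external proxy service, the core logic is a fan-out followed by a merge. The following Python sketch shows that pattern; it is not istioctl’s Go implementation or the sample proxy’s code, aggregate_proxy_status and fetch_status are hypothetical names, and it assumes each proxy is connected to exactly one istiod instance so that partial results never overlap.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def aggregate_proxy_status(
    instances: Iterable[str],
    fetch_status: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Fan out to every istiod instance and merge the per-proxy results.

    `fetch_status(instance)` is a hypothetical stand-in for the per-instance
    status call; it returns {proxy_name: sync_state} for the proxies
    connected to that instance.
    """
    instances = list(instances)
    if not instances:
        # Mirrors the error istioctl reports when it finds no istiod.
        raise LookupError("unable to find any Istiod instances")
    merged: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(instances)) as pool:
        # Unlike a load balancer, which would pick one instance, query all of them.
        for partial in pool.map(fetch_status, instances):
            merged.update(partial)
    return merged


if __name__ == "__main__":
    fake = {
        "istiod-a": {"helloworld-v1-776f57d5f6-tmpkd.sample": "SYNCED"},
        "istiod-b": {"sleep-557747455f-v627d.sample": "SYNCED"},
    }
    print(sorted(aggregate_proxy_status(fake, fake.__getitem__)))
    # ['helloworld-v1-776f57d5f6-tmpkd.sample', 'sleep-557747455f-v627d.sample']
```

Querying the instances concurrently keeps the total latency close to that of the slowest single instance rather than the sum of all of them.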

An Istio Ecosystem project that includes an implementation of such an istioctl proxy server can be found here. To try it out, you’ll need two clusters, one of which is configured as a remote cluster using a control plane installed in the other cluster.
Install Istio with a remote cluster topology
To demonstrate istioctl working on a remote cluster, we’ll start by using the external control plane install instructions to set up a single remote cluster mesh with an external control plane running in a separate external cluster.
After completing the installation, we should have two environment variables, CTX_REMOTE_CLUSTER and CTX_EXTERNAL_CLUSTER, containing the context names of the remote (mesh) and external (control plane) clusters, respectively.
We should also have the helloworld and sleep samples running in the mesh, i.e., on the remote cluster:
$ kubectl get pod -n sample --context="${CTX_REMOTE_CLUSTER}"
NAME                             READY   STATUS    RESTARTS   AGE
helloworld-v1-776f57d5f6-tmpkd   2/2     Running   0          10s
sleep-557747455f-v627d           2/2     Running   0          9s
Notice that if you try to run istioctl proxy-status in the remote cluster, you will see the error message described earlier:
$ istioctl proxy-status --context="${CTX_REMOTE_CLUSTER}"
Error: unable to find any Istiod instances
Configure istioctl to use the sample proxy service
To configure istioctl, we first need to deploy the proxy service next to the running istiod pods. In our installation, we’ve deployed the control plane in the external-istiod namespace, so we start the proxy service on the external cluster using the following command:
$ kubectl apply -n external-istiod --context="${CTX_EXTERNAL_CLUSTER}" \
    -f https://raw.githubusercontent.com/istio-ecosystem/istioctl-proxy-sample/main/istioctl-proxy.yaml
service/istioctl-proxy created
serviceaccount/istioctl-proxy created
secret/jwt-cert-key-secret created
deployment.apps/istioctl-proxy created
role.rbac.authorization.k8s.io/istioctl-proxy-role created
rolebinding.rbac.authorization.k8s.io/istioctl-proxy-role created
You can run the following command to confirm that the istioctl-proxy service is running next to istiod:
$ kubectl get po -n external-istiod --context="${CTX_EXTERNAL_CLUSTER}"
NAME                              READY   STATUS    RESTARTS   AGE
istioctl-proxy-664bcc596f-9q8px   1/1     Running   0          15s
istiod-666fb6694d-jklkt           1/1     Running   0          5m31s
The proxy service is a gRPC server that is serving on port 9090:
$ kubectl get svc istioctl-proxy -n external-istiod --context="${CTX_EXTERNAL_CLUSTER}"
NAME             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
istioctl-proxy   ClusterIP   172.21.127.192   <none>        9090/TCP   11m
Before we can use it, however, we need to expose it outside of the external cluster.
There are many ways to do that, depending on the deployment environment. In our setup, we have an ingress gateway running on the external cluster, so we could update it to also expose port 9090, update the associated virtual service to direct port 9090 requests to the proxy service, and then configure istioctl to use the gateway address for the proxy service. This would be a “proper” approach.
However, since this is just a simple demonstration where we have access to both clusters, we will simply port-forward the proxy service to localhost:
$ kubectl port-forward -n external-istiod service/istioctl-proxy 9090:9090 --context="${CTX_EXTERNAL_CLUSTER}"
We now configure istioctl to use localhost:9090 to access the proxy by setting the ISTIOCTL_XDS_ADDRESS environment variable:
$ export ISTIOCTL_XDS_ADDRESS=localhost:9090
$ export ISTIOCTL_ISTIONAMESPACE=external-istiod
$ export ISTIOCTL_PREFER_EXPERIMENTAL=true
Because our control plane is running in the external-istiod namespace, instead of the default istio-system, we also need to set the ISTIOCTL_ISTIONAMESPACE environment variable.
Setting ISTIOCTL_PREFER_EXPERIMENTAL is optional. It instructs istioctl to redirect istioctl command calls to an experimental equivalent, istioctl x command, for any command that has both a stable and an experimental implementation. In our case, it means that istioctl proxy-status runs istioctl x proxy-status, the version that implements the proxy delegation feature.
Run the istioctl proxy-status command
Now that we’re finished configuring istioctl, we can try it out by running the proxy-status command again:
$ istioctl proxy-status --context="${CTX_REMOTE_CLUSTER}"
NAME                                                      CDS        LDS        EDS        RDS        ISTIOD         VERSION
helloworld-v1-776f57d5f6-tmpkd.sample                     SYNCED     SYNCED     SYNCED     SYNCED          1.12.1
istio-ingressgateway-75bfd5668f-lggn4.external-istiod     SYNCED     SYNCED     SYNCED     SYNCED          1.12.1
sleep-557747455f-v627d.sample                             SYNCED     SYNCED     SYNCED     SYNCED          1.12.1
As you can see, this time it correctly displays the sync status of all the services running in the mesh. Notice that the ISTIOD column shows a generic placeholder value instead of the instance name (e.g., istiod-666fb6694d-jklkt) that would be displayed if the pod were running locally. In this case, this detail is not available to, or needed by, the mesh user. It’s only available on the external cluster for the mesh operator to see.
Summary
In this article, we used a sample proxy server to configure istioctl to work with an external control plane installation.
We’ve seen how some of the istioctl CLI commands don’t work out of the box on a remote cluster managed by an external control plane. Commands such as istioctl proxy-status, among others, need access to the istiod service instances managing the mesh, which are unavailable when the control plane is running outside of the mesh cluster.
To address this issue, istioctl was configured to delegate to a proxy server, running alongside the external control plane, which accesses the istiod instances on its behalf.



Register now for IstioCon 2022!
Mon, 21 Mar 2022 00:00:00 +0000
IstioCon is the annual user-centered event for Istio, the industry’s most popular service mesh. This event will take place April 25-29; it will be 100% virtual, and registration is now open free of charge. If you are among the first 400 people to register for the conference, you are eligible to receive a conference t-shirt!


In 2021, more than 4,000 people from across 84 countries joined the event online, to hear from 27 end-user companies how they are using Istio in production. Participants were able to learn how Airbnb navigated scalability issues to finally find a solution in Istio, how HP set up a secure and wise platform with Istio, and how eBay used Istio to create federated access points, among many more examples of using Istio in production.
IstioCon 2022 will be an industry-focused event, a platform to connect contributors and users to discuss uses of Istio in different architectural setups, its limitations, and where to take the project next. The main focus will be on end-user companies, as we look forward to sharing a diversity of case studies showing how to use Istio in production. The content will be categorized according to expertise level.
This community-led event also has an interactive social hour to take the load off and mesh with the Istio community, vendors, and maintainers. Participation in the event is free of charge; register today for a chance to get the conference t-shirt!



Merbridge - Accelerate your mesh with eBPF
Mon, 07 Mar 2022 00:00:00 +0000
The secret of Istio’s abilities in traffic management, security, observability and policy is all in the Envoy proxy. Istio uses Envoy as the “sidecar” to intercept service traffic, with the kernel’s netfilter packet filter functionality configured by iptables.
There are shortcomings in using iptables to perform this interception. Because netfilter is a highly versatile packet-filtering tool, several routing rules and filtering stages are applied before a packet reaches its destination socket. For example, between the network layer and the transport layer, netfilter processes the packet several times against predefined hooks such as PRE_ROUTING and POST_ROUTING. Once the packet becomes a TCP or UDP packet and is forwarded to user space, additional steps such as packet validation, protocol policy processing, and destination socket lookup are performed. When a sidecar is configured to intercept traffic, the data path can become very long, because these same steps are repeated several times.
Over the past two years, eBPF has become a trending technology, and many projects based on eBPF have been released to the community. Tools like Cilium and Pixie show great use cases for eBPF in observability and network packet processing. With eBPF’s sockops and redir capabilities, data packets can be processed efficiently by passing them directly from an inbound socket to an outbound socket. In an Istio mesh, it is possible to use eBPF to replace iptables rules, and accelerate the data plane by shortening the data path.
We have created an open source project called Merbridge, and by applying the following command to your Istio-managed cluster, you can use eBPF to achieve such network acceleration.
$ kubectl apply -f https://raw.githubusercontent.com/merbridge/merbridge/main/deploy/all-in-one.yaml

Attention: Merbridge uses eBPF functions which require a Linux kernel version ≥ 5.7.


With Merbridge, the packet datapath can be shortened directly from one socket to another destination socket, and here’s how it works.
Using eBPF sockops for performance optimization
Network connections are essentially socket communication. eBPF provides a helper function, bpf_msg_redirect_hash, that directly forwards the packets sent by the application on the inbound socket to the outbound socket. Inside the eBPF program that calls this helper, developers can run arbitrary logic to decide the packet's destination. This allows the packet datapath to be noticeably shortened in the kernel.
The sock_map is the crucial piece for recording the information used in packet forwarding. When a packet arrives, an existing socket is selected from the sock_map to forward the packet to. As a result, we need to save the information of every relevant socket for forwarding to work properly. Whenever there is a new socket operation, such as a new socket being created, the sock_ops function is executed: the socket metadata is obtained and stored in the sock_map to be used when processing packets. The common key type in the sock_map is a “quadruple” of source and destination addresses and ports. With the key and the rules stored in the map, the destination socket can be found when a new packet arrives.
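To make the lookup concrete, here is a small user-space Go model of the idea. It is not eBPF code, and the names Quadruple, SockMap, OnSockOps and Redirect are hypothetical. It assumes a common sockmap pattern: each socket is stored under its own quadruple, and a message sent on one socket is delivered to the peer registered under the reversed quadruple, which is the role bpf_msg_redirect_hash plays with a real sock_map.

```go
package main

import "fmt"

// Quadruple is the map key: source and destination address and port.
// (Hypothetical Go type modelling the key an eBPF program would use.)
type Quadruple struct {
	SrcIP   string
	SrcPort uint16
	DstIP   string
	DstPort uint16
}

// Reverse returns the same connection as seen from the peer socket.
func (q Quadruple) Reverse() Quadruple {
	return Quadruple{SrcIP: q.DstIP, SrcPort: q.DstPort, DstIP: q.SrcIP, DstPort: q.SrcPort}
}

// SockMap stands in for the kernel sock_map: quadruple -> socket.
type SockMap map[Quadruple]string

// OnSockOps models the sock_ops hook recording a newly established socket.
func (m SockMap) OnSockOps(q Quadruple, sock string) { m[q] = sock }

// Redirect models the redir step: data written on the socket keyed by q is
// handed straight to the peer socket registered under the reversed key.
func (m SockMap) Redirect(q Quadruple) (string, bool) {
	peer, ok := m[q.Reverse()]
	return peer, ok
}

func main() {
	m := SockMap{}
	client := Quadruple{"10.0.0.1", 40000, "10.0.0.2", 8080}
	m.OnSockOps(client, "client-socket")
	m.OnSockOps(client.Reverse(), "server-socket")

	if peer, ok := m.Redirect(client); ok {
		fmt.Println("deliver to", peer) // deliver to server-socket
	}
}
```

Both ends of a connection must have been recorded by the sock_ops step before a redirect can succeed, which is why every socket's metadata has to be saved.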
The Merbridge approach
Let’s introduce the detailed design and implementation principles of Merbridge step by step, with a real scenario.
Istio sidecar traffic interception based on iptables
Figure: Istio Sidecar Traffic Interception Based on iptables

When external traffic hits your application’s ports, it will be intercepted by a PREROUTING rule in iptables, forwarded to port 15006 of the sidecar container, and handed over to Envoy for processing. This is shown as steps 1-4 in the red path in the above diagram.
Envoy processes the traffic using the policies issued by the Istio control plane. If allowed, the traffic will be sent to the actual container port of the application container.
When the application tries to access other services, its traffic is intercepted by an OUTPUT rule in iptables and forwarded to port 15001 of the sidecar container, where Envoy is listening. This is steps 9-12 on the red path, similar to inbound traffic processing.
Traffic to the application port needs to be forwarded to the sidecar and then sent from the sidecar port to the container port, which adds overhead. Moreover, because iptables is so versatile, its performance is not always ideal: each filtering rule it applies adds delay to the whole datapath. Although iptables is the common way to do packet filtering, in the Envoy proxy case the longer datapath amplifies the bottleneck of the packet filtering process in the kernel.
If we use sockops to directly connect the sidecar’s socket to the application’s socket, the traffic will not need to go through iptables rules, and thus performance can be improved.
Processing outbound traffic
As mentioned above, we would like to use eBPF’s sockops to bypass iptables and accelerate network requests. At the same time, we do not want to modify any part of Istio, so that Merbridge stays fully compatible with the community version. As a result, we need to simulate what iptables does in eBPF.
Traffic redirection in iptables relies on its DNAT function. To simulate this capability in eBPF, there are two main things we need to do:

Modify the destination address when the connection is initiated, so that traffic is sent to the new interface.
Enable Envoy to identify the original destination address, so that it can identify the traffic.

For the first part, we can use an eBPF connect program that modifies user_ip and user_port.
For the second part, we need to understand the concept of ORIGINAL_DST (the SO_ORIGINAL_DST socket option), which belongs to the netfilter module in the kernel.
When an application (including Envoy) receives a connection, it calls getsockopt to obtain ORIGINAL_DST. If the connection went through iptables DNAT, iptables will have set this parameter on the current socket to the “original IP + port” value. Thus, the application can recover the original destination address of the connection.
We therefore have to intercept this call with an eBPF get_sockopt program. (bpf_setsockopt is not used here because it does not currently support the SO_ORIGINAL_DST optname.)
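Whichever component fills it in (netfilter after DNAT, or an eBPF get_sockopt program), the application sees the same thing: on Linux, getsockopt(fd, SOL_IP, SO_ORIGINAL_DST) returns a struct sockaddr_in. The Go sketch below decodes those raw bytes, to show what Envoy ultimately reads. The function name is hypothetical, and the sketch assumes a little-endian host (x86-64, arm64) for the sin_family field.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net/netip"
)

const afInet = 2 // AF_INET on Linux

// parseOriginalDst decodes the struct sockaddr_in written by
// getsockopt(fd, SOL_IP, SO_ORIGINAL_DST) on Linux.
// Layout: sin_family (2 bytes, host order), sin_port (2 bytes, network
// order), sin_addr (4 bytes), followed by 8 bytes of padding.
func parseOriginalDst(raw []byte) (netip.AddrPort, error) {
	if len(raw) < 8 {
		return netip.AddrPort{}, errors.New("sockaddr_in too short")
	}
	// Assumes a little-endian host for the family field.
	if family := binary.LittleEndian.Uint16(raw[0:2]); family != afInet {
		return netip.AddrPort{}, fmt.Errorf("unexpected address family %d", family)
	}
	port := binary.BigEndian.Uint16(raw[2:4])
	addr := netip.AddrFrom4([4]byte{raw[4], raw[5], raw[6], raw[7]})
	return netip.AddrPortFrom(addr, port), nil
}

func main() {
	// Bytes as they would be returned for an original destination of 10.96.0.10:80.
	raw := []byte{2, 0, 0, 80, 10, 96, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0}
	dst, err := parseOriginalDst(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(dst) // 10.96.0.10:80
}
```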
Referring to the figure below, when an application initiates a request, it will go through the following steps:

When the application initiates a connection, the connect program will modify the destination address to 127.x.y.z:15001, and use cookie_original_dst to save the original destination address.
In the sockops program, the current socket information and the quadruple are saved in sock_pair_map. At the same time, the same quadruple and its corresponding original destination address are written to pair_original_dst. (The cookie is not used here because it cannot be obtained in the get_sockopt program.)
After Envoy receives the connection, it calls getsockopt to read the destination address of the current connection. The get_sockopt program looks up the original destination address in pair_original_dst, using the quadruple information, and returns it. Thus, the connection is fully established.
In the data transport step, the redir program will read the sock information from sock_pair_map according to the quadruple information, and then forward it directly through bpf_msg_redirect_hash to speed up the request.


Figure: Processing Outbound Traffic
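The outbound steps above can be traced with a user-space Go model of the three maps. This is a sketch, not Merbridge's code: the Go types and method names are hypothetical, and the loopback address passed to Connect is an arbitrary example. One modelling assumption is that Envoy's accepted socket sees the client's quadruple reversed, so GetOriginalDst flips it before the lookup.

```go
package main

import "fmt"

// Addr is an IP address and port.
type Addr struct {
	IP   string
	Port uint16
}

// Quad is a connection's four-tuple.
type Quad struct{ Src, Dst Addr }

// Maps holds hypothetical user-space stand-ins for the eBPF maps in the text.
type Maps struct {
	cookieOriginalDst map[uint64]Addr // cookie_original_dst
	pairOriginalDst   map[Quad]Addr   // pair_original_dst
	sockPairMap       map[Quad]string // sock_pair_map
}

func NewMaps() *Maps {
	return &Maps{
		cookieOriginalDst: map[uint64]Addr{},
		pairOriginalDst:   map[Quad]Addr{},
		sockPairMap:       map[Quad]string{},
	}
}

// Connect models the connect program: save the real destination under the
// socket cookie and rewrite the destination to Envoy's outbound port 15001.
func (m *Maps) Connect(cookie uint64, dst Addr, loopback string) Addr {
	m.cookieOriginalDst[cookie] = dst
	return Addr{IP: loopback, Port: 15001}
}

// SockOps models the sockops program once the quadruple is known: record the
// socket, and re-key the original destination by quadruple, because the
// get_sockopt program cannot see the cookie.
func (m *Maps) SockOps(cookie uint64, q Quad, sock string) {
	m.sockPairMap[q] = sock
	if orig, ok := m.cookieOriginalDst[cookie]; ok {
		m.pairOriginalDst[q] = orig
	}
}

// GetOriginalDst models the get_sockopt program answering SO_ORIGINAL_DST on
// Envoy's accepted socket, whose four-tuple is the client's one reversed.
func (m *Maps) GetOriginalDst(accepted Quad) (Addr, bool) {
	orig, ok := m.pairOriginalDst[Quad{Src: accepted.Dst, Dst: accepted.Src}]
	return orig, ok
}

func main() {
	m := NewMaps()
	realDst := Addr{"10.96.0.10", 80}
	newDst := m.Connect(42, realDst, "127.0.0.2") // example 127.x.y.z address
	client := Quad{Src: Addr{"127.0.0.1", 50000}, Dst: newDst}
	m.SockOps(42, client, "app-socket")

	// Envoy accepts the connection and asks where it was originally headed.
	accepted := Quad{Src: newDst, Dst: client.Src}
	orig, ok := m.GetOriginalDst(accepted)
	fmt.Println(newDst, "->", orig, ok)
}
```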

Why do we set the destination address to 127.x.y.z instead of 127.0.0.1? If every pod redirected to 127.0.0.1, connections from different pods could produce conflicting quadruples; using a per-pod 127.x.y.z address avoids this neatly. (Pods’ IPs are all different, so they never end up in a conflicting condition.)
Inbound traffic processing
The processing of inbound traffic is basically the same as for outbound traffic; the only difference is that the destination port is rewritten to 15006.
Note that, unlike iptables rules, which are applied inside a specified network namespace, the eBPF programs take effect globally. This means that if the redirect were applied to a Pod not originally managed by Istio, or to an external IP address, serious problems would occur, such as the connection never being established.
As a result, we designed a tiny control plane, deployed as a DaemonSet, which watches all pods (similar to how the kubelet watches pods on its node) and writes the IP addresses of pods that have the sidecar injected into the local_pod_ips map.
When processing inbound traffic, if the destination address is not in the map, we will not do anything to the traffic.
Otherwise, the steps are the same as for outbound traffic.

Figure: Processing Inbound Traffic

Same-node acceleration
Theoretically, acceleration between Envoy sidecars on the same node can be achieved directly through inbound traffic processing. However, Envoy will raise an error when accessing the application of the current pod in this scenario.
In Istio, Envoy accesses the application using the current pod IP and port number. Because the pod IP is also in the local_pod_ips map, this traffic is redirected to the pod IP on port 15006 again, which is the same address the inbound traffic came from. Redirecting to the same inbound address causes an infinite loop.
This raises the question: is there a way to get the IP address of the current network namespace with eBPF? The answer is yes!
We have designed a feedback mechanism: when Envoy tries to establish the connection, we redirect it to port 15006. However, in the sockops step, we check whether the source IP and the destination IP are the same. If they are, the request was misrouted, and we discard this connection in the sockops program. At the same time, the current process ID and IP address are written into the process_ip map, so that eBPF can associate processes with IPs.
When the next request is sent, this process does not need to be repeated: we check directly in the process_ip map whether the destination address is the same as the current IP address.

Envoy will retry when the request fails, and this retry process will only occur once, meaning subsequent requests will be accelerated.



Figure: Same-node acceleration

Connection relationship
Before applying eBPF using Merbridge, the data path between pods looks like this:

Figure: iptables's data path

After applying Merbridge, the outbound traffic will skip many filter steps to improve the performance:

Figure: eBPF's data path

If two pods are on the same machine, the connection can even be faster:

Figure: eBPF's data path on the same machine

Performance results

The tests below are from our development environment and have not yet been validated in production use cases.


Let’s see the effect on overall latency using eBPF instead of iptables (lower is better):

Figure: Latency vs Client Connections Graph

We can also see overall QPS after using eBPF (higher is better). Test results are generated with wrk.

Figure: QPS vs Client Connections Graph

Summary
We have introduced the core ideas of Merbridge in this post. By replacing iptables with eBPF, the data transportation process can be accelerated in a mesh scenario. At the same time, Istio will not be changed at all. This means if you do not want to use eBPF any more, just delete the DaemonSet, and the datapath will be reverted to the traditional iptables-based routing without any problems.
Merbridge is a completely independent open source project. It is still at an early stage, and we are looking forward to more users and developers getting engaged. We would greatly appreciate it if you tried this new technology to accelerate your mesh and gave us some feedback!
See also

Merbridge on GitHub
Using eBPF instead of iptables to optimize the performance of service grid data plane by Liu Xu, Tencent
Sidecar injection and transparent traffic hijacking process in Istio explained in detail by Jimmy Song, Tetrate
Accelerate the Istio data plane with eBPF by Yizhou Xu, Intel
Envoy’s Original Destination filter




Join us for IstioCon 2022!
Mon, 14 Feb 2022 00:00:00 +0000
IstioCon 2022, set for April 25-29, will be the second annual conference for Istio, the industry’s most popular service mesh. This year’s conference will again be 100% virtual, connecting community members across the globe with Istio’s ecosystem. Visit the conference website for all the information related to the event.


IstioCon provides an opportunity to showcase the lessons learned from running Istio in production, hands-on experiences from the Istio community, and will feature maintainers from across the Istio ecosystem. This year’s IstioCon features sessions focused on sharing real world examples, case studies, and success stories that can inspire newcomers to use Istio in production. The content will range from introductory to advanced levels, split into four main topic tracks:

Getting started & getting involved
Tools, features & functionality: Observability, traceability, and other things built on top of Istio.
Infrastructure & networking: How Istio works, with deep-dives into performance, cost, and multi-cloud environments.
Tech evolution & what’s next: The evolution of Istio, new standards, new extensions, and how to address problems that are interesting to tackle.

At this time, we encourage Istio users, developers, partners, and advocates to submit a session proposal through the conference’s CFP portal. The conference offers a mix of keynotes, technical talks, lightning talks, workshops, and roadmap sessions. Choose from the following formats to submit a session proposal for IstioCon:

Presentation: 40-minute presentation, maximum of 2 speakers
Panel: 40 minutes of discussion among 3 to 5 speakers
Workshop: 160-minute (2h 40m), in-depth, hands-on presentation with 1–4 speakers
Lightning Talk: 10-minute presentation, limited to 1 speaker

This community-led event also has an interactive social hour to take the load off and mesh with the Istio community, vendors, and maintainers. Participation in the event is free of charge, and will only require participants to register in order to attend.
Stay tuned to hear more about this conference, and we hope you can join us at IstioCon 2022!



An easier way to add virtual machines to Istio service mesh
Mon, 20 Dec 2021 00:00:00 +0000

Some of the complexity involved in joining a virtual machine to the mesh is due to the sheer number of features that a service mesh provides. But what if you only need a subset of those features, for example, secure communication from your virtual machine to services running inside your service mesh? With only a few tradeoffs, you can give your virtual machine service mesh features without all of the overhead.
What about local development? As more and more microservices are deployed to Kubernetes and your dependency graph starts to resemble a spider web, local development becomes increasingly difficult. What if a local machine could simply join a service mesh and make calls to mesh applications? This could save time and money, because developers would not have to wait for their code to be deployed.
Reduce Complexity

Today, adding a virtual machine to your Istio service mesh involves a lot of moving parts. You must create a Kubernetes service account and an Istio WorkloadEntry, and then generate configuration, all before on-boarding a single virtual machine. Automating this is also complex, especially for autoscaling VMs. Finally, you are required to expose istiod outside your cluster.
The complexity of adding a virtual machine comes from the expectation that the VM should participate 100% within the service mesh. For many, this is not a necessity: by looking at the actual requirements of your system, you may be able to simplify your virtual machine on-boarding and still get the features you need.
So what are some use cases where virtual machines can still work with the service mesh, but more easily?
Single Direction Traffic Flow
Sometimes a virtual machine just needs to talk securely to applications within the service mesh. This is often the case when migrating VM-based applications that other VMs depend on to Kubernetes. With the approach described below, you can achieve this without the operational overhead described above.

Developer Access to Service Mesh
Engineers often do not have the resources to run all of the required microservices for their environment. The approach below lets them communicate securely with mesh applications in the same way that virtual machines do.

Decouple Envoy and Istio
Most of the complexity related to virtual machines comes from connecting Envoy to istiod to get its configuration. A simpler approach is to not connect them at all. Even though Istio will no longer know about the virtual machines communicating in the mesh, that communication can still be secure and authenticated. The trick is to issue virtual machines their own workload certificates, rooted in the same trust chain as the mesh workloads. This also means that the end user is responsible for configuring Envoy manually on the virtual machine. For most, this shouldn’t be an issue, because this configuration is not expected to change very often.
A Simpler On-boarding Experience

We can achieve a simpler setup by using some built-in Istio features. First, we need to expose a secure tunnel through which applications outside the mesh can communicate with applications inside it.
To do this, we simply create an Istio east-west gateway and enable AUTO_PASSTHROUGH. This configures the east-west gateway to pass traffic through to the correct service over mTLS, giving your virtual machine end-to-end authenticated encryption with the application it is trying to reach.

Because configuring Envoy to talk to istiod is complex, it is more practical to configure the virtual machine's Envoy directly. This sounds daunting at first, but with the reduced scope we only need to enable a few features to make it work. First, Envoy needs to know about each service mesh application that the virtual machine will communicate with; we configure these as clusters in Envoy and set them up to use mTLS, passing through the east-west gateway in the service mesh. Second, a listener needs to be exposed to handle incoming traffic from the virtual machine application. Finally, certificates need to be issued for each virtual machine, sharing the same root of trust as the service mesh applications. This enables end-to-end encryption as well as the ability to authorize which applications the virtual machine can communicate with.
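With AUTO_PASSTHROUGH, the gateway picks the upstream from the TLS SNI rather than from a VirtualService, so each cluster on the VM's Envoy must present the right SNI. As we understand Istio's convention, the value has the form outbound_.&lt;port&gt;_.&lt;subset&gt;_.&lt;host&gt;, with an empty subset giving outbound_.9080_._.&lt;host&gt;. Treat this as an assumption to check against the Istio version you run. The helper name below is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// autoPassthroughSNI builds the SNI an AUTO_PASSTHROUGH gateway is assumed to
// decode into an upstream cluster: outbound_.<port>_.<subset>_.<host>.
// An empty subset yields "outbound_.<port>_._.<host>".
func autoPassthroughSNI(host string, port int, subset string) string {
	return "outbound_." + strconv.Itoa(port) + "_." + subset + "_." + host
}

func main() {
	// Values a VM-side Envoy cluster could set as the SNI in its upstream
	// TLS settings when targeting the reviews service via the gateway.
	fmt.Println(autoPassthroughSNI("reviews.default.svc.cluster.local", 9080, ""))
	fmt.Println(autoPassthroughSNI("reviews.default.svc.cluster.local", 9080, "v1"))
}
```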
Easier to Automate
Because no initialization has to occur on the service mesh cluster when on-boarding a virtual machine, the process is much easier to automate. The configuration for the virtual machine's Envoy can be added to your pipeline; the Envoy container can either be pulled with Docker or added to your image-building infrastructure; and the mTLS certificates can be provisioned and maintained by a third party such as HashiCorp Vault.
More Runtime Support
Because this installation method does not require access to the underlying OS networking, you can run it in more types of environments, including Windows and Docker. The only requirement is that your Envoy includes the Istio extensions found here. Using Docker, you can now run the Envoy proxy on your local machine and communicate with the service mesh directly.

Advanced Use Cases
gRPC to JSON
This technique can also let virtual machine applications communicate with gRPC applications without implementing the gRPC endpoints themselves. Using Envoy's gRPC-JSON transcoding, the virtual machine application can communicate with its local Envoy over REST, and Envoy translates that to gRPC.

Multi Direction
Even though your service mesh may not know about the virtual machines communicating with it, you can still add them as external endpoints using ServiceEntries. That ServiceEntry could point to an HTTPS load balancer endpoint that manages traffic to multiple virtual machines. This setup is often still more feasible than fully on-boarding virtual machines into the service mesh.

Forwarding Proxy
Maybe installing Envoy on every virtual machine is still too complex. An alternative is to run Envoy on its own virtual machine (or autoscaling group) and have it act as a forwarding proxy into the mesh. This is a much simpler way to access mesh services, since the virtual machines that run the applications are left untouched.

Part 2…
In part 2, I will explain how to configure Istio as well as a virtual machine to communicate within the mesh. If you would like a preview, feel free to reach out to nick.nellis@solo.io
Special Thanks
A special thanks to Dave Ortiz for this virtual machine idea, and congratulations to Constant Contact, a newly registered Istio user!



Announcing the alpha availability of WebAssembly Plugins
Thu, 16 Dec 2021 00:00:00 +0000
Istio 1.9 introduced experimental support for WebAssembly (Wasm) module distribution and a Wasm extensions ecosystem repository with canonical examples and use cases for extension development. Over the past 9 months, the Istio, Envoy, and Proxy-Wasm communities have continued our joint effort to make Wasm extensibility stable, reliable, and easy to adopt, and we are pleased to announce Alpha support for Wasm extensibility in Istio 1.12! In the following sections, we’ll walk through the updates that have been made to the Wasm support for the 1.12 release.
New WasmPlugin API
With the new WasmPlugin CRD in the extensions.istio.io API group, we’re introducing a new high-level API for extending the functionality of the Istio proxy with custom Wasm modules. This effort builds on the excellent work that has gone into the Proxy-Wasm specification and implementation over the last two years. From now on, you no longer need to use EnvoyFilter resources to add custom Wasm modules to your proxies. Instead, you can now use a WasmPlugin resource:
apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
  name: your-filter
spec:
  selector:
    matchLabels:
      app: server
  phase: AUTHN
  priority: 10
  pluginConfig:
    someSetting: true
    someOtherSetting: false
    youNameIt:
    - first
    - second
  url: docker.io/your-org/your-filter:1.0.0
There are a lot of similarities and a few differences between WasmPlugin and EnvoyFilter, so let’s go through the fields one by one.
The above example deploys a Wasm module to all workloads (including gateway pods) that match the selector field - this very much works the same as in an EnvoyFilter.
The next field below that is the phase. This determines where in the proxy’s filter chain the Wasm module will be injected. We have defined four distinct phases for injection:

AUTHN: prior to any Istio authentication and authorization filters.
AUTHZ: after the Istio authentication filters and before any first-class authorization filters, i.e., before AuthorizationPolicy resources have been applied.
STATS: after all authorization filters and prior to the Istio stats filter.
UNSPECIFIED_PHASE: let the control plane decide where to insert. This will generally be at the end of the filter chain, right before the router. This is the default value for this phase field.

The pluginConfig field is used for configuring your Wasm plugin. Whatever you put into this field will be encoded in JSON and passed on to your filter, where you can access it in the configuration callback of the Proxy-Wasm SDKs. For example, you can retrieve the config with onConfigure in the C++ SDK, on_configure in the Rust SDK or the OnPluginStart callback in the Go SDK.
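Because the plugin receives pluginConfig as JSON bytes, decoding it is ordinary JSON handling. The sketch below mirrors the example WasmPlugin above using Go's standard library. The struct and function names are hypothetical, and the SDK call that delivers the bytes inside OnPluginStart is left out. Plugins compiled with TinyGo may need a lighter JSON parser than encoding/json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filterConfig mirrors the pluginConfig block of the example WasmPlugin.
// Field names come from that example; the struct itself is illustrative.
type filterConfig struct {
	SomeSetting      bool     `json:"someSetting"`
	SomeOtherSetting bool     `json:"someOtherSetting"`
	YouNameIt        []string `json:"youNameIt"`
}

// parseConfig decodes the JSON bytes a plugin receives at startup.
func parseConfig(raw []byte) (filterConfig, error) {
	var cfg filterConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	// JSON equivalent of the pluginConfig YAML in the WasmPlugin example.
	raw := []byte(`{"someSetting":true,"someOtherSetting":false,"youNameIt":["first","second"]}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```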
The url field specifies where to pull the Wasm module. You’ll notice that the url in this example is a docker URI. Apart from loading Wasm modules via HTTP, HTTPS and the local file system (using file://), we are introducing the OCI image format as the preferred mechanism for distributing Wasm modules.
One last thing to note is that the Wasm Plugin API currently applies only to inbound HTTP filter chains.
Support for network filters and outbound traffic will be added in the future.
Wasm image specification
We believe that containers are the ideal way to store, publish and manage proxy extensions, so we worked with Solo.io to extend their existing Proxy-Wasm container format with a variant that aims to be compatible with all registries and the CLI toolchain. Depending on your processes, you can now build your proxy extension containers using your existing container CLI tooling such as Docker CLI or buildah.
To learn how to build OCI images, please refer to these instructions.
Image fetcher in Istio agent
Since Istio 1.9, Istio-agent has provided a reliable solution for loading Wasm binaries, fetched from remote HTTP sources configured in the EnvoyFilters, by leveraging the xDS proxy inside istio-agent and Envoy’s Extension Configuration Discovery Service (ECDS). The same mechanism applies for the new Wasm API implementation in Istio 1.12. You can use HTTP remote resources reliably without concern that Envoy might get stuck with a bad configuration when a remote fetch fails.
In addition, Istio 1.12 expands this capability to Wasm OCI images. This means the Istio-agent is now able to fetch Wasm images from any OCI registry including Docker Hub, Google Container Registry (GCR), Amazon Elastic Container Registry (Amazon ECR), etc. After fetching images, Istio-agent extracts and caches Wasm binaries from them, and then inserts them into the Envoy filter chains.

Figure: Remote Wasm module fetch flow

Improvements in Envoy Wasm runtime
The Wasm runtime powered by V8 in Envoy has been shipped since Istio 1.5 and there have been a lot of improvements since then.
WASI support
First, some WASI (WebAssembly System Interface) system calls are now supported. For example, the clock_time_get system call can be made from Wasm programs, so you can use std::time::SystemTime::now() in Rust or time.Now().UnixNano() in Go in your Envoy Wasm extensions, just as on any native platform. Similarly, random_get is now supported by Envoy, so Go’s “crypto/rand” package is available as a cryptographically secure random number generator. We are also currently looking into file system support, as we have seen requests for reading and writing local files from Wasm programs running in Envoy.
Debuggability
Next is the improvement in debuggability. The Envoy runtime now emits the stack trace of your program when it causes runtime errors, for example, when null pointer exceptions occur in C++ or the panic function is called in Go or Rust. While Envoy error messages did not previously include anything about the cause, they now show the trace which you can use to debug your program:
Function: proxy_on_request_headers failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
  0:  0xdbd - runtime._panic
  1:  0x103ab - main.anotherCalculation
  2:  0x10399 - main.someCalculation
  3:  0xea57 - main.myHeaderHandler
  4:  0xea15 - proxy_on_request_headers
The above is an example stack trace from a Go SDK based Wasm extension. You might notice that the output does not include file names and line numbers in the trace. This is an important future work item and open issue related to the DWARF format for WebAssembly and the Exception Handling proposal for the WebAssembly specification.
Strace support for Wasm programs
You can see strace-equivalent logs emitted by Envoy. With the Istio proxy’s component log level set to wasm:trace, you can observe all the system calls and Proxy-Wasm ABI calls that go across the boundary between Wasm virtual machines and Envoy. The following is an example of such an strace log stream:
[host->vm] proxy_on_context_create(2, 1)
[host<-vm] proxy_on_context_create return: void
[host->vm] proxy_on_request_headers(2, 8, 1)
[vm->host] wasi_snapshot_preview1.random_get(86928, 32)
[vm<-host] wasi_snapshot_preview1.random_get return: 0
[vm->host] env.proxy_log(2, 87776, 18)
This is especially useful to debug a Wasm program’s execution at runtime, for example, to verify it is not making any malicious system calls.
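For that kind of check, a short Go sketch can scan a captured log for "[vm->host]" calls outside an allowlist. The allowlist and the function name are hypothetical operator choices, and the sample is the excerpt above. The marker is searched anywhere in the line, on the assumption that real Envoy log lines carry extra prefixes such as timestamps.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const marker = "[vm->host] "

// unexpectedCalls returns the name of every call the Wasm VM made into the
// host (lines containing "[vm->host] name(args)") that is not in allowed.
func unexpectedCalls(log string, allowed map[string]bool) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		i := strings.Index(line, marker)
		if i < 0 {
			continue // host->vm calls, return values, other log lines
		}
		name, _, _ := strings.Cut(line[i+len(marker):], "(")
		if !allowed[name] {
			out = append(out, name)
		}
	}
	return out
}

// sample is the strace-style excerpt shown above.
const sample = `[host->vm] proxy_on_context_create(2, 1)
[host<-vm] proxy_on_context_create return: void
[host->vm] proxy_on_request_headers(2, 8, 1)
[vm->host] wasi_snapshot_preview1.random_get(86928, 32)
[vm<-host] wasi_snapshot_preview1.random_get return: 0
[vm->host] env.proxy_log(2, 87776, 18)`

func main() {
	allowed := map[string]bool{"env.proxy_log": true}
	fmt.Println(unexpectedCalls(sample, allowed))
}
```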
Arbitrary Prometheus namespace for in-Wasm metrics
The last update is about metrics. Wasm extensions have been able to define their own custom metrics and expose them in Envoy, just like any other metric, but prior to Istio 1.12, all of these custom metrics were prefixed with the envoy_ Prometheus namespace and users were not able to use their own namespaces. Now, you can choose whatever namespace you want, and your metrics will be exposed in Envoy as-is, without being prefixed by envoy_.
Note that in order to actually expose these custom metrics, you have to configure ProxyConfig.proxyStatsMatcher in meshConfig for global configuration or in proxy.istio.io/config for per-proxy configuration. For details, please refer to Envoy Statistics.
Future work and looking for feedback
Although we have announced the alpha availability of Wasm plugins, there is still a lot of work left to be done. One important work item is “Image pull secrets” support in the Wasm API which will allow you to easily consume OCI images in a private repository. Others include first-class support for L4 filters, signature verification of Wasm binaries, runtime improvements in Envoy, Proxy-Wasm SDK improvements, documentation, etc.
This is just the beginning of our plan to provide first-class Wasm support in Istio. We would love to hear your feedback so that we can improve the developer experience using Wasm plugins in future releases of Istio!



gRPC Proxyless Service Mesh
Thu, 28 Oct 2021 00:00:00 +0000
Istio dynamically configures its Envoy sidecar proxies using a set of discovery APIs, collectively known as the
xDS APIs.
These APIs aim to become a universal data-plane API.
The gRPC project has significant support for the xDS APIs, which means you can manage gRPC workloads
without having to deploy an Envoy sidecar along with them. You can learn more about the integration in a
KubeCon EU 2021 talk from Megan Yahya. The latest updates on gRPC’s
support can be found in their proposals along with implementation
status.
Istio 1.11 adds experimental support for adding gRPC services directly to the mesh. We support basic service
discovery, some VirtualService based traffic policy, and mutual TLS.
Supported Features
The current implementation of the xDS APIs within gRPC is limited in some areas compared to Envoy. The following
features should work, although this is not an exhaustive list and other features may have partial functionality:

Basic service discovery. Your gRPC service can reach other pods and virtual machines registered in the mesh.
DestinationRule:

Subsets: Your gRPC service can split traffic based on label selectors to different groups of instances.
The only Istio loadBalancer currently supported is ROUND_ROBIN; consistentHash will be added in
future versions of Istio (it is supported by gRPC).
tls settings are restricted to DISABLE or ISTIO_MUTUAL. Other modes will be treated as DISABLE.


VirtualService:

Header match and URI match in the format /ServiceName/RPCName.
Override destination host and subset.
Weighted traffic shifting.


PeerAuthentication:

Only DISABLE and STRICT are supported. Other modes will be treated as DISABLE.
Support for auto-mTLS may exist in a future release.
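To make the mode rules above concrete, here is a small Go sketch. The functions effectiveClientTLS and effectiveServerMTLS are hypothetical: they model only the fallback behavior described in this list and are not Istio or gRPC code.

```go
package main

import "fmt"

// effectiveClientTLS models the DestinationRule rule: only DISABLE and
// ISTIO_MUTUAL are honored; any other mode (e.g. SIMPLE, MUTUAL) is treated as DISABLE.
func effectiveClientTLS(mode string) string {
	switch mode {
	case "DISABLE", "ISTIO_MUTUAL":
		return mode
	default:
		return "DISABLE"
	}
}

// effectiveServerMTLS models the PeerAuthentication rule: only DISABLE and
// STRICT are honored; PERMISSIVE and anything else are treated as DISABLE.
func effectiveServerMTLS(mode string) string {
	switch mode {
	case "DISABLE", "STRICT":
		return mode
	default:
		return "DISABLE"
	}
}

func main() {
	for _, m := range []string{"ISTIO_MUTUAL", "SIMPLE", "MUTUAL"} {
		fmt.Printf("DestinationRule tls.mode %s -> %s\n", m, effectiveClientTLS(m))
	}
	for _, m := range []string{"STRICT", "PERMISSIVE"} {
		fmt.Printf("PeerAuthentication mtls.mode %s -> %s\n", m, effectiveServerMTLS(m))
	}
}
```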



Other features, including faults, retries, timeouts, mirroring, and rewrite rules, may be supported in a future release. Some of these features are awaiting implementation in gRPC, and others require work in Istio to support. The status of xDS features in gRPC can be found in the gRPC project’s xDS feature documentation. The status of Istio’s support will be covered in future official docs.

This feature is experimental. Standard Istio features will become supported over time along with improvements to the overall design.


Architecture Overview

Diagram of how gRPC services communicate with istiod

Although this doesn’t use a proxy for data plane communication, it still requires an agent for initialization and communication with the control plane. First, the agent generates a bootstrap file at startup, in the same way it would generate one for Envoy. This tells the gRPC library how to connect to istiod, where it can find certificates for data plane communication, and what metadata to send to the control plane. Next, the agent acts as an xDS proxy, connecting and authenticating with istiod on the application’s behalf. Finally, the agent fetches and rotates certificates used in data plane traffic.
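The bootstrap file is plain JSON, so its role is easy to see by parsing one. The sketch below assumes the field names of gRPC's published xDS bootstrap format (xds_servers, node, certificate_providers) and the GRPC_XDS_BOOTSTRAP environment variable that gRPC uses to locate the file. The sample contents, including the socket path and node ID, are hypothetical and will not match what istio-agent actually writes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Bootstrap models a small subset of gRPC's xDS bootstrap file; real files carry more fields.
type Bootstrap struct {
	XDSServers []struct {
		ServerURI    string `json:"server_uri"` // where to reach the xDS server (the agent)
		ChannelCreds []struct {
			Type string `json:"type"`
		} `json:"channel_creds"`
		ServerFeatures []string `json:"server_features"`
	} `json:"xds_servers"`
	Node struct {
		ID       string         `json:"id"`       // identity sent to the control plane
		Metadata map[string]any `json:"metadata"` // extra metadata sent to the control plane
	} `json:"node"`
	// Where data plane certificates come from.
	CertificateProviders map[string]struct {
		PluginName string `json:"plugin_name"`
	} `json:"certificate_providers"`
}

// sampleBootstrap is hypothetical example content.
const sampleBootstrap = `{
  "xds_servers": [{"server_uri": "unix:///tmp/xds.sock",
                   "channel_creds": [{"type": "insecure"}],
                   "server_features": ["xds_v3"]}],
  "node": {"id": "sidecar~10.0.0.1~echo-v1.echo-grpc~echo-grpc.svc.cluster.local",
           "metadata": {"EXAMPLE_KEY": "example-value"}},
  "certificate_providers": {"default": {"plugin_name": "file_watcher"}}
}`

func parseBootstrap(data []byte) (*Bootstrap, error) {
	var b Bootstrap
	if err := json.Unmarshal(data, &b); err != nil {
		return nil, err
	}
	return &b, nil
}

func main() {
	data := []byte(sampleBootstrap)
	// Use the real file if GRPC_XDS_BOOTSTRAP points at one; otherwise use the sample.
	if path := os.Getenv("GRPC_XDS_BOOTSTRAP"); path != "" {
		if fileData, err := os.ReadFile(path); err == nil {
			data = fileData
		}
	}
	b, err := parseBootstrap(data)
	if err != nil {
		fmt.Println("invalid bootstrap:", err)
		return
	}
	for _, s := range b.XDSServers {
		fmt.Println("xDS server:", s.ServerURI)
	}
	fmt.Println("node ID:", b.Node.ID)
	for name, p := range b.CertificateProviders {
		fmt.Printf("certificate provider %q uses plugin %s\n", name, p.PluginName)
	}
}
```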
Changes to application code

This section covers gRPC’s xDS support in Go. Similar APIs exist for other languages.

To enable the xDS features in gRPC, your application must make a handful of changes. Your gRPC version should be at least 1.39.0.
In the client
The following side-effect import will register the xDS resolvers and balancers within gRPC. It should be added in your
main package or in the same package calling grpc.Dial.
import _ "google.golang.org/grpc/xds"
When creating a gRPC connection, the URL must use the xds:/// scheme.
conn, err := grpc.DialContext(ctx, "xds:///foo.ns.svc.cluster.local:7070")
Additionally, for (m)TLS support, a special TransportCredentials option has to be passed to DialContext.
The FallbackCreds allow us to succeed when istiod doesn’t send security config.
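import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	xdscreds "google.golang.org/grpc/credentials/xds"
)

...

// Plaintext FallbackCreds are used when istiod sends no security config.
creds, err := xdscreds.NewClientCredentials(xdscreds.ClientOptions{
	FallbackCreds: insecure.NewCredentials(),
})
// handle err
conn, err := grpc.DialContext(
	ctx,
	"xds:///foo.ns.svc.cluster.local:7070",
	grpc.WithTransportCredentials(creds),
)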
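The triple slash in xds:/// is an empty URL authority. The scheme selects the resolver registered by the side-effect import, and the path carries the service host and port. As a hedged illustration, here is how Go's net/url splits such a target. splitTarget is a hypothetical helper, and grpc-go's own target parsing handles more cases.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitTarget is a hypothetical helper that separates a "scheme:///endpoint"
// dial target into the scheme (which picks the resolver) and the endpoint
// (service host and port) that the resolver looks up.
func splitTarget(target string) (scheme, endpoint string, err error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", "", err
	}
	// "xds:///host:port" parses with an empty authority and path "/host:port".
	return u.Scheme, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	for _, t := range []string{
		"xds:///foo.ns.svc.cluster.local:7070",
		"xds:///echo:7070",
	} {
		scheme, endpoint, err := splitTarget(t)
		if err != nil {
			fmt.Println("bad target:", err)
			continue
		}
		fmt.Printf("scheme=%s endpoint=%s\n", scheme, endpoint)
	}
}
```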
On the server
To support server-side configurations, such as mTLS, there are a couple of modifications that must be made.
First, we use a special constructor to create the GRPCServer:
import "google.golang.org/grpc/xds"

...

server = xds.NewGRPCServer()
RegisterFooServer(server, &fooServerImpl)
If your protoc-generated Go code is out of date, you may need to regenerate it to be compatible with the xDS server.
Your generated RegisterFooServer function should look like the following:
func RegisterFooServer(s grpc.ServiceRegistrar, srv FooServer) {
s.RegisterService(&FooServer_ServiceDesc, srv)
}
Finally, as with the client-side changes, we must enable security support:
import xdscreds "google.golang.org/grpc/credentials/xds"
...
creds, err := xdscreds.NewServerCredentials(xdscreds.ServerOptions{FallbackCreds: insecure.NewCredentials()})
// handle err
server = xds.NewGRPCServer(grpc.Creds(creds))
In your Kubernetes Deployment
Assuming your application code is compatible, the Pod simply needs the annotation inject.istio.io/templates: grpc-agent.
This adds a sidecar container running the agent described above, and some environment variables that gRPC uses to find
the bootstrap file and enable certain features.
For gRPC servers, your Pod should also be annotated with proxy.istio.io/config: '{"holdApplicationUntilProxyStarts": true}'
to make sure the in-agent xDS proxy and bootstrap file are ready before your gRPC server is initialized.
Example
In this guide, you will deploy echo, an application that already supports both server-side and client-side
proxyless gRPC. With this app, you can try out some supported traffic policies and enable mTLS.
Prerequisites
This guide requires the Istio (1.11+) control plane to be installed before proceeding.
Deploy the application
Create an injection-enabled namespace echo-grpc. Next, deploy two instances of the echo app, as well as the Service.
$ kubectl create namespace echo-grpc
$ kubectl label namespace echo-grpc istio-injection=enabled
$ kubectl -n echo-grpc apply -f samples/grpc-echo/grpc-echo.yaml
Make sure the two pods are running:
$ kubectl -n echo-grpc get pods
NAME                       READY   STATUS    RESTARTS   AGE
echo-v1-69d6d96cb7-gpcpd   2/2     Running   0          58s
echo-v2-5c6cbf6dc7-dfhcb   2/2     Running   0          58s
Test the gRPC resolver
First, port-forward port 17171 on one of the Pods. This port serves a non-xDS-backed gRPC server that lets you send
requests from the port-forwarded Pod.
$ kubectl -n echo-grpc port-forward $(kubectl -n echo-grpc get pods -l version=v1 -ojsonpath='{.items[0].metadata.name}') 17171 &
Next, we can fire off a batch of 5 requests:
$ grpcurl -plaintext -d '{"url": "xds:///echo.echo-grpc.svc.cluster.local:7070", "count": 5}' :17171 proto.EchoTestService/ForwardEcho | jq -r '.output | join("")'  | grep Hostname
Handling connection for 17171
[0 body] Hostname=echo-v1-7cf5b76586-bgn6t
[1 body] Hostname=echo-v2-cf97bd94d-qf628
[2 body] Hostname=echo-v1-7cf5b76586-bgn6t
[3 body] Hostname=echo-v2-cf97bd94d-qf628
[4 body] Hostname=echo-v1-7cf5b76586-bgn6t
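The alternating hostnames above are what the ROUND_ROBIN policy produces with two endpoints. A minimal Go sketch of the idea follows. The roundRobin type is hypothetical, and gRPC's real balancer also tracks connection state and may start from a different endpoint.

```go
package main

import "fmt"

// roundRobin hands out endpoints in a fixed cyclic order, one per pick.
type roundRobin struct {
	endpoints []string
	next      int
}

func (r *roundRobin) pick() string {
	ep := r.endpoints[r.next]
	r.next = (r.next + 1) % len(r.endpoints) // wrap around after the last endpoint
	return ep
}

func main() {
	rr := &roundRobin{endpoints: []string{"echo-v1", "echo-v2"}}
	for i := 0; i < 5; i++ {
		fmt.Printf("[%d] %s\n", i, rr.pick())
	}
}
```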
You can also use Kubernetes-like name resolution for short names:
$ grpcurl -plaintext -d '{"url": "xds:///echo:7070"}' :17171 proto.EchoTestService/ForwardEcho | jq -r '.output | join("")'  | grep Hostname
[0 body] Hostname=echo-v1-7cf5b76586-ltr8q
$ grpcurl -plaintext -d '{"url": "xds:///echo.echo-grpc:7070"}' :17171 proto.EchoTestService/ForwardEcho | jq -r '.output | join("")'  | grep Hostname
[0 body] Hostname=echo-v1-7cf5b76586-ltr8q
$ grpcurl -plaintext -d '{"url": "xds:///echo.echo-grpc.svc:7070"}' :17171 proto.EchoTestService/ForwardEcho | jq -r '.output | join("")'  | grep Hostname
[0 body] Hostname=echo-v2-cf97bd94d-jt5mf
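All of these short forms name the same Service. The Go sketch below is a hypothetical model of that expansion. It assumes the caller runs in the echo-grpc namespace and that the cluster domain is cluster.local, and it only imitates Kubernetes-style search domains; it is not how Istio or gRPC implement the lookup.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// expandHost is a hypothetical helper that completes a short host the way
// Kubernetes search domains would: name -> name.<namespace>.svc.<domain>,
// name.ns -> name.ns.svc.<domain>, name.ns.svc -> name.ns.svc.<domain>.
func expandHost(target, namespace, clusterDomain string) (string, error) {
	host, port, err := net.SplitHostPort(target)
	if err != nil {
		return "", err
	}
	svcSuffix := ".svc." + clusterDomain
	switch dots := strings.Count(host, "."); {
	case strings.HasSuffix(host, svcSuffix):
		// Already fully qualified.
	case strings.HasSuffix(host, ".svc"):
		host += "." + clusterDomain
	case dots == 0:
		host += "." + namespace + svcSuffix
	case dots == 1:
		host += svcSuffix
	}
	return net.JoinHostPort(host, port), nil
}

func main() {
	for _, target := range []string{"echo:7070", "echo.echo-grpc:7070", "echo.echo-grpc.svc:7070"} {
		fqdn, err := expandHost(target, "echo-grpc", "cluster.local")
		if err != nil {
			fmt.Println("bad target:", err)
			continue
		}
		fmt.Printf("%-26s -> %s\n", target, fqdn)
	}
}
```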
Creating subsets with destination rule
First, create a subset for each version of the workload.
$ cat <
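A DestinationRule subset picks the instances whose labels contain every key/value pair in the subset's labels. The Go sketch below models that matching. It assumes pods labeled version=v1 and version=v2, the same label the port-forward command above selects on. The pod type and the matches and subsetMembers helpers are hypothetical.

```go
package main

import "fmt"

type pod struct {
	name   string
	labels map[string]string
}

// matches reports whether every key/value pair in selector is present on the pod.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// subsetMembers returns the names of the pods selected by a subset's labels.
func subsetMembers(pods []pod, selector map[string]string) []string {
	var names []string
	for _, p := range pods {
		if matches(selector, p.labels) {
			names = append(names, p.name)
		}
	}
	return names
}

func main() {
	pods := []pod{
		{name: "echo-v1", labels: map[string]string{"app": "echo", "version": "v1"}},
		{name: "echo-v2", labels: map[string]string{"app": "echo", "version": "v2"}},
	}
	for _, subset := range []string{"v1", "v2"} {
		fmt.Printf("subset %s -> %v\n", subset, subsetMembers(pods, map[string]string{"version": subset}))
	}
}
```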

Traffic shifting
Using the subsets defined above, you can send 80 percent of the traffic to a specific version:
$ cat <

Now, send a set of 10 requests:
$ grpcurl -plaintext -d '{"url": "xds:///echo.echo-grpc.svc.cluster.local:7070", "count": 10}' :17171 proto.EchoTestService/ForwardEcho | jq -r '.output | join("")'  | grep ServiceVersion
The response should contain mostly v2 responses:
[0 body] ServiceVersion=v2
[1 body] ServiceVersion=v2
[2 body] ServiceVersion=v1
[3 body] ServiceVersion=v2
[4 body] ServiceVersion=v1
[5 body] ServiceVersion=v2
[6 body] ServiceVersion=v2
[7 body] ServiceVersion=v2
[8 body] ServiceVersion=v2
[9 body] ServiceVersion=v2
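Weighted traffic shifting fixes the proportion of requests, not their order, so a run of 10 requests is not guaranteed to split exactly 8 to 2. The Go sketch below uses smooth weighted round robin. This deterministic technique is chosen here only for illustration; gRPC's weighted picker may select randomly instead. With weights 20 and 80 it yields exactly one v1 pick in every five.

```go
package main

import "fmt"

type weighted struct {
	name    string
	weight  int
	current int // running score used by smooth weighted round robin
}

// pickSWRR adds each backend's weight to its score, picks the highest score,
// then subtracts the total weight from the winner so picks stay evenly spread.
func pickSWRR(backends []*weighted) string {
	total := 0
	var best *weighted
	for _, b := range backends {
		b.current += b.weight
		total += b.weight
		if best == nil || b.current > best.current {
			best = b
		}
	}
	best.current -= total
	return best.name
}

func main() {
	backends := []*weighted{{name: "v1", weight: 20}, {name: "v2", weight: 80}}
	counts := map[string]int{}
	for i := 0; i < 10; i++ {
		v := pickSWRR(backends)
		counts[v]++
		fmt.Printf("[%d] ServiceVersion=%s\n", i, v)
	}
	fmt.Printf("v1=%d v2=%d\n", counts["v1"], counts["v2"]) // prints v1=2 v2=8
}
```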
Enabling mTLS
Because enabling security in gRPC requires changes to the application itself, Istio’s traditional method of
automatically detecting mTLS support is unreliable. For this reason, the initial release requires explicitly enabling
mTLS on both the client and the server.
To enable client-side mTLS, apply a DestinationRule with tls settings:
$ cat <

Now, calls to the server, which is not yet configured for mTLS, will fail.
$ grpcurl -plaintext -d '{"url": "xds:///echo.echo-grpc.svc.cluster.local:7070"}' :17171 proto.EchoTestService/ForwardEcho | jq -r '.output | join("")'
Handling connection for 17171
ERROR:
Code: Unknown
Message: 1/1 requests had errors; first error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
To enable server-side mTLS, apply a PeerAuthentication. The following policy forces STRICT mTLS for the entire namespace.
$ cat <

Requests will start to succeed after applying the policy.
$ grpcurl -plaintext -d '{"url": "xds:///echo.echo-grpc.svc.cluster.local:7070"}' :17171 proto.EchoTestService/ForwardEcho | jq -r '.output | join("")'
Handling connection for 17171
[0] grpcecho.Echo(&{xds:///echo.echo-grpc.svc.cluster.local:7070 map[] 0  5s false })
[0 body] x-request-id=0
[0 body] Host=echo.echo-grpc.svc.cluster.local:7070
[0 body] content-type=application/grpc
[0 body] user-agent=grpc-go/1.39.1
[0 body] StatusCode=200
[0 body] ServiceVersion=v1
[0 body] ServicePort=17070
[0 body] Cluster=
[0 body] IP=10.68.1.18
[0 body] IstioVersion=
[0 body] Echo=
[0 body] Hostname=echo-v1-7cf5b76586-z5p8l
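Because the initial release has no auto-mTLS or PERMISSIVE mode, whether a call succeeds depends only on whether the client and server settings agree. The hypothetical Go function below summarizes that rule in terms of effective modes (ISTIO_MUTUAL on the client, STRICT on the server). It restates the behavior shown above and does not implement a handshake.

```go
package main

import "fmt"

// handshakeOK models explicit mTLS in proxyless gRPC: with no PERMISSIVE
// fallback, a call works only if both sides use mTLS or both use plaintext.
func handshakeOK(clientTLS, serverMTLS string) bool {
	clientUsesMTLS := clientTLS == "ISTIO_MUTUAL"
	serverRequiresMTLS := serverMTLS == "STRICT"
	return clientUsesMTLS == serverRequiresMTLS
}

func main() {
	for _, client := range []string{"DISABLE", "ISTIO_MUTUAL"} {
		for _, server := range []string{"DISABLE", "STRICT"} {
			fmt.Printf("client %-12s server %-7s -> ok=%v\n", client, server, handshakeOK(client, server))
		}
	}
}
```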
Limitations
The initial release comes with several limitations that may be fixed in a future version:

Neither auto-mTLS nor permissive mode is supported. Instead, mTLS must be configured explicitly, with
STRICT on the server and ISTIO_MUTUAL on the client. Envoy can be used during the migration to STRICT.
Calling grpc.Serve(listener) or grpc.Dial("xds:///...") before the bootstrap file is written or the xDS proxy is ready
can cause a failure. To work around this, use holdApplicationUntilProxyStarts, or make the application more robust to
these failures.
If the xDS-enabled gRPC server uses mTLS, make sure your health checks still work. Either serve health checks on a
separate port, or give your health-checking client a way to obtain the proper client certificates.
The implementation of xDS in gRPC does not match Envoy’s. Certain behaviors may differ, and some features may
be missing. The feature status for gRPC provides more detail. Make sure to test that any Istio
configuration actually applies to your proxyless gRPC apps.

Performance
Experiment Setup

Using Fortio, a Go-based load testing app

Slightly modified, to support gRPC’s xDS features (PR)


Resources:

GKE 1.20 cluster with 3 e2-standard-16 nodes (16 CPUs + 64 GB memory each)
Fortio client and server apps: 1.5 vCPU, 1000 MiB memory
Sidecar (istio-agent and possibly Envoy proxy): 1 vCPU, 512 MiB memory


Workload types tested:

Baseline: regular gRPC with no Envoy proxy or Proxyless xDS in use
Envoy: standard istio-agent + Envoy proxy sidecar
Proxyless: gRPC using the xDS gRPC server implementation and xds:/// resolver on the client
mTLS enabled/disabled via PeerAuthentication and DestinationRule



Latency

p50 latency comparison chart

p99 latency comparison chart

Compared to the baseline, the proxyless gRPC resolvers add only a marginal increase in latency. Compared to Envoy, this
is a major improvement that still allows for advanced traffic management features and mTLS.
istio-proxy container resource usage

Configuration          Client mCPU   Client Memory (MiB)   Server mCPU   Server Memory (MiB)
Envoy Plaintext        320.44        66.93                 243.78        64.91
Envoy mTLS             340.87        66.76                 309.82        64.82
Proxyless Plaintext    0.72          23.54                 0.84          24.31
Proxyless mTLS         0.73          25.05                 0.78          25.43

Even though we still require an agent, the agent uses less than 0.1% of a full vCPU, and only 25 MiB of memory,
which is less than half of what running Envoy requires.
These metrics don’t include additional resource usage by gRPC in the application container,
but serve to demonstrate the resource usage impact of the istio-agent when running in this mode.
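The two claims above can be checked against the table. The Go sketch below recomputes them from the table values, using 1 vCPU = 1000 mCPU. Every proxyless row stays under 0.1% of a vCPU (for example, 0.73 mCPU is 0.073%), and each uses well under half of the corresponding Envoy row's memory (25.05 MiB is about 37.5% of 66.76 MiB).

```go
package main

import "fmt"

// usage holds one table row: CPU in millicores, memory in MiB.
type usage struct {
	clientCPU, clientMem, serverCPU, serverMem float64
}

// resourceTable copies the istio-proxy container measurements from the table.
var resourceTable = map[string]usage{
	"Envoy Plaintext":     {320.44, 66.93, 243.78, 64.91},
	"Envoy mTLS":          {340.87, 66.76, 309.82, 64.82},
	"Proxyless Plaintext": {0.72, 23.54, 0.84, 24.31},
	"Proxyless mTLS":      {0.73, 25.05, 0.78, 25.43},
}

// percentOfVCPU converts millicores to a percentage of one vCPU (1000 mCPU).
func percentOfVCPU(mCPU float64) float64 { return mCPU / 1000 * 100 }

func main() {
	for _, mode := range []string{"Plaintext", "mTLS"} {
		envoy, proxyless := resourceTable["Envoy "+mode], resourceTable["Proxyless "+mode]
		fmt.Printf("%-9s agent CPU: %.3f%% (client), %.3f%% (server) of one vCPU\n",
			mode, percentOfVCPU(proxyless.clientCPU), percentOfVCPU(proxyless.serverCPU))
		fmt.Printf("%-9s agent memory: %.1f%% (client), %.1f%% (server) of Envoy's\n",
			mode, 100*proxyless.clientMem/envoy.clientMem, 100*proxyless.serverMem/envoy.serverMem)
	}
}
```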



Aeraki — Manage Any Layer-7 Protocol in Istio Service Mesh
Tue, 28 Sep 2021 00:00:00 +0000
Aeraki [Air-rah-ki] is the Greek word for ‘breeze’. While Istio connects microservices in a service mesh, Aeraki provides a framework that allows Istio to support layer-7 protocols other than HTTP and gRPC. We hope this breeze can help Istio sail a little further.
Lack of Protocol Support in Service Meshes
We are now facing some challenges with service meshes:

Istio and other popular service mesh implementations have very limited support for layer 7 protocols other than HTTP and gRPC.
Envoy RDS (Route Discovery Service) is solely designed for HTTP. Other protocols such as Dubbo and Thrift can only use listener inline routes for traffic management, which breaks existing connections when routes change.
It takes a lot of effort to introduce a proprietary protocol into a service mesh. You’ll need to write an Envoy filter to handle the traffic in the data plane, and a control plane to manage those Envoys.

Those obstacles make it very hard, if not impossible, for users to manage the traffic of other widely used layer-7 protocols in microservices. For example, a microservices application may use the following protocols:

RPC: HTTP, gRPC, Thrift, Dubbo, Proprietary RPC Protocol …
Messaging: Kafka, RabbitMQ …
Cache: Redis, Memcached …
Database: MySQL, PostgreSQL, MongoDB …

Common Layer-7 Protocols Used in Microservices

If you have already invested a lot of effort in migrating to a service mesh, of course, you want to get the most out of it — managing the traffic of all the protocols in your microservices.
Aeraki’s approach
To address these problems, we created an open-source project, Aeraki Mesh, to provide a non-intrusive, extendable way to manage any layer-7 traffic in an Istio service mesh.

Aeraki Architecture

As this diagram shows, Aeraki Framework consists of the following components:

Aeraki: Aeraki provides high-level, user-friendly traffic management rules to operators, translates the rules into Envoy filter configurations, and leverages Istio’s EnvoyFilter API to push the configurations to the sidecar proxies. Aeraki also serves as the RDS server for MetaProtocol proxies in the data plane. Unlike Envoy RDS, which focuses on HTTP, Aeraki RDS aims to provide a general dynamic routing capability for all layer-7 protocols.
MetaProtocol Proxy: MetaProtocol Proxy provides common capabilities for layer-7 protocols, such as load balancing, circuit breaking, routing, rate limiting, fault injection, and auth. Layer-7 protocols can be built on top of MetaProtocol. To add a new protocol to the service mesh, all you need to do is implement the codec interface and add a couple of lines of configuration. If you have special requirements that the built-in capabilities can’t accommodate, MetaProtocol Proxy also has an application-level filter chain mechanism, allowing users to write their own layer-7 filters to add custom logic to MetaProtocol Proxy.

Dubbo and Thrift have already been implemented based on MetaProtocol. More protocols are on the way. If you’re using a closed-source, proprietary protocol, you can also manage it in your service mesh simply by writing a MetaProtocol codec for it.
Most request/response style, stateless protocols can be built on top of the MetaProtocol Proxy. However, some protocols’ routing policies are too “special” to be normalized in MetaProtocol. For example, Redis proxy uses a slot number to map a client query to a specific Redis server node, and the slot number is computed from the key in the request. Aeraki can still manage those protocols as long as an Envoy filter for them is available on the Envoy proxy side. Currently, Redis and Kafka are the protocols in this category that Aeraki supports.
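To see why Redis routing is hard to normalize, consider how a Redis Cluster key maps to a slot. The slot is the CRC16 of the key, or of its {hash tag} if one is present, modulo 16384. The Go sketch below follows the key-slot algorithm published in the Redis Cluster specification. It is an illustration, not code from Aeraki or Envoy's Redis proxy.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 computes CRC-16/XMODEM (polynomial 0x1021, initial value 0),
// the variant Redis Cluster uses for key hashing.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keySlot maps a key to one of 16384 slots. If the key contains a non-empty
// {hash tag}, only the tag is hashed, so related keys land on the same node.
func keySlot(key string) uint16 {
	if start := strings.IndexByte(key, '{'); start >= 0 {
		if end := strings.IndexByte(key[start+1:], '}'); end > 0 {
			key = key[start+1 : start+1+end]
		}
	}
	return crc16([]byte(key)) & 16383
}

func main() {
	for _, key := range []string{"foo", "{user1000}.following", "{user1000}.followers"} {
		fmt.Printf("%-22s slot %d\n", key, keySlot(key))
	}
}
```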
Deep Dive Into MetaProtocol
Let’s look into how MetaProtocol works. Before MetaProtocol was introduced, if we wanted to proxy traffic for a specific protocol, we needed to write an Envoy filter that understands that protocol and add the code to manipulate the traffic, including routing, header modification, fault injection, traffic mirroring, etc.
For most request/response style protocols, the code for traffic manipulation is very similar. Therefore, to avoid duplicating these functionalities in different Envoy filters, Aeraki Framework implements most of the common functions of a layer-7 protocol proxy in a single place: the MetaProtocol Proxy filter.

MetaProtocol Proxy

This approach significantly lowers the barrier to writing a new Envoy filter: instead of writing a fully functional filter, you now only need to implement the codec interface. In addition, the control plane is already in place. Aeraki works at the control plane to provide MetaProtocol configuration and dynamic routes for all protocols built on top of MetaProtocol.

Writing an Envoy Filter Before and After MetaProtocol

There are two important data structures in MetaProtocol Proxy: Metadata and Mutation. Metadata is used for routing, and Mutation is used for header manipulation.
On the request path, the decoder (the decode method of the codec implementation) populates the Metadata data structure with key-value pairs parsed from the request, then passes the Metadata to the MetaProtocol Router. The Router selects an appropriate upstream cluster by matching the Metadata against the route configuration it receives from Aeraki via RDS.
A custom filter can populate the Mutation data structure with arbitrary key-value pairs if the request needs to be modified, for example by adding a header or changing the value of a header. The Mutation data structure is then passed to the encoder (the encode method of the codec implementation), which is responsible for writing the key-value pairs into the wire protocol.

The Request Path

The response path is similar to the request path, only in a different direction.

The Response Path
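The request path can be modeled in a few lines to show which component fills which data structure. The real MetaProtocol codec interface belongs to an Envoy filter written in C++. The Go types below (Metadata, Mutation, Codec, kvCodec, route) are a hypothetical analogue built around a made-up "k1=v1;k2=v2|body" wire format, and the cluster names are invented.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Metadata holds key/value pairs the decoder extracts for routing.
type Metadata map[string]string

// Mutation holds key/value pairs a filter wants written back into the message.
type Mutation map[string]string

// Codec is a hypothetical Go analogue of a MetaProtocol codec.
type Codec interface {
	Decode(raw string) (Metadata, string, error)
	Encode(md Metadata, mut Mutation, body string) string
}

// kvCodec speaks a made-up protocol whose messages look like "k1=v1;k2=v2|body".
type kvCodec struct{}

func (kvCodec) Decode(raw string) (Metadata, string, error) {
	header, body, ok := strings.Cut(raw, "|")
	if !ok {
		return nil, "", fmt.Errorf("missing body separator in %q", raw)
	}
	md := Metadata{}
	for _, pair := range strings.Split(header, ";") {
		k, v, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, "", fmt.Errorf("malformed header %q", pair)
		}
		md[k] = v
	}
	return md, body, nil
}

func (kvCodec) Encode(md Metadata, mut Mutation, body string) string {
	merged := Metadata{}
	for k, v := range md {
		merged[k] = v
	}
	for k, v := range mut { // mutations add or overwrite headers
		merged[k] = v
	}
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic header order on the wire
	pairs := make([]string, len(keys))
	for i, k := range keys {
		pairs[i] = k + "=" + merged[k]
	}
	return strings.Join(pairs, ";") + "|" + body
}

// route is a hypothetical route entry: exact match on one metadata key.
type route struct{ key, value, cluster string }

// selectCluster plays the Router's role: the first matching route wins.
func selectCluster(routes []route, md Metadata) (string, bool) {
	for _, r := range routes {
		if v, ok := md[r.key]; ok && v == r.value {
			return r.cluster, true
		}
	}
	return "", false
}

func main() {
	var codec Codec = kvCodec{}
	routes := []route{
		{key: "method", value: "sayHello", cluster: "hello-cluster"},
		{key: "method", value: "ping", cluster: "ping-cluster"},
	}
	md, body, err := codec.Decode("method=sayHello|payload")
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	if cluster, ok := selectCluster(routes, md); ok {
		fmt.Println("upstream cluster:", cluster)
	}
	// A custom filter asks for an extra header on the upstream request.
	fmt.Println(codec.Encode(md, Mutation{"x-request-source": "mesh"}, body))
}
```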

An Example
If you need to implement an application protocol based on MetaProtocol, you can follow the steps below (using Thrift as an example):
Data Plane


Implement the codec interface to encode and decode the protocol packets. You can refer to the Dubbo codec and Thrift codec when writing your own implementation.


Define the protocol with Aeraki ApplicationProtocol CRD, as this YAML snippet shows:


apiVersion: metaprotocol.aeraki.io/v1alpha1
kind: ApplicationProtocol
metadata:
  name: thrift
  namespace: istio-system
spec:
  protocol: thrift
  codec: aeraki.meta_protocol.codec.thrift
Control Plane
You don’t need to implement the control plane. Aeraki watches services and traffic rules, generates the configurations for the sidecar proxies, and sends the configurations to the data plane via EnvoyFilter and MetaProtocol RDS.
Protocol selection
Similar to Istio, protocols are identified by the service port name prefix. Name service ports with this pattern: tcp-metaprotocol-{application protocol}-xxx. For example, a Thrift service port should be named tcp-metaprotocol-thrift.
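The naming convention lends itself to simple parsing. The Go helper below is a hypothetical illustration. It assumes the application protocol name contains no dash and treats everything after the next dash as the free-form -xxx suffix. Aeraki's actual matching logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

const metaProtocolPrefix = "tcp-metaprotocol-"

// applicationProtocol extracts {application protocol} from a port name that
// follows the tcp-metaprotocol-{application protocol}-xxx convention.
func applicationProtocol(portName string) (string, bool) {
	if !strings.HasPrefix(portName, metaProtocolPrefix) {
		return "", false
	}
	rest := strings.TrimPrefix(portName, metaProtocolPrefix)
	proto, _, _ := strings.Cut(rest, "-") // drop the optional -xxx suffix
	return proto, proto != ""
}

func main() {
	for _, name := range []string{"tcp-metaprotocol-thrift", "tcp-metaprotocol-dubbo-server", "tcp-thrift"} {
		proto, ok := applicationProtocol(name)
		fmt.Printf("%-30s protocol=%q metaprotocol=%v\n", name, proto, ok)
	}
}
```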
Traffic management
You can change routes with the MetaRouter CRD. For example, to send 20% of the requests to v1 and 80% to v2:
apiVersion: metaprotocol.aeraki.io/v1alpha1
kind: MetaRouter
metadata:
  name: test-metaprotocol-route
spec:
  hosts:
    - thrift-sample-server.thrift.svc.cluster.local
  routes:
    - name: traffic-spilt
      route:
        - destination:
            host: thrift-sample-server.thrift.svc.cluster.local
            subset: v1
          weight: 20
        - destination:
            host: thrift-sample-server.thrift.svc.cluster.local
            subset: v2
          weight: 80
We hope this helps if you need to manage protocols other than HTTP in a service mesh. Reach out to zhaohuabing if you have any questions.




Announcing Extended Support for Istio 1.9
Fri, 03 Sep 2021 00:00:00 +0000
In keeping with our 2021 theme of improving Day 2 Istio operations, the Istio team has been evaluating extending the support window for our releases to give users more time to upgrade. For starters, we are extending the support window of Istio 1.9 by six weeks, to October 5, 2021. We hope that this additional support window will allow the many users who are currently using Istio 1.9 to upgrade, either to Istio 1.10 or directly to Istio 1.11. By overlapping support between 1.9 and 1.11, we intend to create a stable cadence of upgrade windows twice a year for users upgrading directly across two minor versions (i.e., 1.9 to 1.11). Users who prefer upgrading through each minor release to get all the latest and greatest features may continue doing so quarterly.

Extended Support and Upgrades

During this extended period of support, Istio 1.9 will receive CVE and critical bug fixes only, as our goal is simply to provide users with time to migrate off the release and on to 1.10 or 1.11. And speaking of users, we would love to hear how we’re doing at improving your Day 2 experience of Istio. Is two upgrades per year the right number? Is a six-week upgrade window too short? Please share your thoughts with us on Slack (in the user-experience channel) or on Twitter. Thanks!



Announcing the results of Istio’s first security assessment
Tue, 13 Jul 2021 00:00:00 +0000
The Istio service mesh has gained wide production adoption across many
industries. The success of the project, and its critical role in enforcing key
security policies in infrastructure, warranted an open and neutral assessment of
the security risks associated with the project.
To achieve this goal, the Istio community contracted the
NCC Group last year to
conduct a third-party security assessment of the project. The goal of the review
was “to identify security issues related to the Istio code base, highlight
high-risk configurations commonly used by administrators, and provide
perspective on whether security features sufficiently address the concerns they
are designed to provide”.
NCC Group carried out the review over a period of five weeks with collaboration
from subject matter experts across the Istio community. In this blog, we will
examine the key findings of the report, actions taken to implement various fixes
and recommendations, and our plan of action for continuous security evaluation
and improvement of the Istio project. You can download and read the
unabridged version of the
security assessment report.
Scope and Key Findings
The assessment evaluated Istio’s architecture as a whole for security-related
issues, with a focus on key components like istiod (Pilot), Ingress/Egress
gateways, and Istio’s overall Envoy usage as its data plane proxy. Additionally,
Istio documentation, including security guides, was audited for correctness and
clarity. The report was compiled against Istio version 1.6.5, and since then the
Product Security Working Group has issued several security releases as new
vulnerabilities were disclosed, along with fixes to address concerns raised in
the report.
An important conclusion from the report is that the auditors found no “Critical”
issues within the Istio project. This finding validates the continuous and
proactive security review and vulnerability management process implemented by
Istio’s Product Security Working Group (PSWG). For the remaining issues surfaced
by the report, the PSWG went to work on addressing them, and we are glad to
report that all issues marked “High”, and several marked “Medium/Low”, have been
resolved in the releases following the report.
The report also makes strategic recommendations around creating a hardening
guide which is now available in our
Security Best Practices
guide. This is a comprehensive document which pulls together recommendations
from security experts within the Istio community, and industry leaders running
Istio in production. Work is underway to create an opinionated and hardened
security profile for installing Istio in secure environments, but in the interim
we recommend users follow the Security Best Practices guide and configure Istio
to meet their security requirements. With that, let’s look at the analysis and
resolution for various issues raised in the report.
Resolution and learnings
Inability to secure control plane network communications
The report flags configuration options that were available in older versions of
Istio to control how communication is secured to the control plane. Since 1.7,
Istio by default secures all control plane communication and many configuration
options mentioned in the report to manage control plane encryption are no longer
required.
The debug endpoint mentioned in the report is enabled by default (as of Istio
1.10) to allow users to debug their Istio service mesh using the istioctl tool.
It can be disabled by setting the environment variable ENABLE_DEBUG_ON_HTTP to
false as mentioned in the Security Best
Practices
guide. Additionally, in an upcoming version (1.11), this debug endpoint will
be secured by default and a valid Kubernetes service account token will be
required to gain access.
Lack of security related documentation
The report points out gaps in the security related documentation published with
Istio 1.6. Since then, we have created a detailed Security Best Practices
guide with recommendations to ensure users can deploy Istio securely to meet
their requirements.  Moving forward, we will continue to augment this
documentation with more hardening recommendations. We advise users to monitor
the guide for updates.
Lack of VirtualService Gateway field validation enables request hijacking
For this issue, the report uses a valid but permissive Gateway configuration
that can cause requests to be routed incorrectly. Like Kubernetes RBAC, Istio
APIs, including Gateways, can be tuned to be permissive or restrictive depending
on your requirements. However, the report surfaced gaps in our documentation
around best practices and guidance for securing user environments. To address
them, we have added a section to our Security Best Practices guide with steps
for running Gateways securely.
In particular, we strongly recommend following the section on using namespace
prefixes in the hosts specification of Gateway resources to harden your
configuration and prevent this type of request hijacking.
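Istio's Gateway reference describes each hosts entry as namespace/dnsName, where the namespace may be * (any namespace) or . (the Gateway's own namespace), and an entry without a prefix selects VirtualServices from any namespace. The simplified Go model below shows how the prefix limits which VirtualServices can bind. allowsBinding and hostMatches are hypothetical, and they ignore cases Istio handles, such as wildcard hosts on the VirtualService side.

```go
package main

import (
	"fmt"
	"strings"
)

// hostMatches compares a Gateway dnsName pattern ("*", "*.example.com", or an exact name) to a host.
func hostMatches(pattern, host string) bool {
	switch {
	case pattern == "*":
		return true
	case strings.HasPrefix(pattern, "*."):
		return strings.HasSuffix(host, pattern[1:]) // "*.example.com" -> suffix ".example.com"
	default:
		return pattern == host
	}
}

// allowsBinding reports whether a Gateway (in gwNamespace) hosts entry lets a
// VirtualService in vsNamespace with host vsHost bind to it.
func allowsBinding(entry, gwNamespace, vsNamespace, vsHost string) bool {
	ns, dns := "*", entry // no prefix: any namespace
	if i := strings.Index(entry, "/"); i >= 0 {
		ns, dns = entry[:i], entry[i+1:]
	}
	switch ns {
	case "*":
	case ".":
		if vsNamespace != gwNamespace {
			return false
		}
	default:
		if vsNamespace != ns {
			return false
		}
	}
	return hostMatches(dns, vsHost)
}

func main() {
	for _, e := range []string{"prod/*.example.com", "./*", "*.example.com"} {
		fmt.Printf("%-20s allows dev/app.example.com: %v\n", e,
			allowsBinding(e, "istio-system", "dev", "app.example.com"))
	}
}
```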
Ingress Gateway configuration generation enables request hijacking
The report raises possible request hijacking when using the default mechanism of
selecting gateway workloads by labels across namespaces in a Gateway resource.
This behavior was chosen by default as it allows delegation of managing Gateway
and VirtualService resources to the applications team while allowing operations
teams to centrally manage the ingress gateway workloads for meeting their unique
security requirements like running on dedicated nodes for instance. As
highlighted in the report, if this deployment topology is not a requirement in
your environment it is strongly recommended to co-locate Gateway resources with
your gateway workloads and set the environment variable
PILOT_SCOPE_GATEWAY_TO_NAMESPACE to true.
Please refer to the gateway deployment topologies guide
to understand the various recommended deployment models by the
Istio community. Additionally, as mentioned in the
Security Best Practices
guide, Gateway resource creation should be access controlled using Kubernetes
RBAC or other policy enforcement mechanisms to ensure only authorized entities
can create them.
Other Medium and Low Severity Issues
There are two medium severity issues reported related to debug information
exposed at various levels within the project which can be used to gain access to
sensitive information or orchestrate Denial of Service (DoS) attacks. While
Istio by default enables these debug interfaces for profiling or enabling tools
like “istioctl”, they can be disabled by setting the environment variable
ENABLE_DEBUG_ON_HTTP to false as discussed above.
The report correctly points out that various utilities like sudo, tcpdump, etc.
installed in the default images shipped by Istio can lead to privilege
escalation attacks. These utilities are provided to aid runtime debugging of
packets flowing through the mesh, and users are recommended to use
hardened versions
of these images in production.
The report also surfaces a known architectural limitation with any sidecar
proxy-based service mesh implementation which uses iptables for intercepting
traffic. This mechanism is susceptible to
sidecar proxy bypass,
which is a valid concern for secure environments. It can be addressed by following the
defense-in-depth
recommendation of the Security Best Practices guide. We are
also investigating more secure options in collaboration with the Kubernetes
community.
The tradeoff between useful and secure
You may have noticed a trend in the findings of the assessment and the
recommendations made to address them. Istio provides various configuration
options to create a more secure installation based on your requirements, and we
have introduced a comprehensive Security Best Practices
guide for our users to follow. As Istio is widely adopted in production, we must
weigh switching to secure defaults against possible migration issues for our
existing users on upgrade. The Istio Product Security Working Group evaluates
each of these issues and creates a plan of action to enable secure defaults on a
case-by-case basis, after giving our users a number of releases to opt in to the
secure configuration and migrate their workloads.
Lastly, we learned several lessons during and after undergoing a neutral
security assessment. The primary one was to ensure our security practices are
robust enough to respond quickly to the findings and, more importantly, to make
security enhancements while maintaining our standards for disruption-free upgrades.
To continue this endeavor, we are always looking for feedback and participation
in the Istio Product Security Working Group, so
join our public meetings
to raise issues or learn about what we are doing to keep Istio secure!



Join us at the Istio Community Meetup in China
Tue, 06 Jul 2021 00:00:00 +0000
With the rapid popularization of cloud native technology in China, Istio has also gained popularity in this corner of the world. Almost all Chinese CSPs have created and are running service mesh products based on Istio.
We welcomed thousands of Istio users and developers to the first IstioCon in February 2021, and the attendees expressed an interest in participating in more meetups and helping to grow the community at the local level.
To this end, the Istio community united six partners — the China Academy of Information and Communications Technology, Alibaba Cloud, Huawei Cloud, Intel, Tencent Cloud, and Tetrate — to co-host the first official Istio Community Meetup China. We have invited a number of industry experts to share comprehensive Istio technical practices with everyone at an in-person meetup. We will serve some refreshments, and seats are limited, so we will operate on a first-come first-served basis. Please register to attend.
Time and day: 13:30-17:30 (CST), July 10, 2021
Venue: Industrial Internet Exhibition Hall, 2nd Floor, Research Building, China Academy of Information and Communications Technology, No. 52 Huayuan North Road, Haidian District, Beijing
Agenda

Session time (CST)   Title
13:30 - 13:50        Sign in
13:50 - 14:00        Welcome
                     Craig Box, Istio Steering Committee member, Google Cloud
                     Iris Ding, cloud computing engineer, Intel
14:00 - 14:30        Interpretation of the “Service Grid Technical Capability Requirements” Standard
                     Yin Xia Mengxue, Engineer, Cloud Computing Department, Academy of Information and Communications Technology
14:30 - 15:00        Service Mesh Data Plane Hot Upgrade
                     Shi Zehuan, Alibaba Cloud
15:00 - 15:30        Envoy Principle Introduction and Online Problem Pit
                     Zhang Wei, Data Plane Technical Expert, Huawei Cloud Service Mesh
15:30 - 15:45        Coffee break
15:45 - 16:15        Use eBPF to accelerate Istio/Envoy networking
                     Zhong Luyao, Intel
16:15 - 16:45        Full-stack service mesh: how Aeraki helps you manage any Layer 7 traffic in Istio
                     Huabing Zhao, Senior Engineer, Tencent Cloud
16:45 - 17:15        Securing workload deployment with Istio CNI
                     Zhang Zhihan, Tetrate

We want to thank our community in China who have worked on this event, especially Iris Ding, Wei W Hu, Jimmy Song, Zhonghu Xu, Xining Wang, and Huabing Zhao. We hope you can join!



Steering and TOC updates
Tue, 29 Jun 2021 00:00:00 +0000
Last year we introduced a new Steering Committee charter, which shares governance responsibilities between Contribution Seats, selected based on contributions to the project, and Community Seats, elected by the project members. We elected four members, with the committee representing seven different companies.
It’s now time to kick off our 2021 election for Community Seats. Members have two weeks to submit nominations, and voting will run from 12 to 25 July. You can learn all about the election, including how to stand and how to vote, in the istio/community repository on GitHub.
Just like last year, any project member can stand for election. All Istio members who have been active in the last 12 months are eligible to vote.
Technical Oversight Committee updates
We wish to offer our thanks to Dan Berg and Joshua Blatt, both long-time contributors to the Istio project, who have recently taken new jobs outside the service mesh space. That left two vacancies on the Istio Technical Oversight Committee (TOC), responsible for cross-cutting product and design decisions.
TOC members are elected by the Steering Committee from the working group leads, and last week we voted for two new members:

John Howard, from Google, has become one of the most active contributors to Istio since joining the project in January 2019. He is currently a lead in the Networking working group, and has also served as an Environments working group lead and release manager for version 1.4.
Brian Avery, from Red Hat, has been active in the Istio community for over 3 years. He served as Release Manager for Istio 1.3 and 1.6, and has remained actively involved in the Istio release process, including introducing tooling for release notes, streamlining the feature maturity process, and working on documentation testing. Most recently, Brian was a lead in the Test and Release and Product Security working groups.

Congratulations to John and Brian!
As our new TOC members step into their roles, they will be vacating their current positions as working group leads. We are always on the lookout for community members who are interested in joining, or leading, Istio working groups. If you’re interested, please reach out in the working group channels on Slack, or during the public working group meetings of your interest.



Configuring failover for external services
Fri, 04 Jun 2021 00:00:00 +0000
Istio’s powerful APIs can be used to solve a variety of service mesh use cases. Many users know about its strong ingress and east-west capabilities, but it also offers many features for egress (outgoing) traffic. This is especially useful when your application needs to talk to an external service, such as a database endpoint provided by a cloud provider. There are often multiple endpoints to choose from, depending on where your workload is running. For example, Amazon’s DynamoDB provides several endpoints across its regions. You typically want to choose the endpoint closest to your workload for latency reasons, but you may need to configure automatic failover to another endpoint in case things are not working as expected.
Similar to services running inside the service mesh, you can configure Istio to detect outliers and fail over to a healthy endpoint, while remaining completely transparent to your application. In this example, we’ll use Amazon DynamoDB endpoints and pick a primary region that is the same as, or close to, the workloads running in a Google Kubernetes Engine (GKE) cluster. We’ll also configure a failover region.

Routing | Endpoint
Primary | http://dynamodb.us-east-1.amazonaws.com
Failover | http://dynamodb.us-west-1.amazonaws.com


Define external endpoints using a ServiceEntry
Locality load balancing works based on region or zone, which are usually inferred from labels set on the Kubernetes nodes. First, determine the location of your workloads:
$ kubectl describe node | grep failure-domain.beta.kubernetes.io/region
                    failure-domain.beta.kubernetes.io/region=us-east1
                    failure-domain.beta.kubernetes.io/region=us-east1
In this example, the GKE cluster nodes are running in us-east1.
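If your nodes span several regions, the grep output quickly gets long. The Python sketch below collects the distinct region values from that output. The function name `node_regions` is hypothetical, and it assumes the `failure-domain.beta.kubernetes.io/region` label used above; newer clusters typically set `topology.kubernetes.io/region` instead, which you can pass as `label`.

```python
from __future__ import annotations

# Hypothetical helper: extract the distinct region values from
# `kubectl describe node | grep <label>` output (or the full describe output).
DEFAULT_REGION_LABEL = "failure-domain.beta.kubernetes.io/region"


def node_regions(describe_output: str, label: str = DEFAULT_REGION_LABEL) -> set[str]:
    prefix = label + "="
    regions: set[str] = set()
    # Labels appear as whitespace-separated key=value tokens, possibly after "Labels:".
    for token in describe_output.split():
        if token.startswith(prefix):
            regions.add(token[len(prefix):])
    return regions


sample = """
                    failure-domain.beta.kubernetes.io/region=us-east1
                    failure-domain.beta.kubernetes.io/region=us-east1
"""
print(node_regions(sample))  # {'us-east1'}
```

A single value means all nodes share one region, which is the value to use as the primary endpoint's locality.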
Next, create a ServiceEntry which aggregates the endpoints you want to use. In this example, we have selected mydb.com as the host. This is the address your application should be configured to connect to. Set the locality of the primary endpoint to the same region as your workload:
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-svc-dns
spec:
  hosts:
  - mydb.com
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  endpoints:
  - address: dynamodb.us-east-1.amazonaws.com
    locality: us-east1
    ports:
      http: 80
  - address: dynamodb.us-west-1.amazonaws.com
    locality: us-west
    ports:
      http: 80
Let’s deploy a sleep container to use as a test source for sending requests.
$ kubectl apply -f samples/sleep/sleep.yaml
From the sleep container, send five requests to http://mydb.com:
$ for i in {1..5}; do kubectl exec deploy/sleep -c sleep -- curl -sS http://mydb.com; echo; sleep 2; done
healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-west-1.amazonaws.com
healthy: dynamodb.us-west-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com
You will see that Istio sends requests to both endpoints. We want it to send requests only to the endpoint marked with the same region as our nodes.
For that, we need to configure a DestinationRule: Istio only applies locality load balancing when outlier detection is configured.
Set failover conditions using a DestinationRule
Istio’s DestinationRule lets you configure load balancing, connection pool, and outlier detection settings. We can specify the conditions used to identify an endpoint as unhealthy and remove it from the load balancing pool.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: mydynamodb
spec:
  host: mydb.com
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 15s
      baseEjectionTime: 1m
The above DestinationRule runs the outlier detection sweep every 15 seconds, and if an endpoint returns a 5xx error code, even once, it is ejected from the load balancing pool for at least one minute. Configuring outlier detection also enables locality load balancing: as long as this circuit breaker is not triggered, traffic is routed to the endpoints in the same region as the pod.
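To make the ejection behaviour concrete, here is a simplified Python model of consecutive-5xx outlier detection with `consecutive5xxErrors: 1` and `baseEjectionTime: 1m`. It is a sketch, not Envoy's implementation: the class and method names are hypothetical, times are plain seconds, a connection failure is simply passed in as a 503 status, and the 15-second `interval` and `maxEjectionPercent` are ignored. Following Envoy's documented behaviour, each repeated ejection lasts the base ejection time multiplied by the number of times the endpoint has been ejected.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EndpointState:
    consecutive_5xx: int = 0
    times_ejected: int = 0
    ejected_until: float = 0.0  # seconds; the endpoint is ejected while now < ejected_until


class OutlierDetector:
    """Simplified consecutive-5xx outlier detection (hypothetical, not Envoy's code)."""

    def __init__(self, consecutive_5xx_errors: int = 1, base_ejection_time: float = 60.0):
        self.threshold = consecutive_5xx_errors
        self.base_ejection_time = base_ejection_time
        self.endpoints: dict[str, EndpointState] = {}

    def is_ejected(self, endpoint: str, now: float) -> bool:
        state = self.endpoints.get(endpoint)
        return state is not None and now < state.ejected_until

    def record(self, endpoint: str, status: int, now: float) -> None:
        """Record one response; in this sketch a connection failure is passed as 503."""
        state = self.endpoints.setdefault(endpoint, EndpointState())
        if not 500 <= status <= 599:
            state.consecutive_5xx = 0  # any non-5xx response resets the streak
            return
        state.consecutive_5xx += 1
        if state.consecutive_5xx >= self.threshold and not self.is_ejected(endpoint, now):
            state.times_ejected += 1
            # Each ejection lasts base_ejection_time multiplied by the ejection count.
            state.ejected_until = now + self.base_ejection_time * state.times_ejected
            state.consecutive_5xx = 0


detector = OutlierDetector(consecutive_5xx_errors=1, base_ejection_time=60.0)
detector.record("dynamodb.us-east-1.amazonaws.com", 503, now=0.0)
print(detector.is_ejected("dynamodb.us-east-1.amazonaws.com", now=30.0))  # True
print(detector.is_ejected("dynamodb.us-east-1.amazonaws.com", now=61.0))  # False
```

With a threshold of 1, a single failure is enough to eject an endpoint; raising `consecutive5xxErrors` makes the detector more tolerant of occasional errors.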
If we run our curl again, we should see that traffic is always going to the us-east1 endpoint.
$ for i in {1..5}; do kubectl exec deploy/sleep -c sleep -- curl -sS http://mydb.com; echo; sleep 2; done

healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com
Simulate a failure
Next, let’s see what happens if the us-east endpoint goes down. To simulate this, let’s modify the ServiceEntry and set the us-east endpoint to an invalid port:
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-svc-dns
spec:
  hosts:
  - mydb.com
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  endpoints:
  - address: dynamodb.us-east-1.amazonaws.com
    locality: us-east1
    ports:
      http: 81 # INVALID - This is purposefully wrong to trigger failover
  - address: dynamodb.us-west-1.amazonaws.com
    locality: us-west
    ports:
      http: 80
Running our curl again shows that traffic automatically fails over to our us-west region after the connection to the us-east endpoint fails:
$ for i in {1..5}; do kubectl exec deploy/sleep -c sleep -- curl -sS http://mydb.com; echo; sleep 2; done
upstream connect error or disconnect/reset before headers. reset reason: connection failure
healthy: dynamodb.us-west-1.amazonaws.com
healthy: dynamodb.us-west-1.amazonaws.com
healthy: dynamodb.us-west-1.amazonaws.com
healthy: dynamodb.us-west-1.amazonaws.com
You can check the outlier status of the us-east endpoint by running the following, replacing <sleep-pod> with the name of your sleep pod:
$ istioctl pc endpoints <sleep-pod> | grep mydb
ENDPOINT                         STATUS      OUTLIER CHECK     CLUSTER
52.119.226.80:81                 HEALTHY     FAILED            outbound|80||mydb.com
52.94.12.144:80                  HEALTHY     OK                outbound|80||mydb.com
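When many endpoints are listed, it can help to filter for the ejected ones. The following Python sketch picks out endpoints whose OUTLIER CHECK column reads FAILED for a given cluster. The function name is hypothetical, and it assumes the four-column layout shown above, which may differ between istioctl versions.

```python
from __future__ import annotations

# Hypothetical helper: list endpoints whose OUTLIER CHECK column reads FAILED
# in `istioctl pc endpoints` output. Assumes the four-column layout shown above.


def failed_outlier_endpoints(pc_output: str, cluster: str) -> list[str]:
    failed: list[str] = []
    for line in pc_output.splitlines():
        cols = line.split()
        # Data rows split into ENDPOINT, STATUS, OUTLIER CHECK, CLUSTER; the header
        # row has five tokens ("OUTLIER CHECK" is two words) and is skipped.
        if len(cols) == 4 and cols[2] == "FAILED" and cols[3] == cluster:
            failed.append(cols[0])
    return failed


output = """\
ENDPOINT                         STATUS      OUTLIER CHECK     CLUSTER
52.119.226.80:81                 HEALTHY     FAILED            outbound|80||mydb.com
52.94.12.144:80                  HEALTHY     OK                outbound|80||mydb.com
"""
print(failed_outlier_endpoints(output, "outbound|80||mydb.com"))  # ['52.119.226.80:81']
```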
Failover for HTTPS
Configuring failover for external HTTPS services is just as easy. Your application can still continue to use plain HTTP, and you can let the Istio proxy perform the TLS origination to the HTTPS endpoint.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-svc-dns
spec:
  hosts:
  - mydb.com
  ports:
  - number: 80
    name: http-port
    protocol: HTTP
    targetPort: 443
  resolution: DNS
  endpoints:
  - address: dynamodb.us-east-1.amazonaws.com
    locality: us-east1
  - address: dynamodb.us-west-1.amazonaws.com
    locality: us-west
The above ServiceEntry defines the mydb.com service on port 80 and redirects traffic to the real DynamoDB endpoints on port 443.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: mydynamodb
spec:
  host: mydb.com
  trafficPolicy:
    tls:
      mode: SIMPLE
    loadBalancer:
      simple: ROUND_ROBIN
      localityLbSetting:
        enabled: true
        failover:
          - from: us-east1
            to: us-west
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 15s
      baseEjectionTime: 1m
The DestinationRule now performs TLS origination and configures the outlier detection. The rule also has a failover field configured where you can specify exactly what regions are failover targets. This is useful when you have several regions defined.
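The effect of the failover field can be pictured as a priority order. The Python sketch below is a simplified model, not Istio's or Envoy's actual algorithm: it prefers healthy endpoints in the client's region, then healthy endpoints in the region named by the failover rule, and, as an assumption of this sketch only, any remaining healthy endpoint. The function `pick_endpoints` and its parameters are hypothetical.

```python
from __future__ import annotations


def pick_endpoints(
    client_region: str,
    endpoints: dict[str, str],   # endpoint address -> locality (region)
    ejected: set[str],           # addresses currently ejected by outlier detection
    failover: dict[str, str],    # "from" region -> "to" region, like localityLbSetting.failover
) -> list[str]:
    """Hypothetical priority model of locality failover."""
    healthy = {addr: region for addr, region in endpoints.items() if addr not in ejected}
    # First choice: healthy endpoints in the client's own region.
    local = sorted(a for a, r in healthy.items() if r == client_region)
    if local:
        return local
    # Second choice: the region named by the failover rule for the client's region.
    target = failover.get(client_region)
    preferred = sorted(a for a, r in healthy.items() if r == target)
    if preferred:
        return preferred
    # Last resort in this sketch only: any remaining healthy endpoint.
    return sorted(healthy)


endpoints = {
    "dynamodb.us-east-1.amazonaws.com": "us-east1",
    "dynamodb.us-west-1.amazonaws.com": "us-west",
}
failover = {"us-east1": "us-west"}  # mirrors the from/to pair in the DestinationRule
print(pick_endpoints("us-east1", endpoints, set(), failover))
print(pick_endpoints("us-east1", endpoints, {"dynamodb.us-east-1.amazonaws.com"}, failover))
```

With several regions defined, the explicit `failover` mapping is what decides which region takes over, rather than leaving the choice to the remaining endpoints.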
Wrapping Up
Istio’s VirtualService and DestinationRule APIs provide traffic routing, failure recovery, and fault injection features so that you can create resilient applications. The ServiceEntry API extends many of these features to external services that are not part of your service mesh.



Safely upgrade the Istio control plane with revisions and tags
Wed, 26 May 2021 00:00:00 +0000
Like all security software, your service mesh should be kept up-to-date. The Istio community releases new versions every quarter, with regular patch releases for bug fixes and security vulnerabilities. The operator of a service mesh will need to upgrade the control plane and data plane components many times. You must take care when upgrading, as a mistake could affect your business traffic. Istio has many mechanisms to make it safe to perform upgrades in a controlled manner, and in Istio 1.10 we further improve this operational experience.
Background
In Istio 1.6, we added basic support for upgrading the service mesh following a canary pattern using revisions. Using this approach, you can run multiple control planes side-by-side without impacting an existing deployment and slowly migrate workloads from the old control plane to the new.
To support this revision-based upgrade, Istio introduced an istio.io/rev label for namespaces. It indicates which control plane revision should inject sidecar proxies for the workloads in the respective namespace. For example, a label of istio.io/rev=1-9-5 indicates that control plane revision 1-9-5 should inject its 1-9-5 proxies into the workloads in that namespace.
If you wanted to upgrade the data plane proxies for a particular namespace, you would update the istio.io/rev label to point to a new version, such as istio.io/rev=1-10-0. Changing labels manually across a large number of namespaces, or even trying to orchestrate those changes, is error-prone and can lead to unintended downtime.
Introducing Revision Tags
In Istio 1.10, we’ve improved revision-based upgrades with a new feature called revision tags. A revision tag reduces the number of changes an operator has to make to use revisions, and safely upgrade an Istio control plane. You use the tag as the label for your namespaces, and assign a revision to that tag. This means you don’t have to change the labels on a namespace while upgrading, and minimizes the number of manual steps and configuration changes.
For example, you can define a tag named prod-stable and point it to the 1-9-5 revision of a control plane. You can also define another tag named prod-canary which points to the 1-10-0 revision. You may have a lot of important namespaces in your cluster, and you can label those namespaces with istio.io/rev=prod-stable. In other namespaces you may be willing to test the new version of Istio, and you can label those namespaces with istio.io/rev=prod-canary. The tags will indirectly associate those namespaces with the 1-9-5 revision for prod-stable and 1-10-0 for prod-canary respectively.
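The indirection can be modelled as a two-step lookup: namespace label to tag, then tag to revision. The Python sketch below is only a model; in Istio a tag is implemented as a MutatingWebhookConfiguration, and the function `injecting_revision` is hypothetical. It also ignores the legacy istio-injection label.

```python
from __future__ import annotations

# Hypothetical model of revision-tag indirection; in Istio a tag is implemented
# as a MutatingWebhookConfiguration, not a dictionary.


def injecting_revision(ns_labels: dict[str, str], tags: dict[str, str]) -> str | None:
    """Return the control plane revision that would inject sidecars in a namespace."""
    value = ns_labels.get("istio.io/rev")
    if value is None:
        return None  # this sketch ignores the legacy istio-injection label
    # If the label names a tag, follow it; otherwise it names a revision directly.
    return tags.get(value, value)


tags = {"prod-stable": "1-9-5", "prod-canary": "1-10-0"}
print(injecting_revision({"istio.io/rev": "prod-stable"}, tags))  # 1-9-5

# Moving the tag changes the result without touching the namespace label.
tags["prod-stable"] = "1-10-0"
print(injecting_revision({"istio.io/rev": "prod-stable"}, tags))  # 1-10-0
```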
Figure: Stable revision tags

Once you’ve determined the new control plane is suitable for the rest of the prod-stable namespaces, you can change the tag to point to the new revision. This enables you to update all the namespaces labeled prod-stable to the new 1-10-0 revision without making any changes to the labels on the namespaces. You will need to restart the workloads in a namespace once you’ve changed the tag to point to a different revision.
Figure: Updated revision tags

Once you’re satisfied with the upgrade to the new control-plane revision, you can remove the old control plane.
Stable revision tags in action
To create a new prod-stable tag for a revision 1-9-5, run the following command:
$ istioctl x revision tag set prod-stable --revision 1-9-5
You can then label your namespaces with the istio.io/rev=prod-stable label. Note that if you installed a default revision (i.e., no revision) of Istio, you will first have to remove the standard injection label:
$ kubectl label ns istioinaction istio-injection-
$ kubectl label ns istioinaction istio.io/rev=prod-stable
You can list the tags in your mesh with the following:
$ istioctl x revision tag list

TAG         REVISION NAMESPACES
prod-stable 1-9-5    istioinaction
A tag is implemented with a MutatingWebhookConfiguration. You can verify a corresponding MutatingWebhookConfiguration has been created:
$ kubectl get MutatingWebhookConfiguration

NAME                             WEBHOOKS   AGE
istio-revision-tag-prod-stable   2          75s
istio-sidecar-injector           1          5m32s
Let’s say you are trying to canary a new revision of the control plane based on 1.10.0. First you would install the new version using a revision:
$ istioctl install -y --set profile=minimal --revision 1-10-0
You can create a new tag called prod-canary and point that to your 1-10-0 revision:
$ istioctl x revision tag set prod-canary --revision 1-10-0
Then label your namespaces accordingly:
$ kubectl label ns istioinaction-canary istio.io/rev=prod-canary
If you list out the tags in your mesh, you will see two stable tags pointing to two different revisions:
$ istioctl x revision tag list

TAG         REVISION NAMESPACES
prod-stable 1-9-5    istioinaction
prod-canary 1-10-0   istioinaction-canary
Any of the namespaces that you have labeled with istio.io/rev=prod-canary will be injected by the control plane that corresponds to the prod-canary stable tag name (which in this example points to the 1-10-0 revision). When you’re ready, you can switch the prod-stable tag to the new control plane with:
$ istioctl x revision tag set prod-stable --revision 1-10-0 --overwrite
Any time you switch a tag to point to a new revision, you will need to restart the workloads in any respective namespace to pick up the new revision’s proxy.
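Working out which namespaces need a restart is a matter of comparing what each namespace's label resolved to before and after the tag change. The following Python sketch does that; the function name and its dictionary inputs are hypothetical stand-ins for what you would read from the cluster.

```python
from __future__ import annotations

# Hypothetical helper: which namespaces need `kubectl rollout restart` after tags move.


def namespaces_to_restart(
    namespaces: dict[str, str],  # namespace -> value of its istio.io/rev label
    old_tags: dict[str, str],    # tag -> revision before the change
    new_tags: dict[str, str],    # tag -> revision after the change
) -> list[str]:
    stale = []
    for ns, label in sorted(namespaces.items()):
        # A label that is not a tag is treated as a literal revision name.
        if old_tags.get(label, label) != new_tags.get(label, label):
            stale.append(ns)
    return stale


namespaces = {"istioinaction": "prod-stable", "istioinaction-canary": "prod-canary"}
before = {"prod-stable": "1-9-5", "prod-canary": "1-10-0"}
after = {"prod-stable": "1-10-0", "prod-canary": "1-10-0"}
print(namespaces_to_restart(namespaces, before, after))  # ['istioinaction']
```

Namespaces on prod-canary are already on 1-10-0, so only the prod-stable namespace needs its workloads restarted.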
When neither prod-stable nor prod-canary points to the old revision any longer, it may be safe to remove the old revision as follows:
$ istioctl x uninstall --revision 1-9-5
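The "may be safe" condition can be checked mechanically before uninstalling. The Python sketch below, with a hypothetical function name, reports whether any tag or namespace label still resolves to a revision. It deliberately does not inspect pods: workloads that have not been restarted may still run proxies from the old revision.

```python
from __future__ import annotations

# Hypothetical pre-uninstall check. It only looks at tags and namespace labels;
# pods that have not been restarted may still run proxies from the old revision.


def revision_in_use(revision: str, tags: dict[str, str], namespaces: dict[str, str]) -> bool:
    if revision in tags.values():
        return True  # some tag still points at the revision
    # A namespace may also name the revision directly instead of through a tag.
    return any(tags.get(label, label) == revision for label in namespaces.values())


tags = {"prod-stable": "1-10-0", "prod-canary": "1-10-0"}
namespaces = {"istioinaction": "prod-stable", "istioinaction-canary": "prod-canary"}
print(revision_in_use("1-9-5", tags, namespaces))  # False
```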
Wrapping up
Using revisions makes it safer to canary changes to an Istio control plane. In large environments with lots of namespaces, you may prefer to use stable tags, as introduced in this blog post, to reduce the number of moving pieces and simplify any automation you build around updating an Istio control plane. Please check out the 1.10 release and the new tag feature, and give us your feedback!



Happy Birthday, Istio!
Mon, 24 May 2021 00:00:00 +0000
Celebrating Istio’s 4th birthday
Four years ago today, the Istio project was born into the open source world. To celebrate this
anniversary, we are hosting a week-long birthday celebration that focuses on contributions to the
Istio project that stem from using Istio in production. Read on to learn how to participate in this
celebration for a chance to win some Istio swag.
Figure: Istio's 4th Birthday!

A year of important developments for Istio
Over the last 12 months, the Istio project has focused on the day-0 and day-1 experience for users,
actively listening to them through UX surveys and GitHub issues.

We simplified the control plane architecture and made Istio easier to install, configure and upgrade.
We clarified our feature status definitions and the process for promoting features and APIs.
We simplified the debugging experience with various istioctl commands.
We expanded the mesh to services running in VMs and multiple clusters.
We made StatefulSets easier to use in Istio 1.10, with zero configuration required.
We made various performance improvements to the Istio control plane and data plane via discovery selectors, Sidecar resources, and more.
We introduced WebAssembly as our extensibility platform which has helped users tailor Istio to their needs.
We beefed up our CVE management and release processes to meet enterprise needs.

Read more about the improvements to Istio in 2020 that made this technology easier to use.
In February 2021, we celebrated the first IstioCon! This community-led event was an opportunity for
users and developers to share many examples of how they use Istio in production and the lessons they
learned. IstioCon featured more than 25 end-user companies, such as Salesforce, T-Mobile, and Airbnb,
as well as maintainers from across the Istio ecosystem, and shared the Istio project roadmap.
This inaugural community conference was a major success, with more than 4,000 registrants from 80
countries. Sessions were held in US and Asia time zones, in English and Chinese, to accommodate large
user communities on different continents.
Learn more about the impact of IstioCon and find the presentations on the conference website.
How to participate in Istio’s 4th Birthday celebration
Contributions are key to the long life of an open source project. This is why, on Istio's 4th
birthday, we want to hear about your contributions to the project. To participate in this campaign,
share on Twitter a contribution you made to the project and why it matters, using the hashtags
#IstioTurns4 and #IstioBirthday. You can submit posts from Monday, May 24th at 9 am Pacific until
Friday, May 28th at 12 pm Pacific for a chance to win some Istio swag.
The other way to participate in this campaign is to join the Istio community meetup, which will take
place on Thursday, May 27th at 10 am Pacific. At this event, Pratima Nambiar will discuss
contributions that have stemmed from using Istio in production at Salesforce. Join the event and ask a
question or comment on the demo for a chance to win some Istio swag.
Figure: Istio Community Meetup!




Announcing Support for 1.8 to 1.10 Direct Upgrades
Mon, 24 May 2021 00:00:00 +0000
As service mesh technology moves from cutting edge to stable infrastructure, many users have expressed an interest in upgrading their service mesh less frequently, as qualifying a new minor release can take a lot of time. Upgrading can be especially difficult for users who don’t keep up with new releases, as Istio has not supported upgrades across multiple minor versions. To upgrade from 1.6.x to 1.8.x, users first had to upgrade to 1.7.x and then to 1.8.x.
With the release of Istio 1.10, we are announcing Alpha-level support for upgrading directly from Istio 1.8.x to 1.10.x, without upgrading to 1.9.x. We hope this will reduce the operational burden of running Istio, in keeping with our 2021 theme of improving Day 2 operations.
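The difference can be expressed as an upgrade path: one minor version at a time, except where a direct hop is supported. The Python sketch below computes the sequence of 1.x releases to install. The function and the `DIRECT_HOPS` set are hypothetical; the only skip-version hop this post announces is 1.8 to 1.10, at Alpha level.

```python
from __future__ import annotations

# Hypothetical sketch: the sequence of 1.x minor releases to install.
# DIRECT_HOPS lists skip-version upgrades; this post announces only 1.8 -> 1.10 (Alpha).
DIRECT_HOPS = frozenset({(8, 10)})


def upgrade_path(current_minor: int, target_minor: int, direct_hops=DIRECT_HOPS) -> list[str]:
    if target_minor < current_minor:
        raise ValueError("downgrades are not modeled")
    path: list[str] = []
    minor = current_minor
    while minor < target_minor:
        # Take the largest supported skip that does not overshoot the target...
        skips = [dst for src, dst in direct_hops if src == minor and dst <= target_minor]
        # ...otherwise step one minor version at a time.
        minor = max(skips) if skips else minor + 1
        path.append(f"1.{minor}")
    return path


print(upgrade_path(6, 8))   # ['1.7', '1.8']
print(upgrade_path(8, 10))  # ['1.10']
```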
Upgrade From 1.8 to 1.10
For direct upgrades we recommend using the canary upgrade method so that control plane functionality can be verified before cutting workloads over to the new version. We’ll also be using revision tags in this guide, an improvement to canary upgrades that was introduced in 1.10, so users don’t have to change the labels on a namespace while upgrading.
First, using istioctl version 1.10 or newer, create a revision tag named stable that points to your existing 1.8 revision. From here on, we assume this revision is called 1-8-5:
$ istioctl x revision tag set stable --revision 1-8-5
If your 1.8 installation did not have an associated revision, we can create this revision tag with:
$ istioctl x revision tag set stable --revision default
Now, relabel the namespaces that were previously labeled with istio-injection=enabled, or with an istio.io/rev revision label, so that they use istio.io/rev=stable. Then download the Istio 1.10.0 release and install the new control plane with a revision:
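The relabelling rule is small enough to state as a function over a namespace's labels. The Python sketch below is hypothetical; it mirrors running `kubectl label ns <ns> istio-injection-` followed by `kubectl label ns <ns> istio.io/rev=stable --overwrite` (kubectl needs `--overwrite` when the namespace already has an istio.io/rev value), and leaves namespaces without injection untouched.

```python
from __future__ import annotations

# Hypothetical helper mirroring
#   kubectl label ns <ns> istio-injection-
#   kubectl label ns <ns> istio.io/rev=stable --overwrite


def relabel_for_tag(labels: dict[str, str], tag: str = "stable") -> dict[str, str]:
    new = dict(labels)  # do not mutate the caller's dict
    if new.get("istio-injection") == "enabled" or "istio.io/rev" in new:
        new.pop("istio-injection", None)
        new["istio.io/rev"] = tag
    return new  # namespaces without injection are returned unchanged


print(relabel_for_tag({"istio-injection": "enabled", "team": "payments"}))
# {'team': 'payments', 'istio.io/rev': 'stable'}
```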
$ istioctl install --revision 1-10-0 -y
Now verify that the 1.10 revision has come up correctly and is healthy. Once you are satisfied with the stability of the new revision, you can point the revision tag at it:
$ istioctl x revision tag set stable --revision 1-10-0 --overwrite
Verify that the revision tag stable is pointing to the new revision:
$ istioctl x revision tag list
TAG    REVISION NAMESPACES
stable 1-10-0        ...
When you are ready to move existing workloads over to the new 1.10 revision, restart them so that their sidecar proxies use the new control plane. We can go through the namespaces one by one and roll the workloads over to the new version:
$ kubectl rollout restart deployments -n …
Notice an issue after rolling out workloads to the new Istio version? No problem! Since you’re using canary upgrades, the old control plane is still running and we can just switch back over.
$ istioctl x revision tag set stable --revision 1-8-5 --overwrite
Then after triggering another rollout, your workloads will be back on the old version.
We look forward to hearing about your experience with direct upgrades, and to improving and expanding this functionality in the future.



StatefulSets Made Easier With Istio 1.10
Wed, 19 May 2021 00:00:00 +0000
Kubernetes StatefulSets are commonly used to manage stateful applications. In addition to managing the deployment and scaling of a set of Pods, StatefulSets provide guarantees about the ordering and uniqueness of those Pods. Common applications used with StatefulSets include ZooKeeper, Cassandra, Elasticsearch, Redis and NiFi.
The Istio community has been making gradual progress towards zero-configuration support for StatefulSets: from automatic mTLS, to eliminating the need to create DestinationRule or ServiceEntry resources, to the most recent pod networking changes in Istio 1.10.
What is unique about using a StatefulSet with a service mesh? The StatefulSet pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling. The kind of apps that run in a StatefulSet are often those that need to communicate among their pods, and, as they come from a world of hard-coded IP addresses, may listen on the pod IP only, instead of 0.0.0.0.
ZooKeeper, for example, is configured by default to not listen on all IPs for quorum communication:
quorumListenOnAllIPs=false
Over the last few releases, the Istio community has reported many issues around support for applications running in StatefulSets.
StatefulSets in action, prior to Istio 1.10
In a GKE cluster running Kubernetes 1.19, we have Istio 1.9.5 installed. We enabled automatic sidecar injection in the default namespace, then we installed ZooKeeper using the Helm charts provided by Bitnami, along with the Istio sleep pod for interactive debugging:
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/zookeeper --set replicaCount=3
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/sleep/sleep.yaml
After a few minutes, all pods come up nicely with sidecar proxies:
$ kubectl get pods,svc
NAME                             READY   STATUS    RESTARTS   AGE
my-release-zookeeper-0           2/2     Running   0          3h4m
my-release-zookeeper-1           2/2     Running   0          3h4m
my-release-zookeeper-2           2/2     Running   0          3h5m
pod/sleep-8f795f47d-qkgh4        2/2     Running   0          3h8m

NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                            AGE
my-release-zookeeper            ClusterIP   10.100.1.113   <none>        2181/TCP,2888/TCP,3888/TCP         3h
my-release-zookeeper-headless   ClusterIP   None           <none>        2181/TCP,2888/TCP,3888/TCP         3h
service/sleep                   ClusterIP   10.100.9.26    <none>        80/TCP                             3h
All the pods are Running, but is ZooKeeper actually working? Let’s find out! ZooKeeper listens on 3 ports:

Port 2181 is the TCP port for clients to connect to the ZooKeeper service
Port 2888 is the TCP port for peers to connect to other peers
Port 3888 is the dedicated TCP port for leader election

By default, the ZooKeeper installation configures port 2181 to listen on 0.0.0.0 but ports 2888 and 3888 only listen on the pod IP. Let’s check out the network status on each of these ports from one of the ZooKeeper pods:
$ kubectl exec my-release-zookeeper-1 -c istio-proxy -- netstat -na | grep -E '(2181|2888|3888)'
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN
tcp        0      0 10.96.7.7:3888          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:2181          127.0.0.1:37412         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37486         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37456         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37498         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37384         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37514         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37402         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37434         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37526         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37374         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37442         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:37464         TIME_WAIT
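Scanning netstat output by eye works for a handful of lines; the Python sketch below tallies TCP socket states per port of interest instead. The function name is hypothetical, and it assumes the six-column `netstat -na` TCP row layout shown above (proto, queues, local address, foreign address, state).

```python
from __future__ import annotations

from collections import Counter

# Hypothetical helper: tally TCP socket states per port of interest from `netstat -na`.


def port_states(netstat_output: str, ports: set[int]) -> dict[int, Counter]:
    states = {port: Counter() for port in ports}
    for line in netstat_output.splitlines():
        cols = line.split()
        # TCP rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(cols) != 6 or not cols[0].startswith("tcp"):
            continue
        local, foreign, state = cols[3], cols[4], cols[5]
        matched = set()
        for addr in (local, foreign):
            port = addr.rsplit(":", 1)[-1]  # "0.0.0.0:*" yields "*", which is skipped
            if port.isdigit() and int(port) in ports:
                matched.add(int(port))
        for port in matched:
            states[port][state] += 1
    return states


before_1_10 = """\
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN
tcp        0      0 10.96.7.7:3888          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:2181          127.0.0.1:37412         TIME_WAIT
"""
for port, counts in sorted(port_states(before_1_10, {2181, 2888, 3888}).items()):
    print(port, dict(counts))
```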
There is nothing ESTABLISHED on port 2888 or 3888. Next, let us get the ZooKeeper server status:
$ kubectl exec my-release-zookeeper-1 -c zookeeper -- /opt/bitnami/zookeeper/bin/zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Error contacting service. It is probably not running.
From the above output, you can see the ZooKeeper service is not functioning properly. Let us check the cluster configuration for one of the ZooKeeper pods:
$ istioctl proxy-config cluster my-release-zookeeper-1 --port 3888 --direction inbound -o json
[
    {
        "name": "inbound|3888||",
        "type": "STATIC",
        "connectTimeout": "10s",
        "loadAssignment": {
            "clusterName": "inbound|3888||",
            "endpoints": [
                {
                    "lbEndpoints": [
                        {
                            "endpoint": {
                                "address": {
                                    "socketAddress": {
                                        "address": "127.0.0.1",
                                        "portValue": 3888
                                    }
                                }
                            }
                        }
                    ]
                }
            ]
        },
...
What is interesting here is that the inbound cluster for port 3888 has 127.0.0.1 as its endpoint. This is because, in versions of Istio prior to 1.10, the Envoy proxy redirects inbound traffic to the loopback interface, as described in our blog post about the change. Since ZooKeeper listens on port 3888 only on the pod IP, connections forwarded to 127.0.0.1 cannot reach it.
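The failure comes down to how TCP listeners match destination addresses. The Python sketch below is a simplified model, not Envoy's code, with hypothetical function names: a listener accepts a connection only if it is bound to the IPv4 wildcard or to the exact destination address, and the sidecar forwards inbound traffic to 127.0.0.1 before Istio 1.10 and to the pod IP from 1.10 on. The bind addresses are taken from the netstat output above.

```python
# Simplified model (not Envoy's code) of why pod-IP-only listeners broke before Istio 1.10.


def listener_accepts(bind_ip: str, dst_ip: str) -> bool:
    """A TCP listener accepts a connection for dst_ip if it is bound to the
    IPv4 wildcard address or to exactly that address."""
    return bind_ip == "0.0.0.0" or bind_ip == dst_ip


def inbound_forward_ip(pod_ip: str, istio_1_10_or_later: bool) -> str:
    """Address the sidecar forwards inbound traffic to, per the behaviour described in this post."""
    return pod_ip if istio_1_10_or_later else "127.0.0.1"


pod_ip = "10.96.7.7"
listeners = {2181: "0.0.0.0", 3888: pod_ip}  # bind addresses from the netstat output above
for newer in (False, True):
    reachable = {port: listener_accepts(bind, inbound_forward_ip(pod_ip, newer))
                 for port, bind in listeners.items()}
    print("Istio 1.10+" if newer else "before 1.10", reachable)
# before 1.10 {2181: True, 3888: False}
# Istio 1.10+ {2181: True, 3888: True}
```

The same model also shows the flip side: a listener bound only to 127.0.0.1 would be reachable before 1.10 but not after.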
StatefulSets in action with Istio 1.10
Now we have upgraded our cluster to Istio 1.10 and configured the default namespace to enable 1.10 sidecar injection. Let’s do a rolling restart of the ZooKeeper StatefulSet to update the pods to use the new version of the sidecar proxy:
$ kubectl rollout restart statefulset my-release-zookeeper
Once the ZooKeeper pods reach the running status, let’s check out the network connections for these 3 ports from any of the ZooKeeper pods:
$ kubectl exec my-release-zookeeper-1 -c istio-proxy -- netstat -na | grep -E '(2181|2888|3888)'
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN
tcp        0      0 10.96.8.10:2888         0.0.0.0:*               LISTEN
tcp        0      0 10.96.8.10:3888         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.6:42571         10.96.8.10:2888         ESTABLISHED
tcp        0      0 10.96.8.10:2888         127.0.0.6:42571         ESTABLISHED
tcp        0      0 127.0.0.6:42655         10.96.8.10:2888         ESTABLISHED
tcp        0      0 10.96.8.10:2888         127.0.0.6:42655         ESTABLISHED
tcp        0      0 10.96.8.10:37876        10.96.6.11:3888         ESTABLISHED
tcp        0      0 10.96.8.10:44872        10.96.7.10:3888         ESTABLISHED
tcp        0      0 10.96.8.10:37878        10.96.6.11:3888         ESTABLISHED
tcp        0      0 10.96.8.10:44870        10.96.7.10:3888         ESTABLISHED
tcp        0      0 127.0.0.1:2181          127.0.0.1:54508         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54616         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54664         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54526         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54532         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54578         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54634         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54588         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54610         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54550         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54560         TIME_WAIT
tcp        0      0 127.0.0.1:2181          127.0.0.1:54644         TIME_WAIT
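If you want to confirm this from a script rather than by reading the output, the following Python sketch counts ESTABLISHED TCP sockets that touch the ZooKeeper quorum (2888) and leader election (3888) ports in `netstat -na` output like the one above. It assumes the six-column Linux netstat layout shown here, and the function name is our own.

```python
from collections import Counter


def established_by_port(netstat_output: str, ports: set) -> Counter:
    """Count ESTABLISHED TCP sockets whose local or foreign port is in `ports`.

    Assumes the Linux `netstat -na` layout:
    proto recv-q send-q local-address foreign-address state
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[0].startswith("tcp") or fields[5] != "ESTABLISHED":
            continue  # skip LISTEN, TIME_WAIT and non-socket lines
        local_port = int(fields[3].rsplit(":", 1)[1])
        foreign_port = int(fields[4].rsplit(":", 1)[1])
        for port in (local_port, foreign_port):
            if port in ports:
                counts[port] += 1
                break  # count each socket once, even if both ends match
    return counts


# A few lines copied from the output above.
sample = """\
tcp        0      0 10.96.8.10:2888         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.6:42571         10.96.8.10:2888         ESTABLISHED
tcp        0      0 10.96.8.10:2888         127.0.0.6:42571         ESTABLISHED
tcp        0      0 10.96.8.10:37876        10.96.6.11:3888         ESTABLISHED
tcp        0      0 10.96.8.10:44872        10.96.7.10:3888         ESTABLISHED
tcp        0      0 127.0.0.1:2181          127.0.0.1:54508         TIME_WAIT
"""
print(established_by_port(sample, {2888, 3888}))
```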
There are ESTABLISHED connections on both ports 2888 and 3888! Next, let us check out the ZooKeeper server status:
$ kubectl exec my-release-zookeeper-1 -c zookeeper -- /opt/bitnami/zookeeper/bin/zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
The ZooKeeper service is now running!
We can connect to each of the ZooKeeper pods from the sleep pod and run the following command to discover the server status of each pod within the StatefulSet. Note that there is no need to create ServiceEntry resources for any of the ZooKeeper pods; we can call these pods directly using their DNS names (e.g. my-release-zookeeper-0.my-release-zookeeper-headless) from the sleep pod.
$ kubectl exec -it deploy/sleep -c sleep -- sh  -c 'for x in my-release-zookeeper-0.my-release-zookeeper-headless my-release-zookeeper-1.my-release-zookeeper-headless my-release-zookeeper-2.my-release-zookeeper-headless; do echo $x; echo srvr|nc $x 2181; echo; done'
my-release-zookeeper-0.my-release-zookeeper-headless
Zookeeper version: 3.7.0-e3704b390a6697bfdf4b0bef79e3da7a4f6bac4b, built on 2021-03-17 09:46 UTC
Latency min/avg/max: 1/7.5/20
Received: 3845
Sent: 3844
Connections: 1
Outstanding: 0
Zxid: 0x200000002
Mode: follower
Node count: 6

my-release-zookeeper-1.my-release-zookeeper-headless
Zookeeper version: 3.7.0-e3704b390a6697bfdf4b0bef79e3da7a4f6bac4b, built on 2021-03-17 09:46 UTC
Latency min/avg/max: 0/0.0/0
Received: 3856
Sent: 3855
Connections: 1
Outstanding: 0
Zxid: 0x200000002
Mode: follower
Node count: 6

my-release-zookeeper-2.my-release-zookeeper-headless
Zookeeper version: 3.7.0-e3704b390a6697bfdf4b0bef79e3da7a4f6bac4b, built on 2021-03-17 09:46 UTC
Latency min/avg/max: 0/0.0/0
Received: 3855
Sent: 3854
Connections: 1
Outstanding: 0
Zxid: 0x200000002
Mode: leader
Node count: 6
Proposal sizes last/min/max: 48/48/48
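The `srvr` check above can also be scripted. This Python sketch, with helper names of our own, builds the per-pod DNS names used in the loop and checks that the ensemble reports exactly one leader. Collecting each response (for example, with the `nc` loop above) is left out, so the responses below are abbreviated copies of the output shown.

```python
def pod_hostnames(statefulset: str, headless_service: str, replicas: int) -> list:
    """Stable per-pod DNS names: <statefulset>-<ordinal>.<headless service>."""
    return [f"{statefulset}-{i}.{headless_service}" for i in range(replicas)]


def parse_srvr(output: str) -> dict:
    """Parse the 'Key: value' lines of a ZooKeeper `srvr` response."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(": ")  # split on the first ": " only
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def has_single_leader(responses: list) -> bool:
    """True if exactly one server reports Mode: leader and all others are followers."""
    modes = [parse_srvr(r).get("Mode") for r in responses]
    return modes.count("leader") == 1 and modes.count("follower") == len(modes) - 1


hosts = pod_hostnames("my-release-zookeeper", "my-release-zookeeper-headless", 3)
responses = [
    "Zxid: 0x200000002\nMode: follower\nNode count: 6",
    "Zxid: 0x200000002\nMode: follower\nNode count: 6",
    "Zxid: 0x200000002\nMode: leader\nNode count: 6",
]
print(hosts[0], has_single_leader(responses))
```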
Now that our ZooKeeper service is running, let’s use Istio to secure all communication to our regular and headless services. Apply mutual TLS to the default namespace:
$ kubectl apply -n default -f - <

Continue sending some traffic from the sleep pod and bring up the Kiali dashboard to visualize the services in the default namespace:

Figure: Visualize the ZooKeeper Services in Kiali

The padlock icons on the traffic flows indicate that the connections are secure.
Wrapping up
With the new networking changes in Istio 1.10, a Kubernetes pod with a sidecar has the same networking behavior as a pod without a sidecar. This change enables stateful applications to function properly in Istio, as we have shown in this post. We believe this is a huge step towards Istio’s goal of providing a transparent service mesh with zero configuration.



Updates to how Istio security releases are handled: Patch Tuesday, embargoes, and 0-days
Tue, 11 May 2021 00:00:00 +0000
While most of the work in the Istio Product Security Working Group is done behind the scenes, we are listening
to the community in setting expectations for security releases. We understand that it is difficult for mesh
administrators, operators and vendors to keep track of security bulletins and security releases.
We currently disclose vulnerabilities and security releases via numerous channels:

istio.io via our Release Announcements and Security Bulletins
Discuss
announcements channel on Slack
Twitter
RSS

When operating any software, it is preferable to plan for possible downtime when upgrading. Given the work that the Istio
community is doing around Day 2 operations in 2021, the Environments working group has done a good job of addressing many
upgrade issues users have seen. The Product Security Working Group intends to help Day 2 operations by having routine
security release days so that upgrade operations can be planned in advance for our users.
Patch Tuesdays
The Product Security working group intends to ship a security release on the second Tuesday of each month. These security
releases may contain fixes for multiple CVEs. The working group intends for these security releases to contain no
other fixes, although that may not always be possible.
When the Product Security working group intends to ship an upcoming security patch, an
announcement will be made on the Istio discussion
board 2 weeks prior to release. If you’re
running Istio in production, we suggest you watch the Announcements category to be
notified of such a release. If no such announcement is made, there will not be a security
release for that month, barring the exceptions listed below.
First Patch Tuesday
We are pleased to announce that Istio 1.9.5 and 1.8.6, the final release of Istio 1.8,
are the first security releases to fit this pattern. As Istio 1.10 will
be shipping soon, we intend to continue this new tradition in June.
These releases fix 3 CVEs. Please see the release pages for information regarding the specific CVEs fixed.
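To plan upgrades around this schedule, you can compute the dates yourself. Here is a minimal Python sketch, assuming that "2 weeks prior" means exactly 14 days before the release; the function names are our own. For May 2021 it yields May 11, the date of this post.

```python
import datetime


def patch_tuesday(year: int, month: int) -> datetime.date:
    """Second Tuesday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1
    to_first_tuesday = (1 - first.weekday()) % 7
    return first + datetime.timedelta(days=to_first_tuesday + 7)


def announcement_date(year: int, month: int) -> datetime.date:
    """Assumed pre-announcement date: 14 days before Patch Tuesday."""
    return patch_tuesday(year, month) - datetime.timedelta(weeks=2)


for month in (5, 6):
    print(patch_tuesday(2021, month), announcement_date(2021, month))
```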
Unscheduled security releases
0-day vulnerabilities
Unfortunately, 0-day vulnerabilities cannot be planned for. Upon disclosure, the Product Security Working Group will
need to issue an out-of-band security release. The channels listed above will be used to disclose such issues, so please follow
at least one of them to be notified of such disclosures.
Third party embargoes
Similar to 0-day vulnerabilities, security releases can be dictated by third-party embargoes, most notably Envoy’s.
When this occurs, Istio will release a same-day patch once the embargo is lifted.
Security Best Practices
The Istio Security Best Practices page has seen many improvements over the past few
months. We recommend you check it regularly, as many of the issues in our recent security bulletins can be mitigated by
methods discussed on the Security Best Practices page.
Early Disclosure List
If you meet the criteria to be
a part of the Istio Early Disclosure list, please
apply for membership. Patches for upcoming security releases will be made available to the early disclosure list approximately 2 weeks
prior to Istio’s Patch Tuesday.
There will be times when an upcoming Istio security release will also need patches from Envoy. We cannot redistribute
Envoy patches due to their embargo. Please refer to Envoy’s guidance
on how to join their early disclosure list.
Security Feedback
The Product Security Working Group meets every other Tuesday from 9:00 to 9:30 Pacific time. For more information, see
the Istio Working Group Calendar.
Our next public meeting will be held on May 25, 2021. Please join us!



Use discovery selectors to configure namespaces for your Istio service mesh
Fri, 30 Apr 2021 00:00:00 +0000
As users move their services to run in the Istio service mesh, they are often surprised that the control plane watches and processes all of the Kubernetes resources, from all namespaces in the cluster, by default. This can be an issue for very large clusters with lots of namespaces and deployments, or even for a moderately sized cluster with rapidly churning resources (for example, Spark jobs).
Both the community and our large-scale customers at Solo.io need a way to dynamically restrict the set of namespaces that are part of the mesh so that the Istio control plane only processes resources in those namespaces. The ability to restrict the namespaces enables Istiod to watch and push fewer resources and associated changes to the sidecars, thus improving the overall performance of the control plane and data plane.
Background
By default, Istio watches all Namespaces, Services, Endpoints and Pods in a cluster. For example, in my Kubernetes cluster, I deployed the sleep service in the default namespace, and the httpbin service in the ns-x namespace. I’ve added the sleep service to the mesh, but I have no plan to add the httpbin service to the mesh, or have any service in the mesh interact with the httpbin service.
Use the istioctl proxy-config endpoint command to display all the endpoints for the sleep deployment:

Figure: Endpoints for Sleep Deployment

Note that the httpbin service endpoint in the ns-x namespace is in the list of discovered endpoints. This may not be an issue when you only have a few services. However, when you have hundreds of services that don’t interact with any of the services running in the Istio service mesh, you probably don’t want your Istio control plane to watch these services and send their information to the sidecars of your services in the mesh.
Introducing Discovery Selectors
Starting with Istio 1.10, we are introducing the new discoverySelectors option to MeshConfig, which is an array of Kubernetes selectors. The exact type is []LabelSelector, as defined in the Kubernetes API, allowing both simple selectors and set-based selectors. These selectors apply to labels on namespaces.
You can configure each label selector to express a variety of use cases, including but not limited to:

Arbitrary label names/values, for example, all namespaces with label istio-discovery=enabled
A list of namespace labels using set-based selectors which carries OR semantics, for example, all namespaces with label istio-discovery=enabled OR region=us-east1
Inclusion and/or exclusion of namespaces, for example, all namespaces with label istio-discovery=enabled AND label key app equal to helloworld

Note: discoverySelectors is not a security boundary. Istiod will continue to have access to all namespaces even when you have configured your discoverySelectors.
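To make the AND/OR semantics concrete, here is a small Python model of evaluating a list of selectors against a namespace’s labels: entries in the list are ORed, while matchLabels and matchExpressions inside one selector are ANDed. It is a simplified sketch of Kubernetes label-selector semantics, not Istio’s implementation, and the class and function names are our own.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Requirement:
    """One matchExpressions entry."""
    key: str
    operator: str  # In, NotIn, Exists or DoesNotExist
    values: List[str] = field(default_factory=list)

    def matches(self, labels: Dict[str, str]) -> bool:
        if self.operator == "In":
            return self.key in labels and labels[self.key] in self.values
        if self.operator == "NotIn":
            # Kubernetes treats a missing key as satisfying NotIn.
            return labels.get(self.key) not in self.values
        if self.operator == "Exists":
            return self.key in labels
        if self.operator == "DoesNotExist":
            return self.key not in labels
        raise ValueError(f"unknown operator {self.operator!r}")


@dataclass
class LabelSelector:
    match_labels: Dict[str, str] = field(default_factory=dict)
    match_expressions: List[Requirement] = field(default_factory=list)

    def matches(self, labels: Dict[str, str]) -> bool:
        # Inside one selector, every matchLabels pair AND every expression must hold.
        return (all(labels.get(k) == v for k, v in self.match_labels.items())
                and all(r.matches(labels) for r in self.match_expressions))


def is_discovered(namespace_labels: Dict[str, str], selectors: List[LabelSelector]) -> bool:
    """Whether a namespace is watched, given a discoverySelectors-style list."""
    if not selectors:
        return True  # no discoverySelectors: every namespace is watched
    # Across the list, a single matching selector is enough (OR).
    return any(s.matches(namespace_labels) for s in selectors)


selectors = [
    LabelSelector(match_labels={"istio-discovery": "enabled"}),
    LabelSelector(match_labels={"region": "us-east1"}),
]
hello = LabelSelector(match_labels={"istio-discovery": "enabled"},
                      match_expressions=[Requirement("app", "In", ["helloworld"])])
print(is_discovered({"region": "us-east1"}, selectors))
print(is_discovered({"istio-discovery": "enabled"}, [hello]))
```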
Discovery Selectors in Action
Assuming you know which namespaces to include as part of the service mesh, as a mesh administrator, you can configure discoverySelectors at installation time or post-installation by adding your desired discovery selectors to Istio’s MeshConfig resource. For example, you can configure Istio to discover only the namespaces that have the label istio-discovery=enabled.


Using our earlier example, let’s label the default namespace with label istio-discovery=enabled.
$ kubectl label namespace default istio-discovery=enabled


Use istioctl to apply the YAML with discoverySelectors to update your Istio installation. Note: to avoid any impact on your stable environment, we recommend that you use a different revision for your Istio installation:
$ istioctl install --skip-confirmation -f - <



Display the endpoint configuration for the sleep deployment:

Figure: Endpoints for Sleep Deployment With Discovery Selectors

Note that this time the httpbin service in the ns-x namespace is NOT in the list of discovered endpoints, along with many other services that are not in the default namespace. If you display route (or cluster or listener) information for the sleep deployment, you will also notice that much less configuration is returned:

Figure: Routes for Sleep Deployment With Discovery Selectors


You can use matchLabels to configure multiple labels with AND semantics, or use multiple selectors, each with its own matchLabels, to configure OR semantics among multiple labels. Whether you deploy services or pods to namespaces with different sets of labels or multiple application teams in your organization use different labeling conventions, discoverySelectors provides the flexibility you need. Furthermore, you can use matchLabels and matchExpressions together, per our documentation. Refer to the Kubernetes selector docs for additional detail on selector semantics.
Discovery Selectors vs Sidecar Resource
The discoverySelectors configuration enables users to dynamically restrict the set of namespaces that are part of the mesh. A Sidecar resource also controls the visibility of sidecar configurations and what gets pushed to the sidecar proxy. What are the differences between them?

The discoverySelectors configuration declares what the Istio control plane watches and processes. Without discoverySelectors configuration, the Istio control plane watches and processes all namespaces/services/endpoints/pods in the cluster, regardless of the Sidecar resources you have.
discoverySelectors is configured globally for the mesh by the mesh administrators. While Sidecar resources can also be configured for the mesh globally by the mesh administrators in the MeshConfig root namespace, they are commonly configured by service owners for their namespaces.

You can use discoverySelectors with Sidecar resources. Use discoverySelectors to configure, at the mesh-wide level, which namespaces the Istio control plane should watch and process. For these namespaces in the Istio service mesh, you can create Sidecar resources globally or per namespace to further control what gets pushed to the sidecar proxies. Let us add Bookinfo services to the ns-y namespace in the mesh, as shown in the diagram below. discoverySelectors enables us to declare that the default and ns-y namespaces are part of the mesh. How can we configure the sleep service not to see anything other than the default namespace? By adding a Sidecar resource for the default namespace, we can effectively configure the sleep sidecar to only have visibility into the clusters/routes/listeners/endpoints associated with its current namespace plus any other required namespaces.

Figure: Discovery Selectors vs Sidecar Resource

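The division of labor can be modeled as two successive filters over namespaces. In the Python sketch below, which uses hypothetical function names, the first stage stands in for discoverySelectors (what istiod watches at all) and the second for a Sidecar resource’s egress hosts (what reaches one proxy). As a simplifying assumption, Sidecar hosts are reduced to `namespace/*` patterns, with `.` meaning the Sidecar’s own namespace and `*` meaning any namespace; real Sidecar host matching is richer than this.

```python
from typing import Callable, Dict, List, Optional, Set

Labels = Dict[str, str]


def watched_namespaces(namespaces: Dict[str, Labels],
                       selects: Callable[[Labels], bool]) -> Set[str]:
    """Stage 1, mesh-wide (discoverySelectors): namespaces istiod watches and processes."""
    return {name for name, labels in namespaces.items() if selects(labels)}


def pushed_to_proxy(watched: Set[str], sidecar_namespace: str,
                    egress_hosts: Optional[List[str]]) -> Set[str]:
    """Stage 2, per namespace (Sidecar): watched namespaces whose config reaches a proxy.

    egress_hosts=None models the absence of any Sidecar resource.
    """
    if egress_hosts is None:
        return set(watched)
    allowed = set()
    for host in egress_hosts:
        ns = host.split("/", 1)[0]
        if ns == "*":
            return set(watched)
        allowed.add(sidecar_namespace if ns == "." else ns)
    # A Sidecar can only narrow what istiod already watches; it cannot widen it.
    return watched & allowed


namespaces = {
    "default": {"istio-discovery": "enabled"},
    "ns-y": {"istio-discovery": "enabled"},
    "ns-x": {},
}
watched = watched_namespaces(namespaces, lambda labels: labels.get("istio-discovery") == "enabled")
print(sorted(watched))
print(sorted(pushed_to_proxy(watched, "default", ["./*"])))
```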
Wrapping up
Discovery selectors are a powerful way to tune the Istio control plane to watch and process only specific namespaces. If you don’t want all namespaces in your Kubernetes cluster to be part of the service mesh, or you have multiple Istio service meshes within your Kubernetes cluster, we highly recommend that you explore this configuration and reach out to us with feedback on the Istio Slack or GitHub.



Upcoming networking changes in Istio 1.10
Thu, 15 Apr 2021 00:00:00 +0000
Background
While Kubernetes networking is customizable, a typical pod’s network will look like this:

Figure: A pod's network

An application may choose to bind to either the loopback interface lo (typically binding to 127.0.0.1), or the pod’s network interface eth0 (typically to the pod’s IP), or both (typically binding to 0.0.0.0).
Binding to lo allows calls such as curl localhost to work from within the pod.
Binding to eth0 allows calls to the pod from other pods.
Typically, an application will bind to both.
However, applications with internal-only logic, such as an admin interface, may choose to bind only to lo to prevent access from other pods.
Additionally, some applications, typically stateful applications, choose to bind only to eth0.
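In application code, this choice is simply the address passed to bind(). Here is a minimal Python sketch, assuming IPv4; the helper name is our own.

```python
import socket


def listen(host: str) -> socket.socket:
    """Open a TCP listener on `host` with an OS-assigned port.

    "127.0.0.1" binds lo only, "0.0.0.0" binds every IPv4 interface, and a
    specific address (such as a pod IP) binds only the interface that owns it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen()
    return sock


admin = listen("127.0.0.1")   # like an admin interface: reachable only via lo
public = listen("0.0.0.0")    # typical application: reachable via lo and eth0
print(admin.getsockname(), public.getsockname())
admin.close()
public.close()
```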
Current behavior
In Istio prior to release 1.10, the Envoy proxy, running in the same pod as the application, binds to the eth0 interface and redirects all inbound traffic to the lo interface.

Figure: A pod's network with Istio today

This has two important side effects that cause the behavior to differ from standard Kubernetes:

Applications binding only to lo will receive traffic from other pods, even though this would otherwise not be allowed.
Applications binding only to eth0 will not receive traffic.

Applications that bind to both interfaces (which is typical) will not be impacted.
Future behavior
Starting with Istio 1.10, the networking behavior is changed to align with the standard behavior present in Kubernetes.

Figure: A pod's network with Istio in the future

Here we can see that the proxy no longer redirects the traffic to the lo interface, but instead forwards it to the application on eth0.
As a result, the standard behavior of Kubernetes is retained, but we still get all the benefits of Istio.
This change allows Istio to get closer to its goal of being a drop-in transparent proxy that works with existing workloads with zero configuration.
Additionally, it avoids unintended exposure of applications binding only to lo.
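The before-and-after behavior can be summarized as a small truth table. The Python sketch below encodes only the statements in this post; `pod-ip` is a placeholder rather than a real address, and the function name is our own.

```python
LOOPBACK = "127.0.0.1"  # bound to lo only
POD_IP = "pod-ip"       # placeholder for binding to eth0 only
ALL = "0.0.0.0"         # bound to every interface


def reachable_from_other_pods(bind: str, has_sidecar: bool, istio_1_10_or_later: bool) -> bool:
    """Toy model of whether traffic from another pod reaches an app bound to `bind`."""
    if has_sidecar and not istio_1_10_or_later:
        # Before 1.10, Envoy redirects inbound traffic to lo.
        return bind in (LOOPBACK, ALL)
    # Standard Kubernetes behavior (no sidecar, or Istio 1.10+): traffic arrives on eth0.
    return bind in (POD_IP, ALL)


for bind in (LOOPBACK, POD_IP, ALL):
    print(bind, [reachable_from_other_pods(bind, sidecar, new)
                 for sidecar, new in ((False, False), (True, False), (True, True))])
```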
Am I impacted?
For new users, this change should only be an improvement.
However, if you are an existing user, you may have come to depend on the old behavior, intentionally or accidentally.
To help detect these situations, we have added a check to find pods that will be impacted.
You can run the istioctl experimental precheck command to get a report of any pods binding to lo on a port exposed in a Service.
This command is available in Istio 1.10+.
Without action, these ports will no longer be accessible upon upgrade.
$ istioctl experimental precheck
Error [IST0143] (Pod echo-local-849647c5bd-g9wxf.default) Port 443 is exposed in a Service but listens on localhost. It will not be exposed to other pods.
Error [IST0143] (Pod echo-local-849647c5bd-g9wxf.default) Port 7070 is exposed in a Service but listens on localhost. It will not be exposed to other pods.
Error: Issues found when checking the cluster. Istio may not be safe to install or upgrade.
See https://istio.io/latest/docs/reference/config/analysis for more information about causes and resolutions.
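The condition behind IST0143 can be sketched as follows: a port that a Service targets, where every socket listening on that port is bound to a loopback address. This is a hedged Python illustration with hypothetical inputs (the 8080 listener is invented for contrast), not how istioctl gathers or evaluates this data.

```python
import ipaddress


def localhost_only_ports(service_target_ports: set, listeners: list) -> list:
    """Service target ports whose only listeners are loopback addresses.

    `listeners` holds (address, port) pairs for the sockets the app listens on.
    """
    flagged = []
    for port in sorted(service_target_ports):
        addresses = [addr for addr, p in listeners if p == port]
        if addresses and all(ipaddress.ip_address(a).is_loopback for a in addresses):
            flagged.append(port)
    return flagged


# Hypothetical listeners for a pod like echo-local above.
listeners = [("127.0.0.1", 443), ("127.0.0.1", 7070), ("0.0.0.0", 8080)]
print(localhost_only_ports({443, 7070, 8080}, listeners))
```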
Migration
If you are currently binding to lo, you have a few options:


Switch your application to bind to all interfaces (0.0.0.0 or ::).


Explicitly configure the port using the Sidecar ingress configuration to send to lo, preserving the old behavior.
For example, to configure requests to be sent to localhost for the ratings application:
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: ratings
spec:
  workloadSelector:
    labels:
      app: ratings
  ingress:
  - port:
      number: 8080
      protocol: HTTP
      name: http
    defaultEndpoint: 127.0.0.1:8080


Disable the change entirely with the PILOT_ENABLE_INBOUND_PASSTHROUGH=false environment variable in Istiod to restore the behavior from before Istio 1.10. This option will be removed in the future.





Istio and Envoy WebAssembly Extensibility, One Year On
Fri, 05 Mar 2021 00:00:00 +0000
One year ago today, in the 1.5 release, we introduced WebAssembly-based extensibility to Istio.
Over the course of the year, the Istio, Envoy, and Proxy-Wasm communities have continued our joint efforts to make WebAssembly (Wasm)
extensibility stable, reliable, and easy to adopt. Let’s walk through the updates to Wasm support through the Istio 1.9 release,
and our plans for the future.
WebAssembly support merged in upstream Envoy
After adding experimental support for Wasm and the WebAssembly for Proxies (Proxy-Wasm) ABI to Istio’s fork of Envoy, we collected some great feedback from our community of early adopters. This, combined with the experience gained from developing core Istio Wasm extensions, helped us mature and stabilize the runtime.
These improvements unblocked merging Wasm support directly into Envoy upstream in October 2020, allowing it to become part of all official Envoy releases.
This was a significant milestone, since it indicates that:

The runtime is ready for wider adoption.
The programming ABI/API, extension configuration API, and runtime behavior are becoming stable.
You can expect a larger community of adoption and support moving forward.

wasm-extensions Ecosystem Repository
As an early adopter of the Envoy Wasm runtime, the Istio Extensions and Telemetry working group gained a lot of experience in developing extensions. We built several first-class extensions, including metadata exchange, Prometheus stats, and attribute generation.
In order to share our learning more broadly, we created a wasm-extensions repository in the istio-ecosystem organization. This repository serves two purposes:

It provides canonical example extensions, covering several highly demanded features (such as basic authentication).
It provides a guide for Wasm extension development, testing, and release. The guide is based on the same build tool chains and test frameworks that are used, maintained and tested by the Istio extensibility team.

The guide currently covers WebAssembly extension development
and unit testing with C++,
as well as integration testing with a Go test framework,
which simulates a real runtime by running a Wasm module with the Istio proxy binary.
In the future, we will also add several more canonical extensions, such as an integration with Open Policy Agent, and header manipulation based on JWT tokens.
Wasm module distribution via the Istio Agent
Prior to Istio 1.9, Envoy remote data sources were needed to distribute remote Wasm modules to the proxy.
In this example, you can see that two EnvoyFilter resources are defined: one to add a remote fetch Envoy cluster, and the other to inject a Wasm filter into the HTTP filter chain.
This method has a drawback: if the remote fetch fails, either due to bad configuration or a transient error, Envoy will be stuck with the bad configuration.
If a Wasm extension is configured as fail closed, a bad remote fetch will stop Envoy from serving.
Fixing this issue directly would require a fundamental change to the Envoy xDS protocol to allow asynchronous xDS responses.
Istio 1.9 provides a reliable distribution mechanism out of the box by leveraging the xDS proxy inside istio-agent and Envoy’s Extension Configuration Discovery Service (ECDS).
istio-agent intercepts the extension config resource update from istiod, reads the remote fetch hint from it, downloads the Wasm module, and rewrites the ECDS configuration with the path of the downloaded Wasm module.
If the download fails, istio-agent will reject the ECDS update and prevent a bad configuration reaching Envoy. For more detail, please see our docs on Wasm module distribution.
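The accept-or-reject behavior described above can be sketched in Python. The config keys (`url`, `sha256`, `path`), the `fetch` callback, and the checksum check are hypothetical stand-ins rather than the real ECDS or istio-agent structures; the point is only that a failed download yields a rejection instead of a configuration Envoy cannot load.

```python
import hashlib
import os
import tempfile
from typing import Callable, Optional


def localize_wasm_config(config: dict, fetch: Callable[[str], bytes],
                         cache_dir: str) -> Optional[dict]:
    """Download the module named by config["url"] and point the config at a local file.

    Returns None to signal that the update should be rejected, leaving the proxy
    on its last good configuration.
    """
    try:
        module = fetch(config["url"])
    except OSError:
        return None  # transient or permanent download failure
    digest = hashlib.sha256(module).hexdigest()
    if config.get("sha256") not in (None, digest):
        return None  # checksum mismatch: never hand a bad module to the proxy
    path = os.path.join(cache_dir, digest + ".wasm")
    with open(path, "wb") as f:
        f.write(module)
    rewritten = {k: v for k, v in config.items() if k != "url"}
    rewritten["path"] = path
    return rewritten


def failing_fetch(url: str) -> bytes:
    raise OSError(f"cannot reach {url}")


with tempfile.TemporaryDirectory() as cache:
    good = localize_wasm_config({"name": "example", "url": "https://example.com/f.wasm"},
                                lambda url: b"\0asm\1\0\0\0", cache)
    print(good["path"].endswith(".wasm"), "url" in good)
    print(localize_wasm_config({"url": "https://example.com/f.wasm"}, failing_fetch, cache))
```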

Figure: Remote Wasm module fetch flow

Istio Wasm SIG and Future Work
Although we have made a lot of progress on Wasm extensibility, there are still many aspects of the project that remain to be completed. In order to consolidate the efforts from various parties and better tackle the challenges ahead, we have formed an Istio WebAssembly SIG, with the aim of providing a standard and reliable way for Istio to consume Wasm extensions. Here are some of the things we are working on:

A first-class extension API: Currently, Wasm extensions need to be injected via Istio’s EnvoyFilter API. A first-class extension API will make using Wasm with Istio easier, and we expect this to be introduced in Istio 1.10.
Distribution artifacts interoperability: Built on top of Solo.io’s WebAssembly OCI image spec effort, a standard Wasm artifact format will make extensions easy to build, pull, publish, and execute.
Container Storage Interface (CSI) based artifacts distribution: Using istio-agent to distribute modules is easy to adopt, but may not be efficient, as each proxy keeps its own copy of the Wasm module. As a more efficient solution, a DaemonSet using Ephemeral CSI will be provided to configure storage for pods. Working similarly to a CNI plugin, a CSI driver would fetch the Wasm module out-of-band from the xDS flow and mount it inside the rootfs when the pod starts up.

If you would like to join us, the group meets every other Tuesday at 2 PM PT. You can find the meeting on the Istio working group calendar.
We look forward to seeing how you will use Wasm to extend Istio!



Migrate pre-Istio 1.4 Alpha security policy to the current APIs
Wed, 03 Mar 2021 00:00:00 +0000
In versions of Istio prior to 1.4, security policy was configured using v1alpha1 APIs (MeshPolicy, Policy, ClusterRbacConfig, ServiceRole and ServiceRoleBinding). After consulting with our early adopters, we made major improvements to the policy system and released v1beta1 APIs along with Istio 1.4. These refreshed APIs (PeerAuthentication, RequestAuthentication and AuthorizationPolicy) helped standardize how we define policy targets in Istio, helped users understand where policies were applied, and cut the number of configuration objects required.
The old APIs were deprecated in Istio 1.4. Two releases after the v1beta1 APIs were introduced, Istio 1.6 removed support for the v1alpha1 APIs.
If you are using a version of Istio prior to 1.6 and you want to upgrade, you will have to migrate your alpha security policy objects to the beta API. This tutorial will help you make that move.

If you adopted Istio after version 1.6, or you’re not using v1alpha1 security APIs, you can stop reading.


Overview
Your control plane must first be upgraded to a version that supports the v1beta1 security policy.
It is recommended to first upgrade to Istio 1.5 as a transitional version, because it is the only version that supports both
v1alpha1 and v1beta1 security policies. You will complete the security policy migration in Istio 1.5, remove the
v1alpha1 security policy, and then continue to upgrade to later Istio versions. For a given workload, the v1beta1
version will take precedence over the v1alpha1 version.
Alternatively, if you want to do a skip-level upgrade directly from Istio 1.4 to 1.6 or later, you should use the
canary upgrade method to install a new Istio version as a separate control plane, and
gradually migrate your workloads to the new control plane, completing the security policy migration at the same time.

Skip-level upgrades are not supported by Istio, and there might be other issues in this process. Istio 1.6 does not support
the v1alpha1 security policy, so if you do not migrate your old policies before the upgrade, you are essentially removing
all your security policies.


In either case, it is recommended to migrate at namespace granularity: for each namespace, find all the
v1alpha1 policies that have an effect on workloads in the namespace and migrate all the policies to v1beta1
at the same time. This allows a safer migration as you can make sure everything is working as expected,
and then move forward to the next namespace.
Major differences
Before starting the migration, read through the v1beta1 authentication
and authorization documentation to understand the v1beta1 policy.
You should examine all of your existing v1alpha1 security policies, find out what fields are used and which policies
need migration, compare the findings with the major differences listed below and confirm there are no blocking issues
(e.g., using an alpha feature that is no longer supported in beta):

Major Differences | v1alpha1                                              | v1beta1
------------------|-------------------------------------------------------|------------------------
API stability     | not backward compatible                               | backward compatible
mTLS              | MeshPolicy and Policy                                 | PeerAuthentication
JWT               | MeshPolicy and Policy                                 | RequestAuthentication
Authorization     | ClusterRbacConfig, ServiceRole and ServiceRoleBinding | AuthorizationPolicy
Policy target     | service name based                                    | workload selector based
Port number       | service ports                                         | workload ports

Although RequestAuthentication in v1beta1 security policy is similar to the v1alpha1 JWT policy, there is a notable
semantic change. The v1alpha1 JWT policy needs to be migrated to two v1beta1 resources: RequestAuthentication and
AuthorizationPolicy. This will change the JWT deny message due to the use of AuthorizationPolicy. In the alpha version,
the HTTP code 401 is returned with the body Origin authentication failed. In the beta version, the HTTP code 403 is
returned with the body RBAC: access denied.
The v1alpha1 JWT policy triggerRules field
is replaced by AuthorizationPolicy, with the exception that the regex field
is no longer supported.
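As a sketch of that mapping, the Python function below converts one triggerRules entry into the `paths` and `notPaths` fields of an AuthorizationPolicy operation, using the prefix (`/admin/*`) and suffix (`*.html`) wildcard forms that AuthorizationPolicy paths accept, and refuses regex. It produces only the operation part: combining it with a source condition (for example, on request principals) into a complete policy is not shown, and the function names are our own.

```python
def _to_authz_paths(matches: list) -> list:
    """Convert v1alpha1 StringMatch entries to AuthorizationPolicy path strings."""
    paths = []
    for m in matches:
        if "exact" in m:
            paths.append(m["exact"])
        elif "prefix" in m:
            paths.append(m["prefix"] + "*")  # prefix match: "/admin/" -> "/admin/*"
        elif "suffix" in m:
            paths.append("*" + m["suffix"])  # suffix match: ".html" -> "*.html"
        elif "regex" in m:
            raise ValueError(f"regex {m['regex']!r} has no v1beta1 equivalent")
        else:
            raise ValueError(f"unsupported match {m!r}")
    return paths


def trigger_rule_to_operation(rule: dict) -> dict:
    """Map one triggerRules entry to the `to.operation` fields of an AuthorizationPolicy rule."""
    operation = {}
    if rule.get("includedPaths"):
        operation["paths"] = _to_authz_paths(rule["includedPaths"])
    if rule.get("excludedPaths"):
        operation["notPaths"] = _to_authz_paths(rule["excludedPaths"])
    return operation


# The triggerRules entry from the httpbin Policy in the example below.
rule = {"includedPaths": [{"prefix": "/admin/"}], "excludedPaths": [{"exact": "/admin/status"}]}
print(trigger_rule_to_operation(rule))
```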
Migration flow
This section describes in detail how to migrate a v1alpha1 security policy.
Step 1: Find related policies
For each namespace, find all v1alpha1 security policies that have an effect on workloads in the namespace. The result
could include:

a single MeshPolicy that applies to all services in the mesh;
a single namespace-level Policy that applies to all workloads in the namespace;
multiple service-level Policy objects that apply to the selected services in the namespace;
a single ClusterRbacConfig that enables RBAC on the whole namespace or some services in the namespace;
multiple namespace-level ServiceRole and ServiceRoleBinding objects that apply to all services in the namespace;
multiple service-level ServiceRole and ServiceRoleBinding objects that apply to the selected services in the namespace.

Step 2: Convert service name to workload selector
The v1alpha1 policy selects targets using their service name. You should refer to the corresponding service definition to decide
the workload selector that should be used in the v1beta1 policy.
A single v1alpha1 policy may include multiple services. It will need to be migrated to multiple v1beta1 policies
because the v1beta1 policy currently supports at most one workload selector per policy.
Also note that the v1alpha1 policy uses the service port, whereas the v1beta1 policy uses the workload port. This means the port number might be
different in the migrated v1beta1 policy.
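This conversion can be sketched in Python against a Service manifest loaded as a dict: the Service’s selector becomes the workload selector, and each service port maps to its targetPort (which Kubernetes defaults to the port itself). The function name is our own, and named (string) target ports are deliberately left unresolved because they depend on the pod spec.

```python
def service_to_workload_target(service: dict) -> tuple:
    """Return (workload selector labels, {service port: workload port}) for a v1 Service."""
    spec = service["spec"]
    selector = dict(spec["selector"])
    port_map = {}
    for p in spec["ports"]:
        target = p.get("targetPort", p["port"])  # targetPort defaults to port
        if isinstance(target, str):
            # Named target ports must be resolved against the pod's container ports.
            raise ValueError(f"named targetPort {target!r} needs the pod spec")
        port_map[p["port"]] = target
    return selector, port_map


# The httpbin Service from the example later in this post.
httpbin = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "httpbin", "namespace": "foo"},
    "spec": {
        "ports": [{"name": "http", "port": 8000, "targetPort": 80}],
        "selector": {"app": "httpbin"},
    },
}
print(service_to_workload_target(httpbin))
```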
Step 3: Migrate authentication policy
For each v1alpha1 authentication policy, migrate with the following rules:


If the whole namespace is enabled with mTLS or JWT, create the PeerAuthentication, RequestAuthentication and
AuthorizationPolicy without a workload selector for the whole namespace. Fill out the policy based on the
semantics of the corresponding MeshPolicy or Policy for the namespace.


If a workload is enabled with mTLS or JWT, create the PeerAuthentication, RequestAuthentication and
AuthorizationPolicy with a corresponding workload selector for the workload. Fill out the policy based on the
semantics of the corresponding MeshPolicy or Policy for the workload.


For mTLS related configuration, use STRICT mode if the alpha policy is using STRICT, or use PERMISSIVE in all other cases.


For JWT related configuration, refer to the end-user authentication documentation
to learn how to migrate to RequestAuthentication and AuthorizationPolicy.
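The mTLS rule can be written as a small Python function over a v1alpha1 `peers` list loaded from YAML. The function name is our own, and it assumes that an `mtls` entry with no explicit mode (such as the `mtls: {}` in the MeshPolicy example below) means STRICT, which we understand to be the v1alpha1 default.

```python
def peer_authentication_mode(peers: list) -> str:
    """Pick the v1beta1 PeerAuthentication mTLS mode for a v1alpha1 `peers` list."""
    for peer in peers or []:
        if "mtls" in peer:
            # Assumption: an mtls entry without an explicit mode is STRICT,
            # the v1alpha1 default.
            mode = (peer["mtls"] or {}).get("mode", "STRICT")
            if mode == "STRICT":
                return "STRICT"
    return "PERMISSIVE"  # every other case, as the rule above says


print(peer_authentication_mode([{"mtls": {}}]))                      # MeshPolicy example
print(peer_authentication_mode([{"mtls": {"mode": "PERMISSIVE"}}]))  # httpbin Policy example
```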


A security policy migration tool is provided to
automatically migrate authentication policies. Please refer to the tool’s README for its usage.
Step 4: Migrate RBAC policy
For each v1alpha1 RBAC policy, migrate with the following rules:


If the whole namespace is enabled with RBAC, create an AuthorizationPolicy without a workload selector for the whole
namespace. Do not add any rules, so that it denies all requests to the namespace by default.


If a workload is enabled with RBAC, create an AuthorizationPolicy with a corresponding workload selector for the workload.
Add rules based on the semantics of the corresponding ServiceRole and ServiceRoleBinding for the workload.


Step 5: Verify migrated policy


Double check the migrated v1beta1 policies: make sure there are no policies with duplicate names, the namespace
is specified correctly and all v1alpha1 policies for the given namespace are migrated.


Dry-run the v1beta1 policy with the command kubectl apply --dry-run=server -f beta-policy.yaml to make sure it
is valid.


Apply the v1beta1 policy to the given namespace and closely monitor the effect. Make sure to test both allow and
deny scenarios if JWT or authorization are used.


Migrate the next namespace. Only remove the v1alpha1 policy after completing migration for all namespaces successfully.
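Part of the first check can be automated once the migrated manifests are loaded as dicts (for example, with a YAML parser). This Python sketch, with a function name of our own, reports duplicate names and manifests whose namespace is missing or different from the one being migrated; it does not check that every v1alpha1 policy has a counterpart.

```python
from collections import Counter
from typing import Dict, List


def check_migrated_policies(manifests: List[Dict], namespace: str) -> List[str]:
    """Report duplicate (kind, namespace, name) keys and manifests outside `namespace`."""
    problems = []
    keys = Counter((m["kind"], m["metadata"].get("namespace"), m["metadata"]["name"])
                   for m in manifests)
    for (kind, ns, name), count in sorted(keys.items(), key=str):
        if count > 1:
            problems.append(f"duplicate {kind} {ns}/{name}")
    for m in manifests:
        ns = m["metadata"].get("namespace")
        if ns != namespace:
            problems.append(f"{m['kind']} {m['metadata']['name']}: "
                            f"namespace {ns!r}, expected {namespace!r}")
    return problems


migrated = [
    {"kind": "PeerAuthentication", "metadata": {"name": "httpbin", "namespace": "foo"}},
    {"kind": "RequestAuthentication", "metadata": {"name": "httpbin", "namespace": "foo"}},
    {"kind": "AuthorizationPolicy", "metadata": {"name": "httpbin"}},
]
for problem in check_migrated_policies(migrated, "foo"):
    print(problem)
```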


Example
This section gives a full example showing the migration for namespace foo.
v1alpha1 policy
Assume the namespace foo has the following v1alpha1 policies that affect the workloads in it:
# A MeshPolicy that enables mTLS globally, including the whole foo namespace
apiVersion: "authentication.istio.io/v1alpha1"
kind: "MeshPolicy"
metadata:
  name: "default"
spec:
  peers:
  - mtls: {}
---
# A Policy that enables mTLS permissive mode and enables JWT for the httpbin service on port 8000
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: httpbin
  namespace: foo
spec:
  targets:
  - name: httpbin
    ports:
    - number: 8000
  peers:
  - mtls:
      mode: PERMISSIVE
  origins:
  - jwt:
      issuer: testing@example.com
      jwksUri: https://www.example.com/jwks.json
      triggerRules:
      - includedPaths:
        - prefix: /admin/
        excludedPaths:
        - exact: /admin/status
  principalBinding: USE_ORIGIN
---
# A ClusterRbacConfig that enables RBAC globally, including the foo namespace
apiVersion: "rbac.istio.io/v1alpha1"
kind: ClusterRbacConfig
metadata:
  name: default
spec:
  mode: 'ON'
---
# A ServiceRole that enables RBAC for the httpbin service
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRole
metadata:
  name: httpbin
  namespace: foo
spec:
  rules:
  - services: ["httpbin.foo.svc.cluster.local"]
    methods: ["GET"]
---
# A ServiceRoleBinding for the above ServiceRole
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
  name: httpbin
  namespace: foo
spec:
  subjects:
  - user: cluster.local/ns/foo/sa/sleep
  roleRef:
    kind: ServiceRole
    name: httpbin
httpbin service
The httpbin service has the following definition:
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  namespace: foo
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 80
  selector:
    app: httpbin
This means the service name httpbin should be replaced by the workload selector app: httpbin, and the service port 8000
should be replaced by the workload port 80.
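The service-to-workload translation is mechanical, so it can be sketched as a small hypothetical helper over the parsed Service. It uses the Kubernetes rule that `targetPort` defaults to the service port when omitted; a named `targetPort` would require the pod spec and is rejected here.

```python
from __future__ import annotations

from typing import Any


def service_to_workload(service: dict[str, Any], port: int) -> tuple[dict[str, str], int]:
    """Translate a Service and service port into a workload selector and port."""
    selector = dict(service["spec"]["selector"])
    for p in service["spec"]["ports"]:
        if p["port"] == port:
            # targetPort defaults to the service port when it is omitted.
            target = p.get("targetPort", p["port"])
            if not isinstance(target, int):
                raise ValueError(f"named targetPort {target!r} needs the pod spec")
            return selector, target
    raise KeyError(f"service port {port} not found")
```

For the httpbin Service above, port 8000 resolves to the selector `{"app": "httpbin"}` and workload port 80.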
v1beta1 authentication policy
The migrated v1beta1 policies for the v1alpha1 authentication policies in foo namespace are listed below:
# A PeerAuthentication that enables mTLS for the foo namespace, migrated from the MeshPolicy
# Alternatively the MeshPolicy could also be migrated to a PeerAuthentication at mesh level
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: foo
spec:
  mtls:
    mode: STRICT
---
# A PeerAuthentication that enables mTLS for the httpbin workload, migrated from the Policy
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: httpbin
  namespace: foo
spec:
  selector:
    matchLabels:
      app: httpbin
  # port level mtls set for the workload port 80 corresponding to the service port 8000
  portLevelMtls:
    80:
      mode: PERMISSIVE
---
# A RequestAuthentication that enables JWT for the httpbin workload, migrated from the Policy
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: httpbin
  namespace: foo
spec:
  selector:
    matchLabels:
      app: httpbin
  jwtRules:
  - issuer: testing@example.com
    jwksUri: https://www.example.com/jwks.json
---
# An AuthorizationPolicy that requires JWT validation for the httpbin workload, migrated from the Policy
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin-jwt
  namespace: foo
spec:
  # Use DENY action to explicitly deny requests without JWT token
  action: DENY
  selector:
    matchLabels:
      app: httpbin
  rules:
  - from:
    - source:
        # This makes sure requests without JWT token will be denied
        notRequestPrincipals: ["*"]
    to:
    - operation:
        # This should be the workload port 80, not the service port 8000
        ports: ["80"]
        # The path and notPath is converted from the trigger rule in the Policy
        paths: ["/admin/*"]
        notPaths: ["/admin/status"]
v1beta1 authorization policy
The migrated v1beta1 policies for the v1alpha1 RBAC policies in foo namespace are listed below:
# An AuthorizationPolicy that denies by default, migrated from the ClusterRbacConfig
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: default
  namespace: foo
spec:
  # An empty rule that allows nothing
  {}
---
# An AuthorizationPolicy that enforces authorization for the httpbin workload, migrated from the ServiceRole and ServiceRoleBinding
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin
  namespace: foo
spec:
  selector:
    matchLabels:
      app: httpbin
      version: v1
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/foo/sa/sleep"]
    to:
    - operation:
        methods: ["GET"]
Finish the upgrade
Congratulations; having reached this point, you should only have v1beta1 policy objects, and you will be able to continue upgrading Istio to 1.6 and beyond.



Zero Configuration Istio
Thu, 25 Feb 2021 00:00:00 +0000
When a new user encounters Istio for the first time, they are sometimes overwhelmed by the vast feature
set it exposes. Unfortunately, this can give the impression that Istio is needlessly complex
and not fit for small teams or clusters.
One great part about Istio, however, is that it aims to bring as much value as possible to users out of the box, without any configuration at all.
This enables users to get most of the benefits of Istio with minimal effort. For some users with simple requirements, custom configuration
may never be required at all. Others will be able to incrementally add Istio configuration as they become more comfortable and as they need it, for example to add
ingress routing, fine-tune networking settings, or lock down security policies.
Getting started
To get started, check out our getting started documentation, where you will learn how to install Istio.
If you are already familiar with Istio, you can simply run istioctl install.
Next, we will explore all the benefits Istio provides us, without any configuration or changes to application code.
Security
Istio automatically enables mutual TLS for traffic between pods in the mesh.
This enables applications to forgo complex TLS configuration and certificate management, and offload all transport layer security to the sidecar.
Once comfortable with automatic TLS, you may choose to allow only mTLS traffic, or configure custom authorization policies for your needs.
Observability
Istio automatically generates detailed telemetry for all service communications within a mesh.
This telemetry provides observability of service behavior, empowering operators to troubleshoot, maintain, and optimize their applications – without imposing any additional burdens on service developers.
Through Istio, operators gain a thorough understanding of how monitored services are interacting, both with other services and with the Istio components themselves.
All of this functionality is added by Istio without any configuration. Integrations with tools such as Prometheus, Grafana, Jaeger, Zipkin, and Kiali are also available.
For more information about the observability Istio provides, check out the observability overview.
Traffic Management
While Kubernetes provides a lot of networking functionality, such as service discovery and DNS, this is done at Layer 4, which can have unintended inefficiencies.
For example, in a simple HTTP application sending traffic to a service with 3 replicas, we can see unbalanced load:
$ curl http://echo/{0..5} -s | grep Hostname
Hostname=echo-cb96f8d94-2ssll
Hostname=echo-cb96f8d94-2ssll
Hostname=echo-cb96f8d94-2ssll
Hostname=echo-cb96f8d94-2ssll
Hostname=echo-cb96f8d94-2ssll
Hostname=echo-cb96f8d94-2ssll
$ curl http://echo/{0..5} -s | grep Hostname
Hostname=echo-cb96f8d94-879sn
Hostname=echo-cb96f8d94-879sn
Hostname=echo-cb96f8d94-879sn
Hostname=echo-cb96f8d94-879sn
Hostname=echo-cb96f8d94-879sn
Hostname=echo-cb96f8d94-879sn
The problem here is that Kubernetes determines the backend when the connection is established, and all future requests on the same connection are sent to the same backend.
In our example, our first six requests are all sent to echo-cb96f8d94-2ssll, while our next set (using a new connection) is all sent to echo-cb96f8d94-879sn.
Our third instance never receives any requests.
With Istio, HTTP traffic (including HTTP/2 and gRPC) is automatically detected, and our services will automatically be load balanced per request, rather than per connection:
$ curl http://echo/{0..5} -s | grep Hostname
Hostname=echo-cb96f8d94-wf4xk
Hostname=echo-cb96f8d94-rpfqz
Hostname=echo-cb96f8d94-cgmxr
Hostname=echo-cb96f8d94-wf4xk
Hostname=echo-cb96f8d94-rpfqz
Hostname=echo-cb96f8d94-cgmxr
Here we can see our requests are round-robin load balanced between all backends.
In addition to these better defaults, Istio offers customization of a variety of traffic management settings, including timeouts, retries, and much more.
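The difference between per-connection and per-request balancing can be reproduced with a toy model. This is a simplified sketch with hypothetical pod names; it uses round robin for both cases, whereas kube-proxy’s real connection-level choice is typically random, so only the shape of the result (one backend per connection versus one per request) should be compared with the output above.

```python
from __future__ import annotations

import itertools


def per_connection(backends: list[str], connections: int, requests_each: int) -> list[str]:
    """Layer 4 model: a backend is picked once, when the connection opens."""
    picker = itertools.cycle(backends)
    served: list[str] = []
    for _ in range(connections):
        backend = next(picker)  # every request on this connection reuses it
        served += [backend] * requests_each
    return served


def per_request(backends: list[str], connections: int, requests_each: int) -> list[str]:
    """Layer 7 model: each request is load balanced on its own (round robin)."""
    picker = itertools.cycle(backends)
    return [next(picker) for _ in range(connections * requests_each)]


pods = ["echo-a", "echo-b", "echo-c"]
print(per_connection(pods, connections=2, requests_each=6))
print(per_request(pods, connections=2, requests_each=6))
```

With two connections of six requests each, the per-connection model never sends anything to the third pod, just like the first two curl runs.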



IstioCon 2021: Schedule Is Live!
Tue, 16 Feb 2021 00:00:00 +0000
IstioCon 2021 is a week-long, community-led, virtual conference starting on February 22.
This event provides an opportunity to hear the lessons learned from companies like Atlassian, Airbnb, FICO, eBay, T-Mobile and
Salesforce running Istio in production, hands-on experiences from the Istio community, and will feature maintainers from across
the Istio ecosystem.
You can now find the full schedule of events which includes a series of
English sessions and
Chinese sessions.


By attending the conference, you’ll connect with community members from across the globe. Each day you will find keynotes,
technical talks, lightning talks, panel discussions, workshops and roadmap sessions led by diverse speakers representing the
Istio community. You can also connect with other Istio and Open Source ecosystem community members through social hour events
that include activities on the social platform Gather.town, a live cartoonist,
virtual swag bags, raffles, live music and games.
Don’t miss it! Registration is free. We look forward to seeing you at the first IstioCon!



Better External Authorization
Tue, 09 Feb 2021 00:00:00 +0000
Background
Istio’s authorization policy provides access control for services in the mesh. It is fast, powerful and a widely used
feature. We have made continuous improvements to make policy more flexible since its first release in Istio 1.4, including
the DENY action, exclusion semantics,
X-Forwarded-For header support, nested JWT claim support
and more. These features improve the flexibility of the authorization policy, but there are still many use cases that
cannot be supported with this model, for example:


You have your own in-house authorization system that cannot be easily migrated to, or cannot be easily replaced by, the
authorization policy.


You want to integrate with a 3rd-party solution (e.g. Open Policy Agent
or oauth2 proxy) which may require use of the
low-level Envoy configuration APIs in Istio, or may not be possible
at all.


Authorization policy lacks necessary semantics for your use case.


Solution
In Istio 1.9, we have implemented extensibility into authorization policy by introducing a CUSTOM action,
which allows you to delegate the access control decision to an external authorization service.
The CUSTOM action allows you to integrate Istio with an external authorization system that implements its own custom
authorization logic. The following diagram shows the high level architecture of this integration:

Figure: External Authorization Architecture

At configuration time, the mesh admin configures an authorization policy with a CUSTOM action to enable the
external authorization on a proxy (either gateway or sidecar). The admin should verify the external auth service is up
and running.
At runtime,


A request is intercepted by the proxy, and the proxy will send check requests to the external auth service, as
configured by the user in the authorization policy.


The external auth service will make the decision whether to allow it or not.


If allowed, the request will continue and will be enforced by any local authorization defined by ALLOW/DENY action.


If denied, the request will be rejected immediately.
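The runtime flow above, combined with local ALLOW/DENY policies, can be modeled as layered evaluation: Istio checks CUSTOM first, then DENY, then ALLOW. The following is a simplified Python sketch, not Envoy’s implementation: `matches` stands in for a policy’s rules, and `ext_authz_allows` is a hypothetical stand-in for the check request sent to the external auth service.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict

Request = Dict[str, str]


@dataclass
class Policy:
    action: str  # "CUSTOM", "DENY" or "ALLOW"
    matches: Callable[[Request], bool]  # stands in for the policy's rules


def authorize(request: Request, policies: list[Policy],
              ext_authz_allows: Callable[[Request], bool]) -> bool:
    """Layered evaluation: CUSTOM first, then DENY, then ALLOW."""
    # 1. CUSTOM: only a matching rule triggers the external check request.
    if any(p.action == "CUSTOM" and p.matches(request) for p in policies):
        if not ext_authz_allows(request):
            return False  # rejected immediately by the external authorizer
    # 2. DENY: any matching DENY policy rejects the request.
    if any(p.action == "DENY" and p.matches(request) for p in policies):
        return False
    # 3. ALLOW: with no ALLOW policies the request passes; otherwise one must match.
    allow = [p for p in policies if p.action == "ALLOW"]
    return not allow or any(p.matches(request) for p in allow)
```

The external authorizer only decides whether a request may proceed; an allowed request still has to pass any local DENY and ALLOW policies.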


Let’s look at an example authorization policy with the CUSTOM action:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: ext-authz
  namespace: istio-system
spec:
  # The selector applies to the ingress gateway in the istio-system namespace.
  selector:
    matchLabels:
      app: istio-ingressgateway
  # The action "CUSTOM" delegates the access control to an external authorizer, this is different from
  # the ALLOW/DENY action that enforces the access control right inside the proxy.
  action: CUSTOM
  # The provider specifies the name of the external authorizer defined in the meshconfig, which tells where and how to
  # talk to the external auth service. We will cover this more later.
  provider:
    name: "my-ext-authz-service"
  # The rule specifies that the access control is triggered only if the request path has the prefix "/admin/".
  # This allows you to easily enable or disable the external authorization based on the requests, avoiding the external
  # check request if it is not needed.
  rules:
  - to:
    - operation:
        paths: ["/admin/*"]
It refers to a provider called my-ext-authz-service which is defined in the mesh config:
extensionProviders:
# The name "my-ext-authz-service" is referred to by the authorization policy in its provider field.
- name: "my-ext-authz-service"
  # The "envoyExtAuthzGrpc" field specifies that the external authorization service implements the Envoy
  # ext-authz filter gRPC API. The other supported type is the Envoy ext-authz filter HTTP API.
  # See more in https://www.envoyproxy.io/docs/envoy/v1.16.2/intro/arch_overview/security/ext_authz_filter.
  envoyExtAuthzGrpc:
    # The service and port specify the address of the external auth service. "ext-authz.istio-system.svc.cluster.local"
    # means the service is deployed in the mesh. It can also be defined outside the mesh, or even inside the pod as a
    # separate container.
    service: "ext-authz.istio-system.svc.cluster.local"
    port: 9000
The authorization policy with the CUSTOM action enables external authorization at runtime. It can be configured to trigger
the external authorization conditionally based on the request, using the same rules that you already use with the other actions.
The external authorization service is currently defined in the meshconfig API
and referred to by its name. It can be deployed in the mesh with or without a proxy. If it runs with a proxy, you can
further use PeerAuthentication to enable mTLS between the proxy and your external authorization service.
The CUSTOM action is currently in the experimental stage; the API might change in a non-backward compatible way based on user feedback.
The authorization policy rules currently don’t support authentication fields (e.g. source principal or JWT claim) when used with the
CUSTOM action. Only one provider is allowed for a given workload, but you can still use different providers on different workloads.
For more information, please see the Better External Authorization design doc.
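The restrictions above can be checked before applying policies. The following is a hypothetical pre-flight validator in Python, not an Istio tool: it assumes policies are parsed into dicts, and the set of rejected fields (principals, request principals and `request.auth.*` conditions) is an assumption based on the two examples named above, not the complete list Istio enforces.

```python
from __future__ import annotations

from typing import Any

# Assumed authentication-derived fields, based on the examples above.
AUTHN_SOURCE_FIELDS = {"principals", "notPrincipals",
                       "requestPrincipals", "notRequestPrincipals"}


def validate_custom_policies(policies: list[dict[str, Any]],
                             provider_names: set[str]) -> list[str]:
    """Check the CUSTOM policies that apply to a single workload."""
    errors: list[str] = []
    custom = [p for p in policies if p.get("action") == "CUSTOM"]
    used = {p.get("provider", {}).get("name") for p in custom}
    if len(used) > 1:
        errors.append(f"multiple providers for one workload: {sorted(map(str, used))}")
    for p in custom:
        name = p.get("provider", {}).get("name")
        if name not in provider_names:
            errors.append(f"provider {name!r} is not defined in extensionProviders")
        for rule in p.get("rules", []):
            for src in rule.get("from", []):
                bad = AUTHN_SOURCE_FIELDS & set(src.get("source", {}))
                errors += [f"unsupported field with CUSTOM: {f}" for f in sorted(bad)]
            for cond in rule.get("when", []):
                if cond.get("key", "").startswith("request.auth."):
                    errors.append(f"unsupported condition with CUSTOM: {cond['key']}")
    return errors
```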
Example with OPA
In this section, we will demonstrate using the CUSTOM action with the Open Policy Agent as the external authorizer on
the ingress gateway. We will conditionally enable the external authorization on all paths except /ip.
You can also refer to the external authorization task for a more
basic introduction that uses a sample ext-authz server.
Create the example OPA policy
Run the following command to create an OPA policy that allows the request if the request path starts with the value of the
claim “path” (base64 encoded) in the JWT token:
$ cat > policy.rego <
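The decision the OPA policy makes can be restated in Python to make the rule explicit. This sketch is not the Rego policy: it assumes the JWT has already been verified and decoded into a claims dict, and it only models the path-prefix check described above.

```python
import base64


def opa_allows(request_path: str, claims: dict) -> bool:
    """Allow when the request path starts with the base64-decoded "path" claim."""
    encoded = claims.get("path")
    if not encoded:
        return False  # no token, or no "path" claim: deny
    prefix = base64.b64decode(encoded).decode("utf-8")
    return request_path.startswith(prefix)
```

For a token whose `path` claim is `L2hlYWRlcnM=` (`/headers`), a request to `/headers` is allowed and a request to `/get` is denied, which is what the tests below check against the real policy.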

Deploy httpbin and OPA
Enable the sidecar injection:
$ kubectl label ns default istio-injection=enabled
Run the following command to deploy the example application httpbin and OPA. The OPA could be deployed either as a
separate container in the httpbin pod or completely in a separate pod:

OPA as a separate container in the httpbin pod:
$ kubectl apply -f - <

OPA in a separate pod:
$ kubectl apply -f - <
Deploy httpbin as well:
$ kubectl apply -f samples/httpbin/httpbin.yaml


Define external authorizer
Run the following command to edit the meshconfig:
$ kubectl edit configmap istio -n istio-system
Add the following extensionProviders to the meshconfig:

OPA as a separate container in the httpbin pod:
apiVersion: v1
data:
  mesh: |-
    # Add the following contents:
    extensionProviders:
    - name: "opa.local"
      envoyExtAuthzGrpc:
        service: "local-opa-grpc.local"
        port: "9191"

OPA in a separate pod:
apiVersion: v1
data:
  mesh: |-
    # Add the following contents:
    extensionProviders:
    - name: "opa.default"
      envoyExtAuthzGrpc:
        service: "opa.default.svc.cluster.local"
        port: "9191"


Create an AuthorizationPolicy with a CUSTOM action
Run the following command to create the authorization policy that enables the external authorization on all paths
except /ip:

OPA as a separate container in the httpbin pod:
$ kubectl apply -f - <

OPA in a separate pod:
$ kubectl apply -f - <


Test the OPA policy


Create a client pod to send the request:
$ kubectl apply -f samples/sleep/sleep.yaml
$ export SLEEP_POD=$(kubectl get pod -l app=sleep -o jsonpath={.items..metadata.name})


Use a test JWT token signed by the OPA:
$ export TOKEN_PATH_HEADERS="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJwYXRoIjoiTDJobFlXUmxjbk09IiwibmJmIjoxNTAwMDAwMDAwLCJleHAiOjE5MDAwMDAwMDB9.9yl8LcZdq-5UpNLm0Hn0nnoBHXXAnK4e8RSl9vn6l98"
The test JWT token has the following claims:
{
  "path": "L2hlYWRlcnM=",
  "nbf": 1500000000,
  "exp": 1900000000
}
The path claim has value L2hlYWRlcnM= which is the base64 encode of /headers.
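You can inspect the test token yourself. This Python sketch decodes the JWT payload without verifying the signature, so it is only suitable for inspection, never for making authorization decisions.

```python
import base64
import json


def jwt_claims_unverified(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (inspection only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs strip base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


TOKEN = ("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
         ".eyJwYXRoIjoiTDJobFlXUmxjbk09IiwibmJmIjoxNTAwMDAwMDAwLCJleHAiOjE5MDAwMDAwMDB9"
         ".9yl8LcZdq-5UpNLm0Hn0nnoBHXXAnK4e8RSl9vn6l98")

claims = jwt_claims_unverified(TOKEN)
print(claims)  # {'path': 'L2hlYWRlcnM=', 'nbf': 1500000000, 'exp': 1900000000}
print(base64.b64decode(claims["path"]).decode())  # /headers
```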


Send a request to path /headers without a token. This should be rejected with 403 because there is no JWT token:

OPA as a separate container in the httpbin pod:
$ kubectl exec ${SLEEP_POD} -c sleep -- curl http://httpbin-with-opa:8000/headers -s -o /dev/null -w "%{http_code}\n"
403
OPA in a separate pod:
$ kubectl exec ${SLEEP_POD} -c sleep -- curl http://httpbin:8000/headers -s -o /dev/null -w "%{http_code}\n"
403

Send a request to path /get with a valid token. This should be rejected with 403 because the path /get is not matched with the token /headers:

OPA as a separate container in the httpbin pod:
$ kubectl exec ${SLEEP_POD} -c sleep -- curl http://httpbin-with-opa:8000/get -H "Authorization: Bearer $TOKEN_PATH_HEADERS" -s -o /dev/null -w "%{http_code}\n"
403
OPA in a separate pod:
$ kubectl exec ${SLEEP_POD} -c sleep -- curl http://httpbin:8000/get -H "Authorization: Bearer $TOKEN_PATH_HEADERS" -s -o /dev/null -w "%{http_code}\n"
403

Send a request to path /headers with a valid token. This should be allowed with 200 because the path matches the token:

OPA as a separate container in the httpbin pod:
$ kubectl exec ${SLEEP_POD} -c sleep -- curl http://httpbin-with-opa:8000/headers -H "Authorization: Bearer $TOKEN_PATH_HEADERS" -s -o /dev/null -w "%{http_code}\n"
200
OPA in a separate pod:
$ kubectl exec ${SLEEP_POD} -c sleep -- curl http://httpbin:8000/headers -H "Authorization: Bearer $TOKEN_PATH_HEADERS" -s -o /dev/null -w "%{http_code}\n"
200

Send a request to path /ip without a token. This should be allowed with 200 because the path /ip is excluded from
authorization:

OPA as a separate container in the httpbin pod:
$ kubectl exec ${SLEEP_POD} -c sleep -- curl http://httpbin-with-opa:8000/ip -s -o /dev/null -w "%{http_code}\n"
200
OPA in a separate pod:
$ kubectl exec ${SLEEP_POD} -c sleep -- curl http://httpbin:8000/ip -s -o /dev/null -w "%{http_code}\n"
200

Check the proxy and OPA logs to confirm the result.


Summary
In Istio 1.9, the CUSTOM action in the authorization policy allows you to easily integrate Istio with any external
authorization system with the following benefits:


First-class support in the authorization policy API


Ease of use: define the external authorizer simply with a URL and enable it with the authorization policy; no more
hassle with the EnvoyFilter API


Conditional triggering, allowing improved performance


Support for various deployment types of the external authorizer:


A normal service and pod with or without proxy


Inside the workload pod as a separate container


Outside the mesh




We’re working to promote this feature to a more stable stage in upcoming versions and welcome your feedback at
discuss.istio.io.
Acknowledgements
Thanks to Craig Box, Christian Posta and Limin Wang for reviewing drafts of this blog.



Proxying legacy services using Istio egress gateways
Wed, 16 Dec 2020 00:00:00 +0000
At Deutsche Telekom Pan-Net, we have embraced Istio as the umbrella to cover our services. Unfortunately, there are services which have not yet been migrated to Kubernetes, or cannot be.
We can set Istio up as a proxy service for these upstream services. This allows us to benefit from capabilities like authorization/authentication, traceability and observability, even while legacy services stand as they are.
At the end of this article there is a hands-on exercise where you can simulate the scenario. In the exercise, an upstream service hosted at https://httpbin.org will be proxied by an Istio egress gateway.
If you are familiar with Istio, one of the methods offered to connect to upstream services is through an egress gateway.
You can deploy one to control all the upstream traffic or you can deploy multiple in order to have fine-grained control and satisfy the single-responsibility principle as this picture shows:

Figure: Overview of multiple egress gateways

With this model, one egress gateway is in charge of exactly one upstream service.
Although the Operator spec allows you to deploy multiple egress gateways, the manifest can become unmanageable:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
[...]
spec:
  components:
    egressGateways:
    - name: egressgateway-1
      enabled: true
    - name: egressgateway-2
      enabled: true
    [egressgateway-3, egressgateway-4, ...]
    - name: egressgateway-N
      enabled: true
[...]
Decoupling egress gateways from the main Operator manifest also makes it possible to set up custom readiness probes, so that both services (the gateway and the upstream service) stay aligned.
You can also inject OPA as a sidecar into the pod to perform authorization with complex rules (OPA envoy plugin).

Figure: Authorization with OPA and `healthcheck` to external

As you can see, your possibilities increase and Istio becomes very extensible.
Let’s look at how you can implement this pattern.
Solution
There are several ways to perform this task, but here you will find how to define multiple Operators and deploy the generated resources.

Yes! Istio 1.8.0 introduced the possibility to have fine-grained control over the objects that Operator deploys. This gives you the opportunity to patch them as you wish. This is exactly what you need to proxy legacy services using Istio egress gateways.


In the following section you will deploy an egress gateway to connect to an upstream service: httpbin (https://httpbin.org/).
At the end, you will have:

Figure: Communication

Hands on
Prerequisites

kind (Kubernetes-in-Docker - perfect for local development)
istioctl

Kind

If you use kind, do not forget to set up service-account-issuer and service-account-signing-key-file as described below. Otherwise, Istio may not install correctly.


Save this as config.yaml.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
  - |
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    metadata:
      name: config
    apiServer:
      extraArgs:
        "service-account-issuer": "kubernetes.default.svc"
        "service-account-signing-key-file": "/etc/kubernetes/pki/sa.key"
$ kind create cluster --name  --config config.yaml
Where  is the name for the cluster.
Istio Operator with Istioctl
Install the Operator
$ istioctl operator init --watchedNamespaces=istio-operator
$ kubectl create ns istio-system
Save this as operator.yaml:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-operator
  namespace: istio-operator
spec:
  profile: default
  tag: 1.8.0
  meshConfig:
    accessLogFile: /dev/stdout
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY

outboundTrafficPolicy.mode: REGISTRY_ONLY is used to block all external communications which are not specified by a ServiceEntry resource.


$ kubectl apply -f operator.yaml
Deploy Egress Gateway
The steps for this task assume:

The service is installed under the namespace: httpbin.
The service name is: http-egress.

Istio 1.8 introduced the possibility to apply overlay configuration, giving fine-grained control over the created resources.
Save this as egress.yaml:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: empty
  tag: 1.8.0
  namespace: httpbin
  components:
    egressGateways:
    - name: httpbin-egress
      enabled: true
      label:
        app: istio-egressgateway
        istio: egressgateway
        custom-egress: httpbin-egress
      k8s:
        overlays:
        - kind: Deployment
          name: httpbin-egress
          patches:
          - path: spec.template.spec.containers[0].readinessProbe
            value:
              failureThreshold: 30
              exec:
                command:
                  - /bin/sh
                  - -c
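                  # -f makes curl exit non-zero on HTTP error statuses, so either failing check fails the probe
                  - curl -sf http://localhost:15021/healthz/ready && curl -sf https://httpbin.org/status/200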
              initialDelaySeconds: 1
              periodSeconds: 2
              successThreshold: 1
              timeoutSeconds: 1
  values:
    gateways:
      istio-egressgateway:
        runAsRoot: true

Notice the block under overlays. You are patching the default egressgateway to deploy only that component with the new readinessProbe.


Create the namespace where you will install the egress gateway:
$ kubectl create ns httpbin
As described in the documentation, you can deploy several Operator resources. However, they have to be pre-parsed and then applied to the cluster.
$ istioctl manifest generate -f egress.yaml | kubectl apply -f -
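If you repeat this for many upstream services, the per-gateway file can be generated instead of copied by hand. A minimal sketch, assuming the same layout as egress.yaml above; the function name and parameters are hypothetical. It emits a dict you can write out as JSON (any YAML parser, including the one istioctl uses, should accept JSON), and it adds curl’s `-sf` flags so that HTTP error statuses also fail the readiness probe.

```python
import json


def egress_operator(name: str, namespace: str, upstream_check_url: str,
                    tag: str = "1.8.0") -> dict:
    """IstioOperator for one dedicated egress gateway, patterned on egress.yaml."""
    # -f turns HTTP error statuses into a non-zero exit, failing the probe.
    probe_cmd = ("curl -sf http://localhost:15021/healthz/ready && "
                 f"curl -sf {upstream_check_url}")
    probe = {"failureThreshold": 30,
             "exec": {"command": ["/bin/sh", "-c", probe_cmd]},
             "initialDelaySeconds": 1, "periodSeconds": 2,
             "successThreshold": 1, "timeoutSeconds": 1}
    gateway = {
        "name": name,
        "enabled": True,
        "label": {"app": "istio-egressgateway", "istio": "egressgateway",
                  "custom-egress": name},
        "k8s": {"overlays": [{
            "kind": "Deployment",
            "name": name,
            "patches": [{"path": "spec.template.spec.containers[0].readinessProbe",
                         "value": probe}],
        }]},
    }
    return {
        "apiVersion": "install.istio.io/v1alpha1",
        "kind": "IstioOperator",
        "spec": {
            "profile": "empty",
            "tag": tag,
            "namespace": namespace,
            "components": {"egressGateways": [gateway]},
            "values": {"gateways": {"istio-egressgateway": {"runAsRoot": True}}},
        },
    }


print(json.dumps(egress_operator("httpbin-egress", "httpbin",
                                 "https://httpbin.org/status/200"), indent=2))
```

Writing each result to its own file keeps one Operator resource per upstream, which you then pass through istioctl manifest generate as shown above.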
Istio configuration
Now you will configure Istio to allow connections to the upstream service at https://httpbin.org.
Certificate for TLS
You need a certificate to make a secure connection from outside the cluster to your egress service.
How to generate a certificate is explained in the Istio ingress documentation.
Create and apply one to be used at the end of this article to access the service from outside the cluster ():
$ kubectl create -n istio-system secret tls  --key= --cert=
Where  is the name used later for the Gateway resource.  and  are the files for the certificate.

You need to remember ,  and  because you will use them later in the article.


Ingress Gateway
Create a Gateway resource that configures the ingress gateway to accept requests.

Make sure that only one Gateway spec matches the hostname. Istio gets confused when there are multiple Gateway definitions covering the same hostname.


An example:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-ingressgateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - ""
    port:
      name: http
      number: 80
      protocol: HTTP
    tls:
      httpsRedirect: true
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - ""
    tls:
      mode: SIMPLE
      credentialName: 
Where  is the hostname to access the service through the my-ingressgateway and  is the secret which contains the certificate.
Egress Gateway
Create another Gateway object, but this time to operate the egress gateway you have already installed:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: "httpbin-egress"
  namespace: "httpbin"
spec:
  selector:
    istio: egressgateway
    service.istio.io/canonical-name: "httpbin-egress"
  servers:
  - hosts:
    - ""
    port:
      number: 80
      name: http
      protocol: HTTP
Where  is the hostname to access through the my-ingressgateway.
Virtual Service
Create a VirtualService for three use cases:

Mesh gateway for service-to-service communications within the mesh
Ingress Gateway for the communication from outside the mesh
Egress Gateway for the communication to the upstream service


Mesh and Ingress Gateway will share the same specification. It will redirect the traffic to your egress gateway service.


apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: "httpbin-egress"
  namespace: "httpbin"
spec:
  hosts:
  - ""
  gateways:
  - mesh
  - "istio-system/my-ingressgateway"
  - "httpbin/httpbin-egress"
  http:
  - match:
    - gateways:
      - "istio-system/my-ingressgateway"
      - mesh
      uri:
        prefix: "/"
    route:
    - destination:
        host: "httpbin-egress.httpbin.svc.cluster.local"
        port:
          number: 80
  - match:
    - gateways:
      - "httpbin/httpbin-egress"
      uri:
        prefix: "/"
    route:
    - destination:
        host: "httpbin.org"
        subset: "http-egress-subset"
        port:
          number: 443
Where  is the hostname to access through the my-ingressgateway.
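The two match blocks of this VirtualService produce a two-hop path. A simplified Python model of that routing, using the gateway names from the YAML above; the function names are hypothetical and host matching is not modeled, because the hostname placeholder is left open here.

```python
from __future__ import annotations

from typing import NamedTuple, Optional

MESH = "mesh"
INGRESS = "istio-system/my-ingressgateway"
EGRESS = "httpbin/httpbin-egress"
EGRESS_SERVICE = "httpbin-egress.httpbin.svc.cluster.local"


class Route(NamedTuple):
    host: str
    port: int
    subset: Optional[str] = None


def route(gateway: str, uri: str) -> Optional[Route]:
    """Mirror the two http match blocks of the httpbin-egress VirtualService."""
    if not uri.startswith("/"):
        return None
    if gateway in (INGRESS, MESH):
        # Hop 1: in-mesh and ingress traffic is sent to the egress gateway service.
        return Route(EGRESS_SERVICE, 80)
    if gateway == EGRESS:
        # Hop 2: the egress gateway forwards upstream; the DestinationRule
        # subset originates TLS on port 443.
        return Route("httpbin.org", 443, "http-egress-subset")
    return None


def trace(entry_gateway: str, uri: str) -> list[str]:
    """Follow a request from its entry point to the upstream, hop by hop."""
    hops: list[str] = []
    gateway = entry_gateway
    while (r := route(gateway, uri)) is not None:
        hops.append(f"{r.host}:{r.port}")
        if r.host != EGRESS_SERVICE:
            break  # reached the upstream service
        gateway = EGRESS  # the egress gateway pod handles the next hop
    return hops
```

Whether a request enters from the mesh or from the ingress gateway, it reaches httpbin.org only after passing through the dedicated egress gateway.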
Service Entry
Create a ServiceEntry to allow the communication to the upstream service:

Notice that the port is configured for the TLS protocol.


apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: "httpbin-egress"
  namespace: "httpbin"
spec:
  hosts:
  - "httpbin.org"
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS
Destination Rule
Create a DestinationRule to allow TLS origination for egress traffic, as explained in the documentation:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: "httpbin-egress"
  namespace: "httpbin"
spec:
  host: "httpbin.org"
  subsets:
  - name: "http-egress-subset"
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN
      portLevelSettings:
      - port:
          number: 443
        tls:
          mode: SIMPLE
Peer Authentication
To secure service-to-service traffic, enforce mTLS:
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "httpbin-egress"
  namespace: "httpbin"
spec:
  mtls:
    mode: STRICT
Test
Verify that your objects were all specified correctly:
$ istioctl analyze --all-namespaces
External access
Test the egress gateway from outside the cluster by forwarding the ingressgateway service’s port and calling the service:
$ kubectl -n istio-system port-forward svc/istio-ingressgateway 15443:443
$ curl -vvv -k -HHost: --resolve ":15443:127.0.0.1" --cacert  "https://:15443/status/200"
Where  is the hostname to access through the my-ingressgateway and  is the certificate defined for the ingressgateway object. The certificate is needed because, with tls.mode: SIMPLE, the ingress gateway terminates TLS and presents that certificate to the client.
Service-to-service access
Test the egress gateway from inside the cluster by deploying the sleep service. This is useful when you design failover.
$ kubectl label namespace httpbin istio-injection=enabled --overwrite
$ kubectl apply -n httpbin -f https://raw.githubusercontent.com/istio/istio/release-1.29/samples/sleep/sleep.yaml
$ kubectl -n httpbin exec "$(kubectl get pod -n httpbin -l app=sleep -o jsonpath={.items..metadata.name})" -- curl -vvv http:///status/200
Where  is the hostname to access through the my-ingressgateway.

Notice that http (and not https) is the protocol used for service-to-service communication. This is because Istio handles TLS itself, so developers no longer need to manage certificates. Fancy!



Eat, Sleep, Rave, REPEAT!


Now it is time to create a second, third and fourth egress gateway pointing to other upstream services.
Final thoughts

Is the juice worth the squeeze?


Istio might seem complex to configure. But it is definitely worthwhile, due to the huge set of benefits it brings to your services (with an extra Olé! for Kiali).
The way Istio is developed allows us, with minimal effort, to satisfy uncommon requirements like the one presented in this article.
To finish, I just wanted to point out that Istio, as a good cloud native technology, does not require a large team to maintain. For example, our current team is composed of 3 engineers.
To discuss more about Istio and its possibilities, please contact one of us:

Antonio Berben
Piotr Ciążyński
Kristián Patlevič




Proxy protocol on AWS NLB and Istio ingress gateway
Fri, 11 Dec 2020 00:00:00 +0000
This blog presents my latest experience of how to configure and enable the proxy protocol with a stack consisting of an AWS NLB and the Istio ingress gateway. The proxy protocol was designed to chain proxies and reverse proxies without losing client information. It avoids the need for infrastructure changes or NATing firewalls, and offers the benefits of being protocol agnostic and scaling well. Additionally, we enable the X-Forwarded-For HTTP header in the deployment to make the client IP address easy to read. In this blog, traffic management of Istio ingress is shown with an httpbin service on ports 80 and 443 to demonstrate the use of the proxy protocol. Both v1 and v2 of the proxy protocol work for the purpose of this example, but because the AWS NLB currently supports only v2, proxy protocol v2 is used by default in the rest of this blog. The following image shows the use of proxy protocol v2 with an AWS NLB.

A receiver may be configured to support both version 1 and version 2 of the protocol. Identifying the protocol version is easy:

If the incoming byte count is 16 or more and the first 13 bytes match the protocol signature block \x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A\x02, the protocol is version 2.

Otherwise, if the incoming byte count is 8 or more, and the first 5 characters match the US-ASCII representation of “PROXY” (\x50\x52\x4F\x58\x59), then the protocol must be parsed as version 1.

Otherwise, the protocol is not covered by this specification and the connection must be dropped.



AWS NLB portal to enable proxy protocol

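The detection rules quoted above can be sketched in Python as follows. This is a minimal illustration, not Envoy's proxy_protocol listener filter, and the function name `detect_proxy_protocol_version` is hypothetical. Per the proxy protocol specification, the 13th byte of a v2 header carries the version in its high four bits and the command in its low four bits, so the sketch compares the 12-byte signature and then checks that the high nibble of the 13th byte is 2.

```python
from typing import Optional

# 12-byte PROXY protocol v2 signature block.
V2_SIGNATURE = b"\x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A"


def detect_proxy_protocol_version(data: bytes) -> Optional[int]:
    """Return 2 or 1 for a recognized header, or None if the connection must be dropped.

    Hypothetical helper illustrating the detection rules; not Envoy's implementation.
    """
    # Version 2: at least 16 bytes, the signature block, then a version/command
    # byte whose high four bits hold the version (0x2).
    if len(data) >= 16 and data[:12] == V2_SIGNATURE and (data[12] >> 4) == 0x2:
        return 2
    # Version 1: at least 8 bytes starting with US-ASCII "PROXY".
    if len(data) >= 8 and data[:5] == b"PROXY":
        return 1
    # Anything else is not covered by the specification.
    return None


if __name__ == "__main__":
    print(detect_proxy_protocol_version(V2_SIGNATURE + b"\x21\x11\x00\x0c"))  # 2
    print(detect_proxy_protocol_version(b"PROXY TCP4 192.0.2.1 198.51.100.1 56324 443\r\n"))  # 1
    print(detect_proxy_protocol_version(b"GET / HTTP/1.1\r\n"))  # None
```

Checking the length first mirrors the quoted rules: a receiver should not commit to a version until it has buffered enough bytes to tell them apart.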
Separate setups for 80 and 443
The following steps assume an AWS environment that is already configured with the proper VPC, IAM, and Kubernetes setup.
Step 1: Install Istio with AWS NLB
The blog Configuring Istio Ingress with AWS NLB provides detailed steps to set up AWS IAM roles and enable the use of an AWS NLB with Helm. You can also use other automation tools, such as Terraform, to achieve the same goal. The following example shows a more complete configuration that enables the proxy protocol and X-Forwarded-For at the same time.
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    proxy.istio.io/config: '{"gatewayTopology" : { "numTrustedProxies": 2 } }'
  labels:
    app: istio-ingressgateway
    istio: ingressgateway
    release: istio
  name: istio-ingressgateway
Step 2: Create proxy-protocol Envoy Filter
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: proxy-protocol
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: LISTENER
    patch:
      operation: MERGE
      value:
        listener_filters:
        - name: envoy.filters.listener.proxy_protocol
        - name: envoy.filters.listener.tls_inspector
Step 3: Enable X-Forwarded-For header
This blog includes several samples of configuring Gateway Network Topology. In the following example, the configurations are tuned to enable X-Forwarded-For without any middle proxy.
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: ingressgateway-settings
  namespace: istio-system
spec:
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      listener:
        filterChain:
          filter:
            name: envoy.http_connection_manager
    patch:
      operation: MERGE
      value:
        name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          skip_xff_append: false
          use_remote_address: true
          xff_num_trusted_hops: 1
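To reason about what `use_remote_address: true` and `xff_num_trusted_hops` do, the following Python sketch models the trusted client address rule as we understand it from Envoy's X-Forwarded-For documentation: with `use_remote_address` enabled and N trusted hops greater than zero, Envoy trusts the Nth address from the right end of the incoming X-Forwarded-For header, and otherwise (or if the header has fewer than N entries) uses the downstream connection's source address. With the proxy protocol listener filter in place, that source address is the client address carried in the PROXY header. This is a simplified model for illustration only; the function name is hypothetical and Envoy's real header handling covers more cases.

```python
from typing import Optional


def trusted_client_address(downstream_ip: str, xff: Optional[str], num_trusted_hops: int) -> str:
    """Simplified model of Envoy's trusted client address when use_remote_address is true.

    downstream_ip: source address of the downstream connection (with the proxy
    protocol listener filter, the address restored from the PROXY header).
    xff: the X-Forwarded-For header as received, or None if absent.
    """
    if num_trusted_hops <= 0:
        # No trusted hops: XFF is not used to pick the client address.
        return downstream_ip
    hops = [h.strip() for h in xff.split(",") if h.strip()] if xff else []
    if len(hops) < num_trusted_hops:
        # Not enough addresses in XFF: fall back to the connection address.
        return downstream_ip
    # Trust the Nth address counted from the right end of XFF.
    return hops[-num_trusted_hops]


if __name__ == "__main__":
    print(trusted_client_address("192.0.2.5", "203.0.113.128, 203.0.113.10, 203.0.113.1", 2))
```

With `xff_num_trusted_hops: 1` as configured above and no X-Forwarded-For header from the client, the model falls back to the PROXY header address, which is what the httpbin output in Step 5 is expected to reflect.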
Step 4: Deploy ingress gateway for httpbin on port 80 and 443

When following the secure ingress setup, macOS users must add an additional patch to generate certificates for TLS.


apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "a25fa0b4835b.elb.us-west-2.amazonaws.com"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - "a25fa0b4835b.elb.us-west-2.amazonaws.com"
  gateways:
  - httpbin-gateway
  http:
  - match:
    - uri:
        prefix: /headers
    route:
    - destination:
        port:
          number: 8000
        host: httpbin
For port 443:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: mygateway2
spec:
  selector:
    istio: ingressgateway # use istio default ingress gateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: httpbin-credential # must be the same as secret
    hosts:
    - "a25fa0b4835b.elb.us-west-2.amazonaws.com"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - "a25fa0b4835b.elb.us-west-2.amazonaws.com"
  gateways:
  - mygateway2
  http:
  - match:
    - uri:
        prefix: /headers
    route:
    - destination:
        port:
          number: 8000
        host: httpbin
Step 5: Check header output of httpbin
Check port 443 (80 will be similar) and compare the cases with and without proxy protocol.
////// with proxy_protocol enabled in the stack
*   Trying YY.XXX.141.26...
* TCP_NODELAY set
* Connection failed
* connect to YY.XXX.141.26 port 443 failed: Operation timed out
*   Trying YY.XXX.205.117...
* TCP_NODELAY set
* Connected to a25fa0b4835b.elb.us-west-2.amazonaws.com (XX.YYY.205.117) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: new_certificates/example.com.crt
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=a25fa0b4835b.elb.us-west-2.amazonaws.com; O=httpbin organization
*  start date: Oct 29 20:39:12 2020 GMT
*  expire date: Oct 29 20:39:12 2021 GMT
*  common name: a25fa0b4835b.elb.us-west-2.amazonaws.com (matched)
*  issuer: O=example Inc.; CN=example.com
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fc6c8810800)
> GET /headers?show_env=1 HTTP/2
> Host: a25fa0b4835b.elb.us-west-2.amazonaws.com
> User-Agent: curl/7.64.1
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 2147483647)!
< HTTP/2 200
< server: istio-envoy
< date: Thu, 29 Oct 2020 21:39:46 GMT
< content-type: application/json
< content-length: 629
< access-control-allow-origin: *
< access-control-allow-credentials: true
< x-envoy-upstream-service-time: 2
<
{
  "headers": {
    "Accept": "*/*",
    "Content-Length": "0",
    "Host": "a25fa0b4835b.elb.us-west-2.amazonaws.com",
    "User-Agent": "curl/7.64.1",
    "X-B3-Sampled": "0",
    "X-B3-Spanid": "74f99a1c6fc29975",
    "X-B3-Traceid": "85db86fe6aa322a074f99a1c6fc29975",
    "X-Envoy-Attempt-Count": "1",
    "X-Envoy-Decorator-Operation": "httpbin.default.svc.cluster.local:8000/headers*",
    "X-Envoy-External-Address": "XX.110.54.41",
    "X-Forwarded-For": "XX.110.54.41",
    "X-Forwarded-Proto": "https",
    "X-Request-Id": "5c3bc236-0c49-4401-b2fd-2dbfbce506fc"
  }
}
* Connection #0 to host a25fa0b4835b.elb.us-west-2.amazonaws.com left intact
* Closing connection 0
////////// without proxy_protocol
*   Trying YY.XXX.141.26...
* TCP_NODELAY set
* Connection failed
* connect to YY.XXX.141.26 port 443 failed: Operation timed out
*   Trying YY.XXX.205.117...
* TCP_NODELAY set
* Connected to a25fa0b4835b.elb.us-west-2.amazonaws.com (YY.XXX.205.117) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: new_certificates/example.com.crt
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=a25fa0b4835b.elb.us-west-2.amazonaws.com; O=httpbin organization
*  start date: Oct 29 20:39:12 2020 GMT
*  expire date: Oct 29 20:39:12 2021 GMT
*  common name: a25fa0b4835b.elb.us-west-2.amazonaws.com (matched)
*  issuer: O=example Inc.; CN=example.com
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fbf8c808200)
> GET /headers?show_env=1 HTTP/2
> Host: a25fa0b4835b.elb.us-west-2.amazonaws.com
> User-Agent: curl/7.64.1
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 2147483647)!
< HTTP/2 200
< server: istio-envoy
< date: Thu, 29 Oct 2020 20:44:01 GMT
< content-type: application/json
< content-length: 612
< access-control-allow-origin: *
< access-control-allow-credentials: true
< x-envoy-upstream-service-time: 1
<
{
  "headers": {
    "Accept": "*/*",
    "Content-Length": "0",
    "Host": "a25fa0b4835b.elb.us-west-2.amazonaws.com",
    "User-Agent": "curl/7.64.1",
    "X-B3-Sampled": "0",
    "X-B3-Spanid": "69913a6e6e949334",
    "X-B3-Traceid": "729d5da3618545da69913a6e6e949334",
    "X-Envoy-Attempt-Count": "1",
    "X-Envoy-Decorator-Operation": "httpbin.default.svc.cluster.local:8000/headers*",
    "X-Envoy-Internal": "true",
    "X-Forwarded-For": "172.16.5.30",
    "X-Forwarded-Proto": "https",
    "X-Request-Id": "299c7f8a-5f89-480a-82c9-028c76d45d84"
  }
}
* Connection #0 to host a25fa0b4835b.elb.us-west-2.amazonaws.com left intact
* Closing connection 0
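The key difference between the two outputs is the X-Forwarded-For value: with the proxy protocol enabled it carries the client's public address, while without it httpbin sees a private VPC address (172.16.5.30 here), and Envoy also marks that request with X-Envoy-Internal: true. A small Python sketch, assuming you saved the httpbin JSON body from each run, can flag which case you are looking at. The helper names are hypothetical, and because the public address in the output above is masked, the usage below substitutes a placeholder public IP.

```python
import ipaddress
import json


def client_ip_from_httpbin(body: str) -> str:
    """Return the first X-Forwarded-For entry from an httpbin /headers response body."""
    headers = json.loads(body)["headers"]
    return headers["X-Forwarded-For"].split(",")[0].strip()


def looks_like_real_client(body: str) -> bool:
    """True if the client IP seen by httpbin is a public (non-private) address."""
    return not ipaddress.ip_address(client_ip_from_httpbin(body)).is_private


if __name__ == "__main__":
    with_pp = '{"headers": {"X-Forwarded-For": "8.8.8.8"}}'  # placeholder public IP
    without_pp = '{"headers": {"X-Forwarded-For": "172.16.5.30", "X-Envoy-Internal": "true"}}'
    print(looks_like_real_client(with_pp), looks_like_real_client(without_pp))
```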
Conclusion
This blog presents the deployment of a stack that consists of an AWS NLB and an Istio ingress gateway, both configured for the proxy protocol. We hope this anecdotal, hands-on account is useful if you are interested in enabling the protocol. Note, however, that the X-Forwarded-For header should be relied on here only for convenience when reading test output, as defending against spoofed X-Forwarded-For headers is outside the scope of this blog.




Join us for the first IstioCon in 2021!
Tue, 08 Dec 2020 00:00:00 +0000
IstioCon 2021 will be the inaugural conference for Istio, the industry’s most popular service mesh. In its first year, IstioCon will be 100% virtual, connecting community members across the globe with Istio’s ecosystem. The conference will take place at the end of February.


All the information related to IstioCon will be published on the conference website. IstioCon provides an opportunity to showcase the lessons learned from running Istio in production, hands-on experiences from the Istio community, and will feature maintainers from across the Istio ecosystem. At this time, we encourage Istio users, developers, partners, and advocates to submit a session proposal through the conference’s CFP portal. The conference offers a mix of keynotes, technical talks, lightning talks, workshops, and roadmap sessions. Choose from the following formats to submit a session proposal for IstioCon:

Presentation: 40-minute presentation, maximum of 2 speakers
Panel: 40 minutes of discussion among 3 to 5 speakers
Workshop: 160-minute (2h 40m), in-depth, hands-on presentation with 1–4 speakers
Lightning Talk: 10-minute presentation, limited to 1 speaker

This community-led event also has in store two social hours to take the load off and mesh with the Istio community, vendors, and maintainers. Participation in the event is free of charge, and will only require participants to register in order to join.
Stay tuned to hear more about this conference, and we hope you can join us at the first IstioCon in 2021!



Handling Docker Hub rate limiting
Mon, 07 Dec 2020 00:00:00 +0000
On November 20th, 2020, Docker Hub introduced rate limits on image pulls.
Because Istio uses Docker Hub as the default registry, usage on a large cluster may lead
to pods failing to start up due to exceeding rate limits. This can be especially problematic for Istio, as the Istio sidecar image
typically runs alongside most pods in the cluster.
Mitigations
Istio allows you to specify a custom Docker registry so that container images are fetched from your private registry. This can be configured by passing --set hub= at installation time.
Istio provides official mirrors to Google Container Registry. This can be configured with --set hub=gcr.io/istio-release. This is available for Istio 1.5+.
Alternatively, you can copy the official Istio images to your own registry. This is especially useful if your cluster runs in an environment with a registry tailored for your use case (for example, on AWS you may want to mirror images to Amazon ECR) or you have air gapped security requirements where access to public registries is restricted. This can be done with the following script:
$ SOURCE_HUB=istio
$ DEST_HUB=my-registry # Replace this with the destination hub
$ IMAGES=( install-cni operator pilot proxyv2 ) # Images to mirror.
$ VERSIONS=( 1.7.5 1.8.0 ) # Versions to copy
$ VARIANTS=( "" "-distroless" ) # Variants to copy
$ for image in "${IMAGES[@]}"; do
$   for version in "${VERSIONS[@]}"; do
$     for variant in "${VARIANTS[@]}"; do
$       name="$image:$version$variant"
$       docker pull "$SOURCE_HUB/$name"
$       docker tag "$SOURCE_HUB/$name" "$DEST_HUB/$name"
$       docker push "$DEST_HUB/$name"
$       docker rmi "$SOURCE_HUB/$name"
$       docker rmi "$DEST_HUB/$name"
$     done
$   done
$ done

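The mirroring script walks the Cartesian product of images, versions, and variants. As a quick sanity check of what it copies, this Python sketch (not part of Istio's tooling) builds the same source and destination references from the script's example values. Note that the empty variant yields the plain tag; that entry is silently lost if the shell arrays are expanded unquoted, which is why the loops above use "${ARRAY[@]}".

```python
from itertools import product
from typing import List, Tuple

SOURCE_HUB = "istio"
DEST_HUB = "my-registry"  # Replace with the destination hub, as in the script
IMAGES = ["install-cni", "operator", "pilot", "proxyv2"]
VERSIONS = ["1.7.5", "1.8.0"]
VARIANTS = ["", "-distroless"]


def mirror_plan(source_hub: str, dest_hub: str) -> List[Tuple[str, str]]:
    """Return (source, destination) image references in the order the script copies them."""
    plan = []
    for image, version, variant in product(IMAGES, VERSIONS, VARIANTS):
        name = f"{image}:{version}{variant}"
        plan.append((f"{source_hub}/{name}", f"{dest_hub}/{name}"))
    return plan


if __name__ == "__main__":
    for src, dst in mirror_plan(SOURCE_HUB, DEST_HUB):
        print(f"{src} -> {dst}")
```

With four images, two versions, and two variants, the plan contains 16 copies.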


Expanding into New Frontiers - Smart DNS Proxying in Istio
Thu, 12 Nov 2020 00:00:00 +0000
DNS resolution is a vital component of any application infrastructure
on Kubernetes. When your application code attempts to access another
service in the Kubernetes cluster or even a service on the internet,
it must first look up the IP address corresponding to the hostname of
the service, before initiating a connection to the service. This name
lookup process is often referred to as service discovery. In
Kubernetes, the cluster DNS server, be it kube-dns or CoreDNS,
resolves the service’s hostname to a unique non-routable virtual IP (VIP),
if it is a service of type clusterIP. The kube-proxy on each node
maps this VIP to a set of pods of the service, and forwards the traffic
to one of them selected at random. When using a service mesh, the
sidecar works similarly to the kube-proxy as far as traffic forwarding
is concerned.
The following diagram depicts the role of DNS today:

Role of DNS in Istio, today

Problems posed by DNS
While the role of DNS within the service mesh may seem insignificant,
it has consistently stood in the way of expanding the mesh to VMs and
enabling seamless multicluster access.
VM access to Kubernetes services
Consider the case of a VM with a sidecar. As shown in the illustration
below, applications on the VM cannot look up the IP addresses of services
inside the Kubernetes cluster, as they typically have no access to the
cluster’s DNS server.

DNS resolution issues on VMs accessing Kubernetes services

It is technically possible to use kube-dns as a name server on the VM if one is
willing to engage in some convoluted workarounds involving dnsmasq and
external exposure of kube-dns using NodePort services: assuming you
manage to convince your cluster administrator to do so. Even so, you are
opening the door to a host of security
issues. At
the end of the day, these are point solutions that are typically out
of scope for those with limited organizational capability and domain
expertise.
External TCP services without VIPs
It is not just the VMs in the mesh that suffer from the DNS issue. For
the sidecar to accurately distinguish traffic between two different
TCP services that are outside the mesh, the services must be on
different ports or they need to have a globally unique VIP, much like
the clusterIP assigned to Kubernetes services. But what if there is
no VIP? Cloud-hosted services, such as hosted databases, typically do not
have a VIP. Instead, the provider’s DNS server returns one of the
instance IPs that can then be directly accessed by the
application. For example, consider the two service entries below,
pointing to two different AWS RDS services:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: db1
  namespace: ns1
spec:
  hosts:
  - mysql-instance1.us-east-1.rds.amazonaws.com
  ports:
  - name: mysql
    number: 3306
    protocol: TCP
  resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: db2
  namespace: ns1
spec:
  hosts:
  - mysql-instance2.us-east-1.rds.amazonaws.com
  ports:
  - name: mysql
    number: 3306
    protocol: TCP
  resolution: DNS
The sidecar has a single listener on 0.0.0.0:3306 that looks up the
IP address of mysql-instance1.us-east-1.rds.amazonaws.com from public
DNS servers and forwards traffic to it. It cannot route traffic to
db2 as it has no way of distinguishing whether traffic arriving at
0.0.0.0:3306 is bound for db1 or db2. The only way to accomplish
this is to set the resolution to NONE causing the sidecar to
blindly forward any traffic on port 3306 to the original IP
requested by the application. This is akin to punching a hole in the
firewall allowing all traffic to port 3306 irrespective of the
destination IP. To get traffic flowing, you are now forced to
compromise on the security posture of your system.
Resolving DNS for services in remote clusters
The DNS limitations of a multicluster mesh are well known. Services in
one cluster cannot look up the IP addresses of services in other
clusters, without clunky workarounds such as creating stub services in
the caller namespace.
Taking control of DNS
All in all, DNS has been a thorny issue in Istio for a while. It was
time to slay the beast. We (the Istio networking team) decided to
tackle the problem once and for all in a way that is completely
transparent to you, the end user. Our first attempt involved utilizing
Envoy’s DNS proxy. It turned out to be very unreliable, and
disappointing overall due to the general lack of sophistication in
the c-ares DNS library used by Envoy. Determined to solve the
problem, we decided to implement the DNS proxy in the Istio sidecar
agent, written in Go. We were able to optimize the implementation to
handle all the scenarios that we wanted to tackle without compromising
on scale and stability. The Go DNS library we use is the same one
used by scalable DNS implementations such as CoreDNS, Consul,
Mesos, etc. It has been battle tested in production for scale and stability.
Starting with Istio 1.8, the Istio agent on the sidecar will ship with
a caching DNS proxy, programmed dynamically by Istiod. Istiod pushes
the hostname-to-IP-address mappings for all the services that the
application may access based on the Kubernetes services and service
entries in the cluster. DNS lookup queries from the application are
transparently intercepted and served by the Istio agent in the pod or
VM. If the query is for a service within the mesh, irrespective of
the cluster that the service is in, the agent responds directly to the
application. If not, it forwards the query to the upstream name
servers defined in /etc/resolv.conf. The following diagram depicts
the interactions that occur when an application tries to access a
service using its hostname.

Smart DNS proxying in Istio sidecar agent

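The decision the agent makes for each query can be summarized in a few lines. The following Python sketch is a simplified model, not the Istio agent's Go implementation: `mesh_table` stands in for the hostname-to-IP mappings pushed by Istiod, and `upstream` is a hypothetical callable standing in for the name servers listed in /etc/resolv.conf.

```python
from typing import Callable, Dict, List


def resolve(name: str,
            mesh_table: Dict[str, List[str]],
            upstream: Callable[[str], List[str]]) -> List[str]:
    """Answer from the Istiod-pushed table if possible, otherwise forward upstream."""
    fqdn = name.rstrip(".").lower()
    if fqdn in mesh_table:
        # Service known to the mesh (in any cluster): the agent answers directly.
        return mesh_table[fqdn]
    # Not a mesh service: forward the query to the upstream name servers.
    return upstream(fqdn)
```

Because the table is pushed ahead of time, mesh lookups never leave the pod; only names outside the mesh reach the cluster or external DNS servers.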
As you will see in the following sections, the DNS proxying feature
has had an enormous impact across many aspects of Istio.
Reduced load on your DNS servers with faster resolution
The load on your cluster’s Kubernetes DNS server drops drastically as
almost all DNS queries are resolved within the pod by Istio. The
bigger the footprint of the mesh on a cluster, the lower the load on your
DNS servers. Implementing our own DNS proxy in the Istio agent has
allowed us to implement cool optimizations such as CoreDNS
auto-path without the
correctness issues that CoreDNS currently faces.
To understand the impact of this optimization, let’s take a simple DNS
lookup scenario, in a standard Kubernetes cluster without any custom
DNS setup for pods - i.e., with the default setting of ndots:5 in /etc/resolv.conf.
When your application starts a DNS lookup for
productpage.ns1.svc.cluster.local, it appends the DNS search
namespaces in /etc/resolv.conf (e.g., ns1.svc.cluster.local) as part
of the DNS query, before querying the host as-is. As a result, the
first DNS query that is actually sent out will look like
productpage.ns1.svc.cluster.local.ns1.svc.cluster.local, which will
inevitably fail DNS resolution when Istio is not involved. If your
/etc/resolv.conf has 5 search namespaces, the application will send
two DNS queries for each search namespace, one for the IPv4 A record
and another for the IPv6 AAAA record, and then a final pair of
queries with the exact hostname used in the code. Before establishing the
connection, the application performs 12 DNS lookup queries for each host!
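The arithmetic above can be reproduced with a short Python sketch of how a stub resolver expands a name using the search list and the ndots option. It models the behavior described above (search suffixes first when the name has fewer dots than ndots, then the name as-is, with one A and one AAAA query per candidate) and assumes every search-suffixed lookup fails. The function name and the last two entries of the five-entry search list are illustrative assumptions.

```python
from typing import List, Tuple


def dns_queries(name: str, search: List[str], ndots: int = 5) -> List[Tuple[str, str]]:
    """List the (query name, record type) pairs sent when every earlier candidate fails."""
    if name.endswith("."):
        candidates = [name.rstrip(".")]  # Fully qualified: no search expansion.
    elif name.count(".") < ndots:
        # Fewer dots than ndots: try every search suffix before the name as-is.
        candidates = [f"{name}.{suffix}" for suffix in search] + [name]
    else:
        # Enough dots: try the name as-is first, then the search suffixes.
        candidates = [name] + [f"{name}.{suffix}" for suffix in search]
    return [(c, rtype) for c in candidates for rtype in ("A", "AAAA")]


if __name__ == "__main__":
    search = ["ns1.svc.cluster.local", "svc.cluster.local", "cluster.local",
              "corp.example.com", "example.com"]  # last two are hypothetical
    print(len(dns_queries("productpage.ns1.svc.cluster.local", search)))
```

`productpage.ns1.svc.cluster.local` has only four dots, so with `ndots:5` all five search suffixes are tried first: (5 + 1) candidates × 2 record types = 12 queries.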
With Istio’s implementation of the CoreDNS style auto-path technique,
the sidecar agent will detect the real hostname being queried within
the first query and return a CNAME record to
productpage.ns1.svc.cluster.local as part of this DNS response, as
well as the A/AAAA record for
productpage.ns1.svc.cluster.local. The application receiving this
response can now extract the IP address immediately and proceed to
establishing a TCP connection to that IP. The smart DNS proxy in the
Istio agent dramatically cuts down the number of DNS queries from 12
to just 2!
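The auto-path idea can be sketched as follows. When a query arrives with a search suffix already appended, strip each search suffix in order and check whether the remainder is a known mesh service; if it is, answer the very first query with a CNAME to the real name plus its addresses. This Python sketch is a simplified model under those assumptions, not the agent's code, and `mesh_table` and the returned dictionary shape are hypothetical.

```python
from typing import Dict, List, Optional


def autopath_answer(query: str,
                    search: List[str],
                    mesh_table: Dict[str, List[str]]) -> Optional[Dict[str, object]]:
    """Return a CNAME plus addresses if the query is a known service with a search suffix appended."""
    query = query.rstrip(".")
    for suffix in search:
        tail = "." + suffix
        if query.endswith(tail):
            candidate = query[: -len(tail)]
            if candidate in mesh_table:
                # Answer the first query instead of letting the client walk the search list.
                return {"CNAME": candidate, "addresses": mesh_table[candidate]}
    return None  # Not handled by auto-path; fall back to normal resolution.
```

The client receives its answer on the first search-suffixed lookup, so only the A and AAAA queries for that first candidate are sent.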
VMs to Kubernetes integration
Since the Istio agent performs local DNS resolution for services
within the mesh, DNS lookup queries for Kubernetes services from VMs will now
succeed without requiring clunky workarounds for exposing kube-dns
outside the cluster. The ability to seamlessly resolve internal
services in a cluster will now simplify your monolith to microservice
journey, as the monolith on VMs can now access microservices on
Kubernetes without additional levels of indirection via API gateways.
Automatic VIP allocation where possible
You may ask, how does this DNS functionality in the agent solve the
problem of distinguishing between multiple external TCP services
without VIPs on the same port?
Taking inspiration from Kubernetes, Istio will now automatically
allocate non-routable VIPs (from the Class E subnet) to such services
as long as they do not use a wildcard host. The Istio agent on the
sidecar will use the VIPs as responses to the DNS lookup queries from
the application. Envoy can now clearly distinguish traffic bound for
each external TCP service and forward it to the right target. With the
introduction of the DNS proxying, you will no longer need to use
resolution: NONE for non-wildcard TCP services, improving your
overall security posture. Istio cannot help much with wildcard
external services (e.g., *.us-east-1.rds.amazonaws.com). You will
have to resort to NONE resolution mode to handle such services.
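A rough Python model of why auto-allocated VIPs remove the db1/db2 ambiguity: each non-wildcard host gets its own address from a Class E pool, so the sidecar can key its listeners on (VIP, port) instead of a single 0.0.0.0:3306 listener. The 240.240.0.0/16 pool, the sorted sequential allocation order, and the helper names are assumptions made for illustration; Istio's own allocator may differ.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Assumed pool inside the Class E range (240.0.0.0/4); illustrative only.
VIP_POOL = ipaddress.ip_network("240.240.0.0/16")


def allocate_vips(hosts: List[str]) -> Dict[str, str]:
    """Give each non-wildcard host a distinct VIP; wildcard hosts get none."""
    free = VIP_POOL.hosts()
    return {host: str(next(free)) for host in sorted(hosts) if not host.startswith("*")}


def build_listeners(vips: Dict[str, str], port: int) -> Dict[Tuple[str, int], str]:
    """Key each listener on (VIP, port) so traffic can be told apart by destination IP."""
    return {(vip, port): host for host, vip in vips.items()}


def route(listeners: Dict[Tuple[str, int], str], dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the service a connection is bound for, or None if no listener matches."""
    return listeners.get((dst_ip, dst_port))
```

The wildcard host receives no VIP in this model, matching the text: such services still need `resolution: NONE`.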
Multicluster DNS lookup
For the adventurous lot, attempting to weave a multicluster mesh where
applications directly call internal services of a namespace in a
remote cluster, the DNS proxy functionality comes in quite handy. Your
applications can resolve Kubernetes services on any cluster in any
namespace, without the need to create stub Kubernetes services in
every cluster.
The benefits of the DNS proxy extend beyond the multicluster models
that Istio currently describes. At Tetrate, we use this
mechanism extensively in our customers’ multicluster deployments to
enable sidecars to resolve DNS for hosts exposed at ingress gateways
of all the clusters in a mesh, and access them over mutual TLS.
Concluding thoughts
The problems caused by lack of control over DNS have often been
overlooked entirely when it comes to weaving a mesh
across many clusters, different environments, and integrating external
services. The introduction of a caching DNS proxy in the Istio sidecar
agent solves these issues. Exercising control over the
application’s DNS resolution allows Istio to accurately identify the
target service to which traffic is bound, and enhance the overall
security, routing, and telemetry posture in Istio within and across
clusters.
Smart DNS proxying is enabled in the preview
profile in Istio 1.8. Please try it out!



2020 Steering Committee Election Results
Tue, 29 Sep 2020 00:00:00 +0000
Last month, we announced a revision to our Steering Committee charter, opening up governance roles to more contributors and community members. The Steering Committee now consists of 9 proportionally-allocated Contribution Seats, and 4 elected Community Seats.
We have now concluded our inaugural election for the Community Seats, and we’re excited to welcome the following new members to the Committee:

Neeraj Poddar (Aspen Mesh)
Zack Butcher (Tetrate)
Christian Posta (Solo.io)
Zhonghu Xu (Huawei)

They join Contribution Seat holders from Google, IBM/Red Hat and Salesforce. We now have representation from 7 organizations on Steering, reflecting the breadth of our contributor ecosystem.
Thank you to everyone who participated in the election process. The next election will be in July 2021.



Large Scale Security Policy Performance Tests
Tue, 15 Sep 2020 00:00:00 +0000
Overview
Istio has a wide range of security policies which can be easily configured into systems of services. As the number of applied policies increases, it is important to understand the relationship of latency, memory usage, and CPU usage of the system.
This blog post goes over common security policy use cases and how the number of security policies, or the number of specific rules in a security policy, can affect the overall latency of requests.
Setup
There is a wide range of security policies and many more combinations of those policies. We will go over 6 of the most commonly used test cases.
The following test cases are run in an environment which consists of a Fortio client sending requests to a Fortio server, with a baseline of no Envoy sidecars deployed. The following data was gathered by using the Istio performance benchmarking tool.


In these test cases, requests either do not match any rules or match only the very last rule in the security policies. This ensures that the RBAC filter is applied to all policy rules, and never matches a policy rule before viewing all the policies. Even though this is not necessarily what will happen in your own system, this policy setup provides data for the worst possible performance of each test case.
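Why requests that match nothing, or only the last rule, give worst-case numbers can be shown with a simplified model in which rules are checked one after another until one matches. This Python sketch only illustrates that reasoning; it is not how Envoy's RBAC filter is implemented, and the rule representation and helper names are hypothetical.

```python
from typing import Callable, List, Tuple

Rule = Callable[[dict], bool]


def evaluate(rules: List[Rule], request: dict) -> Tuple[bool, int]:
    """Return (matched, number of rules checked) for a first-match scan."""
    checked = 0
    for rule in rules:
        checked += 1
        if rule(request):
            return True, checked
    return False, checked


def path_rule(path: str) -> Rule:
    """Build a rule that matches requests for exactly one path."""
    return lambda req: req.get("path") == path
```

In this model, a request matching the first of 1000 rules costs one check, while a request matching the last rule or none at all costs 1000, which is the situation the test cases deliberately create.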
Test cases


Mutual TLS STRICT vs plaintext.


A single authorization policy with a variable number of principal rules as well as a PeerAuthentication policy. The principal rule is dependent on the PeerAuthentication policy being applied to the system.


A single authorization policy with a variable number of requestPrincipal rules as well as a RequestAuthentication policy. The requestPrincipal is dependent on the RequestAuthentication policy being applied to the system.


A single authorization policy with a variable number of paths vs sourceIP rules.


A variable number of authorization policies consisting of a single path or sourceIP rule.


A single RequestAuthentication policy with a variable number of JWTRules rules.


Data
The y-axis of each test is the latency in milliseconds, and the x-axis is the number of concurrent connections. The x-axis of each graph consists of 3 data points that represent a small load (qps=100, conn=8), medium load (qps=500, conn=32), and large load (qps=1000, conn=64).



The difference in latency between mTLS STRICT mode and plaintext is very small at lower loads. As `qps` and `conn` increase, the latency of requests with mTLS STRICT increases. However, this additional latency at larger loads is minimal compared to the increase from having no sidecars to having sidecars in plaintext mode.


For Authorization policies with 10 vs 1000 principal rules, the latency increase of 10 principal rules compared to no policies is greater than the latency increase of 1000 principals compared to 10 principals.


For Authorization policies with a variable number of `requestPrincipal` rules, the latency increase of 10 `requestPrincipal` rules compared to no policies is nearly the same as the latency increase of 1000 `requestPrincipal` rules compared to 10 `requestPrincipal` rules.


The latency increase of a single `AuthZ` policy with 10 `sourceIP` rules is not proportional to the latency increase of a single `AuthZ` policy with 1000 `sourceIP` rules compared to the system with sidecar and no policies.

The latency increase of a variable number of `sourceIP` rules is marginally greater than that of path rules.


Compared to the system with sidecars and no policies, the latency increase of a single `AuthZ` policy with 10 path rules is not proportional to the latency increase of a single `AuthZ` policy with 1000 path rules. This trend is similar to that of `sourceIP` rules.


The latency of a variable number of path rules is marginally lower than that of `sourceIP` rules.


The latency of a single JWT issuer is comparable to that of no policies, but as the number of JWT issuers increases, the latency increases disproportionately.

To test how the number of authorization policies affects runtime, the tests can be broken into two cases:


Every Authorization policy has a single sourceIP rule.


Every Authorization policy has a single path rule.




The overall trends of both graphs are similar, with `sourceIP` rules marginally slower. This is consistent with the paths vs. `sourceIP` data, which showed that latency is marginally greater for `sourceIP` rules than for path rules.



Conclusion


In general, adding security policies does not add much overhead to the system. The policies that add the most latency include:


RequestAuthentication policy with JWTRules rules.


Authorization policy with requestPrincipal rules.


Authorization policy with principals rules.




At lower loads (requests with lower qps and conn), the difference in latency for most policies is minimal.


Envoy proxy sidecars increase latency more than most policies, even if the policies are large.


The latency increase of extremely large policies is roughly similar to the latency increase from adding Envoy proxy sidecars to a system with no sidecars.


Two different tests determined that the sourceIP rule is marginally slower than a path rule.


If you are interested in creating your own large scale security policies and running performance tests with them, see the performance benchmarking tool README.
If you are interested in reading more about the security policies tests, see our design doc. If you don’t already have access, you can join the Istio team drive.



Deploying Istio Control Planes Outside the Mesh
Thu, 27 Aug 2020 00:00:00 +0000
Overview
From experience working with various service mesh users and vendors, we believe there are 3 key personas for a typical service mesh:


Mesh Operator, who manages the service mesh control plane installation and upgrade.


Mesh Admin, often referred to as the Platform Owner, who owns the service mesh platform and defines the overall strategy and implementation for service owners to adopt service mesh.


Mesh User, often referred to as the Service Owner, who owns one or more services in the mesh.


Prior to version 1.7, Istio required the control plane to run in one of the primary clusters in the mesh, leading to a lack of separation between the mesh operator and the mesh admin. Istio 1.7 introduces a new external control plane deployment model which enables mesh operators to install and manage mesh control planes on separate external clusters. This deployment model allows a clear separation between mesh operators and mesh admins. Istio mesh operators can now run Istio control planes for mesh admins while mesh admins can still control the configuration of the control plane without worrying about installing or managing the control plane. This model is transparent to mesh users.
External control plane deployment model
After installing Istio using the default installation profile, you will have an Istiod control plane installed in a single cluster like the diagram below:

Figure: Istio mesh in a single cluster

With the new deployment model in Istio 1.7, it’s possible to run Istiod on an external cluster, separate from the mesh services, as shown in the diagram below. The external control plane cluster is owned by the mesh operator, while the mesh admin owns the cluster running services deployed in the mesh. The mesh admin has no access to the external control plane cluster. Mesh operators can follow the step-by-step guide for a single cluster with an external istiod to explore this further. (Note: In some internal discussions among Istio maintainers, this model was previously referred to as “central istiod”.)

Figure: Single cluster Istio mesh with Istiod in an external control plane cluster

Mesh admins can expand the service mesh to multiple clusters, which are managed by the same Istiod running in the external cluster. In this case, none of the mesh clusters are primary clusters; they are all remote clusters. However, one of them also serves as the Istio configuration cluster, in addition to running services. The external control plane reads Istio configuration from the config cluster, and Istiod pushes configuration to the data plane running in both the config cluster and the other remote clusters, as shown in the diagram below.

Figure: Multicluster Istio mesh with Istiod in an external control plane cluster

Mesh operators can further expand this deployment model to manage multiple Istio control planes from an external cluster running multiple Istiod control planes:

Figure: Multiple single clusters with multiple Istiod control planes in an external control plane cluster

In this case, each Istiod manages its own remote cluster(s). Mesh operators can even install their own Istio mesh in the external control plane cluster and configure its istio-ingress gateway to route traffic from remote clusters to their corresponding Istiod control planes. To learn more about this, check out these steps.
Conclusion
The external control plane deployment model enables the Istio control plane to be run and managed by mesh operators who have operational expertise in Istio, and provides a clean separation between service mesh control and data planes. Mesh operators can run the control plane in their own clusters or other environments, providing the control plane as a service to mesh admins. Mesh operators can run multiple Istiod control planes in a single cluster, deploying their own Istio mesh and using istio-ingress gateways to control access to these Istiod control planes. Through the examples provided here, mesh operators can explore different implementation choices and choose what works best for them.
This new model reduces complexity for mesh admins by allowing them to focus on mesh configurations without operating the control plane themselves. Mesh admins can continue to configure mesh-wide settings and Istio resources without any access to external control plane clusters. Mesh users can continue to interact with the service mesh without any changes.



Introducing the new Istio steering committee
Mon, 24 Aug 2020 00:00:00 +0000
Today, the Istio project is pleased to announce a new revision to its steering charter, which opens up governance roles to more contributors and community members.  This revision solidifies our commitment to open governance, ensuring that the community around the project will always be able to steer its direction, and that no one company has majority voting control over the project.
The Istio Steering Committee oversees the administrative aspects of the project and sets the marketing direction. From the earliest days of the project, it was bootstrapped with members from Google and IBM, the two founders and largest contributors, with the explicit intention that other seats would be added. We are very happy to deliver on that promise today, with a new charter designed to reward contribution and community.
The new Steering Committee consists of 13 seats: 9 proportionally allocated Contribution Seats, and 4 elected Community Seats.
Contribution Seats
The direction of a project is set by the people who contribute to it. We’ve designed our committee to reflect that, with 9 seats to be attributed in proportion to contributions made to Istio in the previous 12 months. In Kubernetes, the mantra was “chop wood, carry water,” and we similarly want to reward companies who are fueling the growth of the project with contributions.
This year, we’ve chosen to use merged pull requests as our proxy for proportional contribution. We know that no measure of contribution is perfect, and as such we will explicitly reconsider the formula every year. (Other measures we considered, including commits, comments, and actions, gave the same results for this period.)
In order to ensure corporate diversity, there will always be a minimum of three companies represented in Contribution Seats.
Community Seats
There are many wonderful contributors to the Istio community, including developers, SREs and mesh admins, working for companies large and small. We wanted to ensure that their voices were included, both in terms of representation and selection.
We have added 4 seats for representatives from 4 different organizations, who are not represented in the Contribution Seat allocation. These seats will be voted on by the Istio community in an annual election.
Any project member can stand for election; all Istio members who have been active in the last 12 months are eligible to vote.
Corporate diversification is the goal
Our goal is that the governance of Istio reflects the diverse set of contributors. Both Google and IBM/Red Hat will have fewer seats than previously, and the new model is designed to ensure representation from at least 7 different organizations.
We also want to make it clear that no single vendor, no matter how large their contribution, has majority voting control over the Istio project. We’ve implemented a cap on the number of seats a company can hold, such that it can neither unanimously win a vote nor veto a decision of the rest of the committee.
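The post does not say how contribution counts are rounded into nine whole seats. One common method for proportional seat allocation is D'Hondt (highest averages). The Go sketch below assumes that method plus a hypothetical per-company cap of 5 seats, and it does not model the minimum-of-three-companies rule. The company names and PR counts are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// allocateSeats hands out `seats` one at a time using the D'Hondt method:
// each seat goes to the company with the highest contrib/(seatsSoFar+1),
// skipping any company that already holds maxPerCompany seats.
func allocateSeats(contrib map[string]int, seats, maxPerCompany int) map[string]int {
	// Fixed iteration order so the result does not depend on map ordering.
	names := make([]string, 0, len(contrib))
	for name := range contrib {
		names = append(names, name)
	}
	sort.Strings(names)

	alloc := make(map[string]int, len(names))
	for n := 0; n < seats; n++ {
		best := ""
		for _, name := range names {
			if contrib[name] <= 0 || alloc[name] >= maxPerCompany {
				continue // no contributions, or already at the cap
			}
			if best == "" {
				best = name
				continue
			}
			// Compare contrib[name]/(alloc[name]+1) with contrib[best]/(alloc[best]+1)
			// by cross-multiplying, so no floating point is needed.
			lhs := contrib[name] * (alloc[best] + 1)
			rhs := contrib[best] * (alloc[name] + 1)
			if lhs > rhs || (lhs == rhs && contrib[name] > contrib[best]) {
				best = name
			}
		}
		if best == "" {
			break // every company is capped; leave remaining seats unfilled
		}
		alloc[best]++
	}
	return alloc
}

func main() {
	// Hypothetical merged-PR counts for three companies.
	prs := map[string]int{"CompanyA": 600, "CompanyB": 300, "CompanyC": 100}
	fmt.Println(allocateSeats(prs, 9, 5)) // prints map[CompanyA:5 CompanyB:3 CompanyC:1]
}
```

Comparing quotients by cross-multiplication keeps the arithmetic in integers, and ties go to the company with more total contributions. Once a company reaches the cap, later seats flow to the others.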
The 2020 committee and election
According to our seat allocation process, this year Google will be allocated 5 seats and IBM/Red Hat will be allocated 3. We are pleased to announce that Salesforce, as the third-largest contributor to Istio in the last 12 months, has earned a Contribution Seat.
The first election for Community Seats begins today.  Members have two weeks to nominate themselves, and voting will run from 14 to 27 September. You can learn all about the election in the istio/community repository on GitHub.  We’re also hosting a special community meeting this Thursday at 10:00 Pacific to discuss the changes and the election process. We’d love to see you there!



Using MOSN with Istio: an alternative data plane
Tue, 28 Jul 2020 00:00:00 +0000
MOSN (Modular Open Smart Network) is a network proxy server written in Go. It was built at Ant Group to serve as a sidecar, API gateway, cloud-native ingress, Layer 4 or Layer 7 load balancer, and more. Over time, we’ve added extra features, like a multi-protocol framework, a multi-process plug-in mechanism, a DSL, and support for the xDS APIs. Supporting xDS means we are now able to use MOSN as the network proxy for Istio. This configuration is not supported by the Istio project; for help, please see Learn More below.
Background
In the service mesh world, using Istio as the control plane has become mainstream. Because Istio was built on Envoy, it uses Envoy’s data plane APIs (collectively known as the xDS APIs). These APIs have been standardized separately from Envoy, and so by implementing them in MOSN, we are able to drop in MOSN as a replacement for Envoy. Istio’s integration of third-party data planes can be implemented in three steps, as follows.

Implement xDS protocols to fulfill the capabilities for data plane related services.
Build proxyv2 images using Istio’s script and set the relevant SIDECAR and other parameters.
Specify a specific data plane via the istioctl tool and set the proxy-related configuration.

Architecture
MOSN has a layered architecture with four layers, NET/IO, Protocol, Stream, and Proxy, as shown in the following figure.

Figure: The architecture of MOSN


NET/IO acts as the network layer, monitoring connections and incoming packets, and as a mount point for the listener filter and network filter.
Protocol is the multi-protocol engine layer that examines packets and uses the corresponding protocol for decode/encode processing.
Stream further encapsulates the decoded packets into streams, and acts as a mount point for the stream filter.
Proxy acts as a forwarding framework for MOSN, and does proxy processing on the encapsulated streams.
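To make the layering concrete, here is a toy Go sketch of one packet passing through the four layers. The caller plays NET/IO by handing over raw bytes, a decoder plays Protocol, a wrapper plus a filter chain plays Stream, and a route lookup plays Proxy. None of these types, functions, or the "host|body" wire format are MOSN's real API; they are hypothetical stand-ins that only mirror the structure described above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical stand-ins for MOSN's layers; not MOSN's real API.

// Frame is what the Protocol layer produces from a raw packet.
type Frame struct {
	Host string
	Body string
}

// Stream is the Stream layer's wrapper; stream filters operate on it.
type Stream struct {
	ID    int
	Frame Frame
}

// StreamFilter can inspect a stream and reject it by returning an error.
type StreamFilter func(*Stream) error

// decodeToy plays the Protocol layer for a made-up "host|body" format.
func decodeToy(packet []byte) (Frame, error) {
	host, body, ok := strings.Cut(string(packet), "|")
	if !ok || host == "" {
		return Frame{}, errors.New("malformed packet")
	}
	return Frame{Host: host, Body: body}, nil
}

// handle takes a packet already read by the NET/IO layer and runs it through
// Protocol (decode), Stream (wrap + filters) and Proxy (pick an upstream).
func handle(id int, packet []byte, filters []StreamFilter, routes map[string]string) (string, error) {
	frame, err := decodeToy(packet)
	if err != nil {
		return "", err
	}
	s := &Stream{ID: id, Frame: frame}
	for _, f := range filters {
		if err := f(s); err != nil {
			return "", err
		}
	}
	upstream, ok := routes[s.Frame.Host]
	if !ok {
		return "", fmt.Errorf("no route for host %q", s.Frame.Host)
	}
	return upstream, nil
}

func main() {
	deny := func(s *Stream) error {
		if s.Frame.Host == "blocked.local" {
			return errors.New("rejected by stream filter")
		}
		return nil
	}
	routes := map[string]string{"reviews.local": "10.0.0.12:9080"}
	for i, pkt := range []string{"reviews.local|GET /", "blocked.local|GET /", "garbage"} {
		upstream, err := handle(i, []byte(pkt), []StreamFilter{deny}, routes)
		fmt.Printf("packet %d -> upstream=%q err=%v\n", i, upstream, err)
	}
}
```

Keeping each layer behind its own small function is what lets a new protocol be added by swapping only the decoder, which is the point of MOSN's multi-protocol layer.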

Why use MOSN?
Before the service mesh transformation, we expected that service mesh, as the next generation of Ant Group’s infrastructure, would inevitably bring revolutionary changes as well as evolution costs. We had a very ambitious blueprint: to consolidate and refine the capabilities of our existing network and middleware into a low-level platform for our next-generation architecture, one that would be responsible for all kinds of service communication.
This is a long-term project that takes many years to build, is meant to meet our needs for the next five or even ten years, and requires a team spanning the business, SRE, middleware, and infrastructure departments. We needed a network proxy forwarding plane that is flexible to extend, performs well, and can evolve over the long term. Nginx and Envoy have accumulated capabilities over many years and have active communities in the network proxy space, and we borrowed ideas from them and from other excellent open source network proxies. At the same time, we wanted to improve R&D efficiency and make extension easy. The mesh transformation involved a large number of departments and R&D staff, so we had to consider the cost of adoption across teams. We therefore developed MOSN, a new network proxy written in Go for cloud-native scenarios. To make sure Go’s performance would meet the requirements of Ant Group’s services, we also did a thorough investigation and testing early on.
We also received a lot of feedback and requirements from the end user community, and found that many users shared the same needs and ideas. So we developed MOSN with both the community’s situation and our own in mind, aiming to satisfy the community and its users. We believe that competition in open source is mainly competition between standards and specifications, so we need to make the most suitable implementation choices based on open source standards.
What is the difference between MOSN and Istio’s default proxy?
Differences in language stacks
MOSN is written in Go. Go offers strong guarantees for development productivity and memory safety. Go also has an extensive library ecosystem for cloud-native development, and its performance is acceptable for the service mesh scenario. Therefore, MOSN has a lower learning cost for companies and individuals using languages such as Go and Java.
Differentiation of core competence

MOSN supports a multi-protocol framework, so users can easily integrate private protocols and route them with a unified routing framework.
A multi-process plug-in mechanism, which makes it easy to extend MOSN through the plug-in framework with plug-ins that run as independent processes, for management, bypass, and other functional modules.
Transport-layer support for Chinese national cryptographic algorithms, for Chinese encryption compliance.

What are the drawbacks of MOSN?

Because MOSN is written in Go, it does not perform as well as Istio’s default proxy, but its performance is acceptable for the service mesh scenario.
Compared with Istio’s default proxy, some features, such as WASM, HTTP/3, and Lua, are not yet fully supported. However, these are all on MOSN’s roadmap, and the goal is to be fully compatible with Istio.

MOSN with Istio
The following describes how to set up MOSN as the data plane for Istio.
Set up Istio
You can download a zip file for your operating system from the Istio release page. This file contains: the installation file, examples and the istioctl command line tool.
To download Istio (this example uses Istio 1.5.2), use the following command.
$ export ISTIO_VERSION=1.5.2
$ curl -L https://istio.io/downloadIstio | sh -
The downloaded Istio package is named istio-1.5.2 and contains:

install/kubernetes: Contains YAML installation files related to Kubernetes.
examples/: Contains example applications.
bin/: Contains the istioctl client files.

Switch to the folder where Istio is located.
$ cd istio-$ISTIO_VERSION/
Add the istioctl client path to $PATH with the following command.
$ export PATH=$PATH:$(pwd)/bin
Setting MOSN as the Data Plane
It is possible to flexibly customize the Istio control plane and data plane configuration parameters using the istioctl command line tool. MOSN can be specified as the data plane for Istio using the following command.
$ istioctl manifest apply  --set .values.global.proxy.image="mosnio/proxyv2:1.5.2-mosn"  --set meshConfig.defaultConfig.binaryPath="/usr/local/bin/mosn"
Check that the Istio-related pods and services are deployed successfully.
$ kubectl get pods -n istio-system
$ kubectl get svc -n istio-system
If the STATUS of the pods is Running, Istio has been successfully installed using MOSN and you can now deploy the Bookinfo sample.
Bookinfo Examples
You can run the Bookinfo sample by following the MOSN with Istio tutorial where you can find instructions for using MOSN and Istio. You can install MOSN and get to the same point you would have using the default Istio instructions with Envoy.
Moving forward
Going forward, MOSN will not only stay compatible with the features of the latest versions of Istio, but will also evolve in the following areas:

As a microservices runtime, MOSN-oriented programming makes services lighter, smaller, and faster.
Programmability, with WASM support.
Support for more scenarios, such as Cache Mesh, Message Mesh, and Blockchain Mesh.

MOSN is an open source project that anyone in the community can use, improve, and enjoy. We’d love you to join us! Here are a few ways to find out what’s happening and get involved.
Learn More

MOSN website
MOSN community
MOSN tutorials




Open and neutral: transferring our trademarks to the Open Usage Commons
Wed, 08 Jul 2020 00:00:00 +0000
Since day one, the Istio project has believed in the importance of being contributor-run, open, transparent and available to all. In that spirit, Google is pleased to announce that it will be transferring ownership of the project’s trademarks to the new Open Usage Commons.
Istio is an open source project, released under the Apache 2.0 license. That means people can copy, modify, distribute, make, use and sell the source code. The only freedom people don’t have under the Apache 2.0 license is to use the name Istio, or its logo, in a way that would confuse consumers.
As one of the founders of the project, Google is the current owner of the Istio trademark. While anyone who is using the software in accordance with the license can use the trademarks, the historic ownership has caused some confusion and uncertainty about who can use the name and how, and at times this confusion has been a barrier to community growth. So today, as part of Istio’s continued commitment to openness, Google is announcing that the Istio trademarks will be transferred to a new organization, the Open Usage Commons, to provide neutral, independent oversight of the marks.
A neutral home for Istio’s trademarks
The Open Usage Commons is a new organization that is focused solely on providing management and guidance of open source project trademarks in a way that is aligned with the Open Source Definition. For projects, particularly projects with robust ecosystems like Istio, ensuring that the trademark is available to anyone who is using the software in accordance with the license is important. The trademark allows maintainers to grow a community and use the name to do so. It also lets ecosystem partners create services on top of the project, and it enables developers to create tooling and integrations that reference the project. Maintainers, ecosystem partners, and developers alike must feel confident in their investments in Istio - for the long term. Google thinks having the Istio trademarks in the Open Usage Commons is the right way to give that clarity and provide that confidence.
The Open Usage Commons will work with the Istio Steering Committee to generate trademark usage guidelines. There will be no immediate changes to the Istio usage guidelines, and if you are currently using the Istio marks in a way that follows the existing brand guide, you can continue to do so.
You can learn more about open source project IP and the Open Usage Commons at openusage.org.
A continued commitment to open
The Open Usage Commons is focused on project trademarks; it does not address other facets of an open project, like rules around who gets decision-making votes. Similar to many projects in their early days, Istio’s committees started as small groups that stemmed from the founding companies. But Istio has grown and matured (last year Istio was #4 on GitHub’s list of fastest growing open source projects!), and it is time for the next evolution of Istio’s governance.
Recently, we were proud to appoint Neeraj Poddar, Co-founder & Chief Architect of Aspen Mesh, to the Technical Oversight Committee — the group responsible for all technical decision-making in the project. Neeraj is a long-time contributor to the project and served as a Working Group lead. The TOC is now made up of 7 members from 4 different companies - Tetrate, IBM, Google & now Aspen Mesh.
Our community is currently discussing how the Steering Committee, which oversees marketing and community activities, should be governed, to reflect the expanding community and ecosystem. If you have ideas for this new governance, visit the pull request on GitHub where an active discussion is taking place.
In the last 12 months, Istio has had commits from more than 100 organizations and currently has 70 maintainers from 14 different companies. This trend is the kind of contributor diversity the project’s founders intended, and nurturing it remains a priority. Google is excited about what the future holds for Istio, and hopes you’ll be a part of it.



Reworking our Addon Integrations
Thu, 04 Jun 2020 00:00:00 +0000
Starting with Istio 1.6, we are introducing a new method for integration with telemetry addons, such as Grafana, Prometheus, Zipkin, Jaeger, and Kiali.
In previous releases, these addons were bundled as part of the Istio installation. This allowed users to quickly get started with Istio without any complicated configurations to install and integrate these addons. However, it came with some issues:

The Istio addon installations were not as up to date or feature rich as upstream installation methods. Users were left missing out on some of the great features provided by these applications, such as:

Persistent storage
Features like Alertmanager for Prometheus
Advanced security settings


Integration with existing deployments that were using these features was more challenging than it should be.

Changes
In order to address these gaps, we have made a number of changes:


Added a new Integrations documentation section to explain which applications Istio can integrate with, how to use them, and best practices.


Reduced the amount of configuration required to set up telemetry addons


Grafana dashboards are now published to grafana.com.


Prometheus can now scrape all Istio pods using standard prometheus.io annotations. This allows most Prometheus deployments to work with Istio without any special configuration.




Removed the bundled addon installations from istioctl and the operator. Istio does not install components that are not delivered by the Istio project. As a result, Istio will stop shipping installation artifacts related to addons. However, Istio will guarantee version compatibility where necessary. It is the user’s responsibility to install these components by using the official Integrations documentation and artifacts provided by the respective projects. For demos, users can deploy simple YAML files from the samples/addons/ directory.
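The prometheus.io annotations mentioned above are a widely used convention rather than a Prometheus built-in: a typical Kubernetes scrape job uses relabeling to keep only pods annotated with prometheus.io/scrape: "true" and to take the port and path from the other two annotations. A minimal Go sketch of that mapping, assuming the conventional /metrics default path and requiring an explicit port (real relabel configs can fall back to the pod's declared ports). `scrapeURL` is a hypothetical helper, and the annotation values in `main` are illustrative.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// scrapeURL returns the URL a Prometheus job following the prometheus.io/*
// annotation convention would scrape for a pod, or ok=false if the pod has
// not opted in. Hypothetical helper: this sketch requires an explicit port
// and defaults the path to /metrics.
func scrapeURL(podIP string, annotations map[string]string) (url string, ok bool) {
	if annotations["prometheus.io/scrape"] != "true" {
		return "", false
	}
	port, err := strconv.Atoi(annotations["prometheus.io/port"])
	if err != nil || port < 1 || port > 65535 {
		return "", false
	}
	path := annotations["prometheus.io/path"]
	if path == "" {
		path = "/metrics"
	}
	// JoinHostPort adds brackets around IPv6 addresses.
	return "http://" + net.JoinHostPort(podIP, strconv.Itoa(port)) + path, true
}

func main() {
	// Illustrative annotation values for a meshed pod.
	ann := map[string]string{
		"prometheus.io/scrape": "true",
		"prometheus.io/port":   "15020",
		"prometheus.io/path":   "/stats/prometheus",
	}
	url, ok := scrapeURL("10.0.0.5", ann)
	fmt.Println(url, ok) // prints http://10.0.0.5:15020/stats/prometheus true
}
```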


We hope these changes allow users to make the most of these addons so as to fully experience what Istio can offer.
Timeline

Istio 1.6: The new demo deployments for telemetry addons are available under samples/addons/ directory.
Istio 1.7: Upstream installation methods or the new samples deployment are the recommended installation methods. Installation by istioctl is deprecated.
Istio 1.8: Installation of addons by istioctl is removed.




Introducing Workload Entries
Thu, 21 May 2020 00:00:00 +0000
Introducing Workload Entries: Bridging Kubernetes and VMs
Historically, Istio has provided a great experience for workloads that run on Kubernetes, but it has been less smooth for other types of workloads, such as Virtual Machines (VMs) and bare metal. The gaps included, to name a few, the inability to declaratively specify the properties of a sidecar on a VM, the inability to properly respond to lifecycle changes of the workload (e.g., booting to not ready to ready, or health checks), and cumbersome DNS workarounds as workloads are migrated into Kubernetes.
Istio 1.6 has introduced a few changes in how you manage non-Kubernetes workloads, driven by a desire to make it easier to gain Istio’s benefits for use cases beyond containers, such as running traditional databases on a platform outside of Kubernetes, or adopting Istio’s features for existing applications without rewriting them.
Background
Prior to Istio 1.6, non-containerized workloads were configurable simply as an IP address in a ServiceEntry, which meant that they only existed as part of a service. Istio lacked a first-class abstraction for these non-containerized workloads, something similar to how Kubernetes treats Pods as the fundamental unit of compute - a named object that serves as the collection point for all things related to a workload - name, labels, security properties, lifecycle status events, etc. Enter WorkloadEntry.
Consider the following ServiceEntry describing a service implemented by a few tens of VMs with IP addresses:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: svc1
spec:
  hosts:
  - svc1.internal.com
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: STATIC
  endpoints:
  - address: 1.1.1.1
  - address: 2.2.2.2
  ....
If you wanted to migrate this service into Kubernetes in an active-active manner - i.e. launch a bunch of Pods, send a portion of the traffic to the Pods over Istio mutual TLS (mTLS) and send the rest to the VMs without sidecars - how would you do it? You would have needed to use a combination of a Kubernetes service, a virtual service, and a destination rule to achieve the behavior. Now, let’s say you decided to add sidecars to these VMs, one by one, such that you want only the traffic to the VMs with sidecars to use Istio mTLS. If any other Service Entry happens to include the same VM in its addresses, things start to get very complicated and error prone.
The primary source of these complications is that Istio lacked a first-class definition of a non-containerized workload, whose properties can be described independently of the service(s) it is part of.

Figure: The internals of Service Entries pointing to Workload Entries

Workload Entry: A Non-Kubernetes Endpoint
WorkloadEntry was created specifically to solve this problem. WorkloadEntry allows you to describe non-Pod endpoints that should still be part of the mesh, and treat them the same as a Pod. From here everything becomes easier, like enabling MUTUAL_TLS between workloads, whether they are containerized or not.
To create a WorkloadEntry and attach it to a ServiceEntry you can do something like this:
---
apiVersion: networking.istio.io/v1alpha3
kind: WorkloadEntry
metadata:
  name: vm1
  namespace: ns1
spec:
  address: 1.1.1.1
  labels:
    app: foo
    instance-id: vm-78ad2
    class: vm
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: svc1
  namespace: ns1
spec:
  hosts:
  - svc1.internal.com
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: STATIC
  workloadSelector:
    labels:
      app: foo
This creates a new WorkloadEntry with a set of labels and an address, and a ServiceEntry that uses a WorkloadSelector to select all endpoints with the desired labels, in this case including the WorkloadEntry that was created for the VM.

Figure: The internals of Service Entries pointing to Workload Entries

Notice that the ServiceEntry can reference both Pods and WorkloadEntries, using the same selector. VMs and Pods can now be treated identically by Istio, rather than being kept separate.
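The selection can be sketched in a few lines of Go, assuming equality-based subset matching (every selector label must be present on the endpoint with the same value), as with Kubernetes label selectors. The `Endpoint` type, the Pod names, and the Pod addresses are hypothetical, and namespaces are ignored. The point is that a WorkloadEntry and a Pod are matched by exactly the same rule.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified view of a mesh endpoint: either a
// Pod or a WorkloadEntry. Namespaces are ignored in this sketch.
type Endpoint struct {
	Kind    string // "Pod" or "WorkloadEntry"
	Name    string
	Address string
	Labels  map[string]string
}

// selects reports whether every selector label is present on the endpoint
// with the same value (equality-based subset matching).
func selects(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// endpointsFor returns every endpoint the selector matches, whatever its kind.
func endpointsFor(selector map[string]string, all []Endpoint) []Endpoint {
	var out []Endpoint
	for _, ep := range all {
		if selects(selector, ep.Labels) {
			out = append(out, ep)
		}
	}
	return out
}

func main() {
	all := []Endpoint{
		// Mirrors the vm1 WorkloadEntry above.
		{Kind: "WorkloadEntry", Name: "vm1", Address: "1.1.1.1",
			Labels: map[string]string{"app": "foo", "instance-id": "vm-78ad2", "class": "vm"}},
		// Hypothetical Pods.
		{Kind: "Pod", Name: "foo-7d9f", Address: "10.0.0.7", Labels: map[string]string{"app": "foo"}},
		{Kind: "Pod", Name: "bar-5c2a", Address: "10.0.0.8", Labels: map[string]string{"app": "bar"}},
	}
	// The svc1 ServiceEntry's workloadSelector is {app: foo}.
	for _, ep := range endpointsFor(map[string]string{"app": "foo"}, all) {
		fmt.Println(ep.Kind, ep.Name, ep.Address)
	}
}
```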
If you migrate some of your workloads to Kubernetes but choose to keep a substantial number of your VMs, the WorkloadSelector can select both Pods and VMs, and Istio will automatically load balance between them. The 1.6 changes also mean that WorkloadSelector syncs configurations between the Pods and VMs, removing the need to manually target both infrastructures with duplicate policies like mTLS and authorization.
The Istio 1.6 release provides a great starting point for what will be possible in the future of Istio. The ability to describe what exists outside of the mesh the same way you do with a Pod leads to added benefits like an improved bootstrapping experience. However, these benefits are merely side effects. The core benefit is that VMs and Pods can now coexist without any configuration needed to bridge the two together.



Safely Upgrade Istio using a Canary Control Plane Deployment
Tue, 19 May 2020 00:00:00 +0000
Canary deployments are a core feature of Istio. Users rely on Istio’s traffic management features to safely control the rollout of new versions of their applications, while making use of Istio’s rich telemetry to compare the performance of canaries. However, when it came to upgrading Istio, there was not an easy way to canary the upgrade, and due to the in-place nature of the upgrade, issues or changes found affect the entire mesh at once.
Istio 1.6 will support a new upgrade model to safely canary-deploy new versions of Istio. In this new model, proxies will associate with a specific control plane that they use. This allows a new version to deploy to the cluster with less risk - no proxies connect to the new version until the user explicitly chooses to. This allows gradually migrating workloads to the new control plane, while monitoring changes using Istio telemetry to investigate any issues, just like using VirtualService for workloads. Each independent control plane is referred to as a “revision” and has an istio.io/rev label.
Understanding upgrades
Upgrading Istio is a complicated process. During the transition period between two versions, which might take a long time for large clusters, there are version differences between proxies and the control plane. In the old model, the old and new control planes use the same Service, so traffic is randomly distributed between the two, offering no control to the user. In the new model, however, there is no cross-version communication. Look at how the upgrade changes:

Configuring
Control plane selection is done based on the sidecar injection webhook. Each control plane is configured to select objects with a matching istio.io/rev label on the namespace. Then, the upgrade process configures the pods to connect to a control plane specific to that revision. Unlike in the current model, this means that a given proxy connects to the same revision during its lifetime. This avoids subtle issues that might arise when a proxy switches which control plane it is connected to.
The new istio.io/rev label will replace the istio-injection=enabled label when using revisions. For example, if we had a revision named canary, we would label our namespaces that we want to use this revision with istio.io/rev=canary. See the upgrade guide for more information.
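The selection rules above can be modeled in a few lines. This is a minimal sketch, not Istio’s webhook code: it assumes that a namespace carrying the istio-injection label is handled only by the default control plane (our understanding of how the revisioned webhooks are scoped), and the `istiod-<revision>` Service naming and the helper names are illustrative assumptions.

```python
# Minimal model of revision-based injection selection (illustrative only).
# Assumptions: istio-injection, when present, takes precedence and maps to the
# default (unrevisioned) control plane; otherwise istio.io/rev picks the
# revision; the istiod Service for revision R is named "istiod-R".
from typing import Mapping, Optional


def selected_revision(ns_labels: Mapping[str, str]) -> Optional[str]:
    """Return the revision whose webhook injects pods in a namespace, or None."""
    if "istio-injection" in ns_labels:
        # Assumed precedence rule: the legacy label wins over istio.io/rev
        return "default" if ns_labels["istio-injection"] == "enabled" else None
    return ns_labels.get("istio.io/rev") or None


def control_plane_service(revision: str) -> str:
    """Hypothetical naming of the Service a proxy of this revision connects to."""
    return "istiod" if revision == "default" else f"istiod-{revision}"
```

Because the revision is fixed at injection time, a pod in a namespace labeled istio.io/rev=canary keeps talking to the canary control plane until it is re-created under a different label.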



Direct encrypted traffic from IBM Cloud Kubernetes Service Ingress to Istio Ingress Gateway
Fri, 15 May 2020 00:00:00 +0000
In this blog post I show how to configure the Ingress Application Load Balancer (ALB)
on IBM Cloud Kubernetes Service (IKS) to direct traffic to the Istio
ingress gateway, while securing the traffic between them using mutual TLS authentication.
When you use IKS without Istio, you may control your ingress traffic using the provided ALB. This ingress-traffic
routing is configured using a Kubernetes
Ingress resource with
ALB-specific annotations. IKS provides a
DNS domain name, a TLS certificate that matches the domain, and a private key for the certificate. IKS stores the
certificates and the private key in a Kubernetes secret.
When you start using Istio in your IKS cluster, the recommended method to send traffic to your Istio-enabled workloads
is the Istio Ingress Gateway instead of the
Kubernetes Ingress. One of the main reasons to use
the Istio ingress gateway is that the ALB provided by IKS cannot communicate directly with the services
inside the mesh when you enable STRICT mutual TLS. While you transition to the Istio ingress gateway as your only
main entry point, you can continue to use the traditional Ingress for non-Istio services while using the Istio ingress
gateway for services that are part of the mesh.
IKS provides a convenient way for clients to access the Istio ingress gateway: you can
register a new DNS subdomain for the
Istio gateway’s IP with an IKS command. The domain has the following
format:
<cluster_name>-<globally_unique_account_HASH>-0001.<region>.containers.appdomain.cloud, for example mycluster-a1b2cdef345678g9hi012j3kl4567890-0001.us-south.containers.appdomain.cloud. As with the ALB domain,
IKS provides a certificate and a private key for this subdomain and stores them in another Kubernetes secret.
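Because the subdomain format is fixed, a script can recover its parts from a host name. The helper below is a hypothetical sketch based only on the format described above; it is not an IKS API, and it assumes the cluster name may contain hyphens while the account hash and the index never do.

```python
from typing import NamedTuple

IKS_SUFFIX = ".containers.appdomain.cloud"


class NlbSubdomain(NamedTuple):
    cluster: str
    account_hash: str
    index: str
    region: str


def parse_nlb_subdomain(host: str) -> NlbSubdomain:
    """Split <cluster>-<hash>-<index>.<region>.containers.appdomain.cloud."""
    if not host.endswith(IKS_SUFFIX):
        raise ValueError(f"not an IKS subdomain: {host}")
    first_label, region = host[: -len(IKS_SUFFIX)].split(".", 1)
    # Split from the right so hyphens inside the cluster name stay intact
    cluster, account_hash, index = first_label.rsplit("-", 2)
    return NlbSubdomain(cluster, account_hash, index, region)
```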
This blog describes how you can chain together the IKS Ingress ALB and the Istio ingress gateway to send traffic to your
Istio-enabled workloads while continuing to use the ALB-specific features and the ALB subdomain name. You
configure the IKS Ingress ALB to direct traffic to the services inside an Istio service mesh through the Istio ingress
gateway, while using mutual TLS authentication between the ALB and the gateway. For the mutual TLS authentication, you
will configure the ALB and the Istio ingress gateway to use the certificates and keys provided by IKS for the ALB and
NLB subdomains. Using certificates provided by IKS saves you the overhead of managing your own certificates for the
connection between the ALB and the Istio ingress gateway.
You will use the NLB subdomain certificate as the server certificate for the Istio ingress gateway as intended.
The NLB subdomain certificate represents the identity of the server that serves a particular NLB subdomain, in this
case, the ingress gateway.
You will use the ALB subdomain certificate as the client certificate in mutual TLS authentication between the ALB and
the Istio ingress gateway. When the ALB acts as a server, it presents the ALB certificate to its clients so they can
authenticate the ALB. When the ALB acts as a client of the Istio ingress gateway, it presents the same certificate to the
Istio ingress gateway, so the Istio ingress gateway can authenticate the ALB.

Note that the instructions in this blog post only configure the ALB and the Istio ingress gateway to encrypt the traffic
between them and to verify that they receive valid certificates issued by Let’s Encrypt. To specify that only the ALB
is allowed to talk to the Istio ingress gateway, an additional Istio security policy must be defined. To verify that the
ALB indeed talks to the Istio ingress gateway, additional configuration must be added to the ALB. This additional
configuration of the Istio ingress gateway and the ALB is out of scope for this blog.


Traffic to the services without an Istio sidecar can continue to flow as before directly from the ALB.
The diagram below illustrates this setup. It shows two services in the cluster, service A and service B.
Service A has an Istio sidecar injected and requires mutual TLS. Service B has no Istio sidecar. Clients can access
service B through the ALB, which communicates directly with service B. Clients can also access service A
through the ALB, but in this case the traffic must pass through the Istio ingress gateway. Mutual
TLS authentication between the ALB and the gateway is based on the certificates provided by IKS.
The clients can also access the Istio ingress gateway directly. IKS registers different DNS domains for the ALB and for
the ingress gateway.

A cluster with the ALB and the Istio ingress gateway

Initial setup


Create the httptools namespace and enable Istio sidecar injection:
$ kubectl create namespace httptools
$ kubectl label namespace httptools istio-injection=enabled
namespace/httptools created
namespace/httptools labeled


Deploy the httpbin sample to httptools:
$ kubectl apply -f samples/httpbin/httpbin.yaml -n httptools
service/httpbin created
deployment.apps/httpbin created


Create secrets for the ALB and the Istio ingress gateway
IKS generates a TLS certificate and a private key and stores them as a secret when you register
a DNS domain for an external IP by using the ibmcloud ks nlb-dns create command (in the default namespace unless you
pass --secret-namespace, as the command below does to place it in istio-system). IKS also stores the ALB’s
certificate and private key as a secret in the default namespace. You need these credentials to establish the
identities that the ALB and the Istio ingress gateway will present during the mutual TLS authentication between
them. You will configure the ALB and the Istio ingress gateway to exchange these certificates, to trust the certificates
of one another, and to use their private keys to prove their identities when establishing the encrypted connection.


Store the name of your cluster in the CLUSTER_NAME environment variable:
$ export CLUSTER_NAME=


Store the domain name of your ALB in the ALB_INGRESS_DOMAIN environment variable:
$ ibmcloud ks cluster get --cluster $CLUSTER_NAME | grep Ingress
Ingress Subdomain:              
Ingress Secret:                 
$ export ALB_INGRESS_DOMAIN=
$ export ALB_SECRET=


Store the external IP of your istio-ingressgateway service in an environment variable.
$ export INGRESS_GATEWAY_IP=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ echo INGRESS_GATEWAY_IP = $INGRESS_GATEWAY_IP


Create a DNS domain and certificates for the IP of the Istio Ingress Gateway service:
$ ibmcloud ks nlb-dns create classic --cluster $CLUSTER_NAME --ip $INGRESS_GATEWAY_IP --secret-namespace istio-system
Host name subdomain is created as 


Store the domain name from the previous command in an environment variable:
$ export INGRESS_GATEWAY_DOMAIN=


List the registered domain names:
$ ibmcloud ks nlb-dnss --cluster $CLUSTER_NAME
Retrieving host names, certificates, IPs, and health check monitors for network load balancer (NLB) pods in cluster ...
OK
Hostname                          IP(s)                       Health Monitor   SSL Cert Status   SSL Cert Secret Name                          Secret Namespace
      None             created                      istio-system
...
Wait until the status of the certificate (the fourth field) of the new domain name becomes enabled (initially it is pending).


Store the name of the secret of the new domain name:
$ export INGRESS_GATEWAY_SECRET=


Extract the certificate and the key from the secret provided for the ALB:
$ mkdir alb_certs
$ kubectl get secret $ALB_SECRET --namespace=default -o jsonpath='{.data.tls\.key}' | base64 --decode > alb_certs/client.key
$ kubectl get secret $ALB_SECRET --namespace=default -o jsonpath='{.data.tls\.crt}' | base64 --decode > alb_certs/client.crt
$ ls -al alb_certs
-rw-r--r--   1 user  staff  3738 Sep 11 07:57 client.crt
-rw-r--r--   1 user  staff  1675 Sep 11 07:57 client.key
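The `base64 --decode` step above is needed because Kubernetes stores each value under a Secret’s `data` field base64-encoded. The sketch below does the same decoding in Python on the output of `kubectl get secret $ALB_SECRET --namespace=default -o json`; the helper names are our own, and the `tls.crt`/`tls.key` to `client.crt`/`client.key` mapping simply mirrors the commands above.

```python
import base64
import json
from pathlib import Path
from typing import Dict

# Secret data key -> file name used in this walkthrough
ALB_FILES = {"tls.crt": "client.crt", "tls.key": "client.key"}


def decode_secret_data(secret_json: str) -> Dict[str, bytes]:
    """Decode every base64 value under .data of a Secret printed with -o json."""
    data = json.loads(secret_json).get("data", {})
    return {key: base64.b64decode(value) for key, value in data.items()}


def write_alb_pair(decoded: Dict[str, bytes], out_dir: Path) -> None:
    """Write tls.crt/tls.key as client.crt/client.key into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for key, filename in ALB_FILES.items():
        (out_dir / filename).write_bytes(decoded[key])
    # Restrict the private key to its owner
    (out_dir / "client.key").chmod(0o600)
```

For example, pass the JSON printed by kubectl to `decode_secret_data` and hand the result to `write_alb_pair(decoded, Path("alb_certs"))`.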


Download the root certificate that the Let’s Encrypt certificates chain up to (Let’s Encrypt issues the certificates
provided by IKS). You specify this certificate as the certificate authority to trust, for both the ALB and the Istio
ingress gateway.
$ curl https://letsencrypt.org/certs/trustid-x3-root.pem --output trusted.crt


Create a Kubernetes secret for the ALB to use when establishing the mutual TLS connection.

The certificates provided by IKS expire every 90 days and are automatically renewed by IKS 37 days before they expire.
You will have to recreate the secrets by rerunning the instructions of this section every time the secrets provided
by IKS are updated. You may want to use scripts or operators to automate this and keep the
secrets in sync.
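The two figures in the note fix the refresh cadence: a certificate is replaced 90 - 37 = 53 days after it is issued. The sketch below computes the dates, assuming the issue date is known (for example from the certificate’s notBefore field) and that the 90-day and 37-day figures hold; the function names are illustrative.

```python
from datetime import date, timedelta
from typing import Tuple

VALIDITY = timedelta(days=90)      # certificate lifetime stated in the note
RENEW_BEFORE = timedelta(days=37)  # IKS renews this long before expiry


def renewal_schedule(issued: date) -> Tuple[date, date]:
    """Return (date IKS renews the certificate, date the certificate expires)."""
    expires = issued + VALIDITY
    renewed = expires - RENEW_BEFORE
    return renewed, expires


def days_until_resync(issued: date, today: date) -> int:
    """Days left before the copied secrets no longer match what IKS provides."""
    renewed, _ = renewal_schedule(issued)
    return (renewed - today).days
```

For a certificate issued on 2020-05-15, IKS would renew it on 2020-07-07, and it would expire on 2020-08-13.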
    


$ kubectl create secret generic alb-certs -n istio-system --from-file=trusted.crt --from-file=alb_certs/client.crt --from-file=alb_certs/client.key
secret "alb-certs" created


For mutual TLS, the ingress gateway also needs a separate Secret named $INGRESS_GATEWAY_SECRET-cacert that holds the CA certificate to trust:
$ kubectl create -n istio-system secret generic $INGRESS_GATEWAY_SECRET-cacert --from-file=ca.crt=trusted.crt
secret/cluster_name-hash-XXXX-cacert created
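`kubectl create secret generic --from-file=KEY=PATH` stores each file’s bytes base64-encoded under `data.KEY` in an `Opaque` Secret. The sketch below builds the equivalent manifest for the CA secret, which can help when automating the re-creation the note above recommends. The naming helper encodes the convention of a `<secret>-cacert` companion Secret; the example credential name in the test is hypothetical.

```python
import base64
from typing import Dict


def cacert_secret_name(credential_name: str) -> str:
    """Name of the companion Secret holding the CA certificate for a gateway credential."""
    return f"{credential_name}-cacert"


def generic_secret(name: str, namespace: str, files: Dict[str, bytes]) -> dict:
    """Manifest equivalent to `kubectl create secret generic NAME --from-file=KEY=PATH`.

    `files` maps each data key (for example "ca.crt") to the raw file contents.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Secret .data values are base64-encoded
        "data": {key: base64.b64encode(content).decode("ascii")
                 for key, content in files.items()},
    }
```

Serialized with `json.dumps`, the manifest can be piped to `kubectl apply -f -`, which, unlike `kubectl create`, also updates the Secret when it already exists, so the same script can be rerun after every renewal.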


Configure a mutual TLS ingress gateway
In this section you configure the Istio ingress gateway to perform mutual TLS between external clients and the gateway.
You use the certificates and the keys provided to you for the ingress gateway and the ALB.


Define a Gateway to allow access on port 443 only, with mutual TLS:
$ kubectl apply -n httptools -f - <



Configure routes for traffic entering via the Gateway:
$ kubectl apply -n httptools -f - <



Send a request to httpbin with curl, passing the client certificate
(the --cert option) and the private key (the --key option):
$ curl https://$INGRESS_GATEWAY_DOMAIN/status/418 --cert alb_certs/client.crt  --key alb_certs/client.key

-=[ teapot ]=-

   _...._
 .'  _ _ `.
| ."` ^ `". _,
\_;`"---"`|//
  |       ;/
  \_     _/
    `"""`
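To script the same check, the mutual TLS request can be sketched with Python’s standard library. `ssl.create_default_context()` verifies the gateway’s server certificate against the system trust store, as curl does by default, and `load_cert_chain()` plays the role of `--cert` and `--key`. The helper names are ours, and the sketch assumes the ALB certificate and key are still in `alb_certs/`.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional, Tuple


def mtls_client_context(certfile: Optional[str] = None,
                        keyfile: Optional[str] = None,
                        cafile: Optional[str] = None) -> ssl.SSLContext:
    """Client context that verifies the server and can present a client certificate."""
    # Trust the system CA store (or only cafile, if given)
    ctx = ssl.create_default_context(cafile=cafile)
    if certfile:
        # Equivalent of curl's --cert and --key options
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


def get_status(url: str, ctx: ssl.SSLContext, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (HTTP status, body); non-2xx answers such as 418 are returned, not raised."""
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()
```

For example, `get_status(f"https://{os.environ['INGRESS_GATEWAY_DOMAIN']}/status/418", mtls_client_context("alb_certs/client.crt", "alb_certs/client.key"))` should return 418 and the teapot body when the gateway accepts the client certificate.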


Remove the directory with the ALB certificate and key, and the downloaded trusted.crt file:
$ rm -r alb_certs trusted.crt


Configure the ALB
You need to configure your Ingress resource to direct traffic to the Istio ingress gateway while using the certificate
stored in the alb-certs secret. Normally, the ALB decrypts HTTPS requests before forwarding traffic to your apps.
You can configure the ALB to re-encrypt the traffic before it is forwarded to the Istio ingress gateway by using the
ssl-services annotation on the Ingress resource. This annotation also allows you to specify the certificate stored in
the alb-certs secret, required for mutual TLS.


Configure the Ingress resource for the ALB. You must create the Ingress resource in the istio-system namespace
in order to forward the traffic to the Istio ingress gateway.
$ kubectl apply -f - <



Test the ALB ingress:
$ curl https://httpbin.$ALB_INGRESS_DOMAIN/status/418

-=[ teapot ]=-

   _...._
 .'  _ _ `.
| ."` ^ `". _,
\_;`"---"`|//
  |       ;/
  \_     _/
    `"""`


Congratulations! You configured the IKS Ingress ALB to send encrypted traffic to the Istio ingress gateway. You
allocated a host name and certificate for your Istio ingress gateway and used that certificate as the server certificate
of the Istio ingress gateway. As the client certificate of the ALB, you used the certificate provided by IKS for the ALB.
Once you had the certificates deployed as Kubernetes secrets, you directed the ingress traffic from the ALB to the Istio
ingress gateway for some specific paths and used the certificates for mutual TLS authentication between the ALB and the
Istio ingress gateway.
Cleanup


Delete the Ingress resource, the Gateway configuration, the VirtualService, and the secrets:
$ kubectl delete ingress alb-ingress -n istio-system
$ kubectl delete virtualservice default-ingress -n httptools
$ kubectl delete gateway default-ingress-gateway -n httptools
$ kubectl delete secrets alb-certs -n istio-system
$ kubectl delete secret $INGRESS_GATEWAY_SECRET-cacert -n istio-system
$ rm -rf alb_certs trusted.crt
$ unset CLUSTER_NAME ALB_INGRESS_DOMAIN ALB_SECRET INGRESS_GATEWAY_IP INGRESS_GATEWAY_DOMAIN INGRESS_GATEWAY_SECRET


Shut down the httpbin service:
$ kubectl delete -f samples/httpbin/httpbin.yaml -n httptools


Delete the httptools namespace:
$ kubectl delete namespace httptools





Provision a certificate and key for an application without sidecars
Wed, 25 Mar 2020 00:00:00 +0000

The following information describes an experimental feature, which is intended
for evaluation purposes only.


Istio sidecars obtain their certificates using
the secret discovery service.
A service in the service mesh may not need (or want) an Envoy sidecar
to handle its traffic. In this case, the service needs
to obtain a certificate itself if it wants to connect to other TLS or mutual TLS secured services.
For a service that does not need a sidecar to manage its traffic, a sidecar can still be
deployed solely to provision the private key and certificates through
the CSR flow with the CA, and then share the certificate with the service
through a file mounted in tmpfs.
We use Prometheus as our example application for provisioning
a certificate using this mechanism.
In the example application (i.e., Prometheus), a sidecar is added to the
Prometheus deployment by setting the flag .Values.prometheus.provisionPrometheusCert
to true (this flag is set to true by default in an Istio installation).
This deployed sidecar then requests a
certificate and shares it with Prometheus.
The key and certificate provisioned for the example application
are mounted in the directory /etc/istio-certs/.
We can list the key and certificate provisioned for the application by
running the following command:
$ kubectl exec -it `kubectl get pod -l app=prometheus -n istio-system -o jsonpath='{.items[0].metadata.name}'` -c prometheus -n istio-system -- ls -la /etc/istio-certs/
The output from the above command should include non-empty key and certificate files, similar to the following:
-rwxr-xr-x    1 root     root          2209 Feb 25 13:06 cert-chain.pem
-rwxr-xr-x    1 root     root          1679 Feb 25 13:06 key.pem
-rwxr-xr-x    1 root     root          1054 Feb 25 13:06 root-cert.pem
If you want to use this mechanism to provision a certificate
for your own application, take a look at our
Prometheus example application and simply follow the same pattern.
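An application other than Prometheus could consume the mounted files in a similar way. The sketch below assumes the /etc/istio-certs/ mount and the three file names shown above; it checks that the files are present and non-empty and builds a client TLS context for connecting to mutual TLS services in the mesh. Disabling hostname checks is our assumption, based on Istio workload certificates identifying workloads by a SPIFFE URI rather than a DNS name; the chain is still verified against root-cert.pem.

```python
import ssl
from pathlib import Path
from typing import Dict

CERT_DIR = Path("/etc/istio-certs")  # mount path described above


def istio_cert_files(cert_dir: Path = CERT_DIR) -> Dict[str, Path]:
    """Paths of the certificate chain, private key, and root certificate."""
    return {
        "chain": cert_dir / "cert-chain.pem",
        "key": cert_dir / "key.pem",
        "root": cert_dir / "root-cert.pem",
    }


def mesh_client_context(cert_dir: Path = CERT_DIR) -> ssl.SSLContext:
    """Client context presenting the provisioned certificate to mesh services."""
    files = istio_cert_files(cert_dir)
    missing = [str(p) for p in files.values() if not p.is_file() or p.stat().st_size == 0]
    if missing:
        raise FileNotFoundError(f"missing or empty certificate files: {missing}")
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=str(files["root"]))
    ctx.load_cert_chain(certfile=str(files["chain"]), keyfile=str(files["key"]))
    # Assumption: peers are identified by a SPIFFE URI SAN, not a DNS name,
    # so hostname checking is off; the chain is still verified (CERT_REQUIRED).
    ctx.check_hostname = False
    return ctx
```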



Extended and Improved WebAssemblyHub to Bring the Power of WebAssembly to Envoy and Istio
Wed, 25 Mar 2020 00:00:00 +0000
Originally posted on the Solo.io blog
As organizations adopt Envoy-based infrastructure like Istio to help solve challenges with microservices communication, they inevitably find themselves needing to customize some part of that infrastructure to fit within their organization’s constraints. WebAssembly (Wasm) has emerged as a safe, secure, and dynamic environment for platform extension.
In the recent announcement of Istio 1.5, the Istio project lays the foundation for bringing WebAssembly to the popular Envoy proxy. Solo.io is collaborating with Google and the Istio community to simplify the overall experience of creating, sharing, and deploying WebAssembly extensions to Envoy and Istio. It wasn’t that long ago that Google and others laid the foundation for containers, and Docker built a great user experience to make it consumable. Similarly, this effort makes Wasm consumable by building the best user experience for WebAssembly on Istio.
Back in December 2019, Solo.io began an effort to provide a great developer experience for WebAssembly with the announcement of WebAssembly Hub. The WebAssembly Hub allows developers to very quickly spin up a new WebAssembly project in C++ (we’re expanding this language choice, see below), build it using Bazel in Docker, and push it to an OCI-compliant registry. From there, operators had to pull the module and configure Envoy proxies themselves to load it from disk. Beta support in Gloo, an API Gateway built on Envoy, allows you to load the module declaratively and dynamically, and the Solo.io team wanted to bring the same effortless and secure experience to other Envoy-based frameworks as well, such as Istio.
There has been a lot of interest in the innovation in this area, and the Solo.io team has been working hard to further the capabilities of WebAssembly Hub and the workflows it supports. In conjunction with Istio 1.5, Solo.io is thrilled to announce new enhancements to WebAssembly Hub that evolve the viability of WebAssembly with Envoy for production, improve the developer experience, and streamline using Wasm with Envoy in Istio.
Evolving toward production
The Envoy community is working hard to bring Wasm support into the upstream project (right now it lives on a working development fork), with Istio declaring Wasm support an Alpha feature. In Gloo 1.0, we also announced early, non-production support for Wasm. What is Gloo? Gloo is a modern API Gateway and Ingress Controller (built on Envoy Proxy) that supports routing and securing incoming traffic to legacy monoliths, microservices / Kubernetes and serverless functions. Dev and ops teams are able to shape and control traffic patterns from external end users/clients to backend application services. Gloo is a Kubernetes and Istio native ingress gateway.
Although it’s still maturing in each individual project, there are things that we, as a community, can do to improve the foundation for production support.
The first area is standardizing what a WebAssembly extension for Envoy looks like. Solo.io, Google, and the Istio community have defined an open specification for bundling and distributing WebAssembly modules as OCI images. This specification provides a powerful model for distributing any type of Wasm module including Envoy extensions.
This specification is open to the community: join in the effort.
The next area is improving the experience of deploying Wasm extensions into an Envoy-based framework running in production. In the Kubernetes ecosystem, it is considered a best practice in production to use declarative CRD-based configuration to manage cluster configuration. The new WebAssembly Hub Operator adds a single, declarative CRD which automatically deploys and configures Wasm filters to Envoy proxies running inside of a Kubernetes cluster. This operator enables GitOps workflows and cluster automation to manage Wasm filters without human intervention or imperative workflows. We will provide more information about the Operator in an upcoming blog post.
Lastly, the interactions between developers of Wasm extensions and the teams that deploy them need some kind of role-based access, organization management, and facilities to share, discover, and consume these extensions. The WebAssembly Hub adds team management features like permissions, organizations, user management, sharing, and more.
Improving the developer experience
As developers want to target more languages and runtimes, the experience must be kept as simple and as productive as possible. Multi-language support and runtime ABI (Application Binary Interface) targets should be handled automatically in tooling.
One of the benefits of Wasm is the ability to write modules in many languages. The collaboration between Solo.io and Google provides out-of-the-box support for Envoy filters written in C++, Rust, and AssemblyScript. We will continue to add support for more languages.
Wasm extensions use the Application Binary Interface (ABI) within the Envoy proxy to which they are deployed. The WebAssembly Hub provides strong ABI versioning guarantees between Envoy, Istio, and Gloo to prevent unpredictable behavior and bugs. All you have to worry about is writing your extension code.
Lastly, like Docker, the WebAssembly Hub stores and distributes Wasm extensions as OCI images. This makes pushing, pulling, and running extensions as easy as working with Docker containers. Wasm extension images are versioned and cryptographically secure, making it safe to run extensions locally the same way you would in production. This lets you build and push extensions, and lets consumers trust the source when they pull down and deploy images.
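OCI registries address content by digest: a descriptor refers to a blob as `sha256:` followed by the hex SHA-256 of its bytes, so a client can detect any change to a pulled module. The sketch below shows that integrity check for a Wasm module. It illustrates content addressing only; signature verification, which is what establishes trust in the source, is not shown, and this is not a description of how WebAssembly Hub implements it.

```python
import hashlib

# The 8-byte header of an empty Wasm module: magic "\0asm" plus version 1
EMPTY_WASM_MODULE = b"\x00asm\x01\x00\x00\x00"


def oci_digest(blob: bytes) -> str:
    """Content digest in the OCI "algorithm:hex" form."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def verify_blob(blob: bytes, expected_digest: str) -> bool:
    """A puller recomputes the digest and rejects content that does not match."""
    return oci_digest(blob) == expected_digest
```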
WebAssembly Hub with Istio
The WebAssembly Hub now fully automates the process of deploying Wasm extensions to Istio (as well as to other Envoy-based frameworks like Gloo API Gateway) installed in Kubernetes. With this deployment feature, the WebAssembly Hub relieves the operator or end user from having to manually configure the Envoy proxies in their Istio service mesh to use their WebAssembly modules.
Take a look at the following video to see just how easy it is to get started with WebAssembly and Istio:

Part 1
Part 2

Get Started
We hope that the WebAssembly Hub will become a meeting place for the community to share, discover, and distribute Wasm extensions. By providing a great user experience, we hope to make developing, installing, and running Wasm easier and more rewarding. Join us at the WebAssembly Hub, share your extensions and ideas, and join an upcoming webinar.



Introducing istiod: simplifying the control plane
Thu, 19 Mar 2020 00:00:00 +0000
Microservices are a great pattern when they map services to disparate teams that deliver them, or when the value of independent rollout and the value of independent scale are greater than the cost of orchestration. We regularly talk to customers and teams running Istio in the real world, and they told us that none of these were the case for the Istio control plane. So, in Istio 1.5, we’ve changed how Istio is packaged, consolidating the control plane functionality into a single binary called istiod.
History of the Istio control plane
Istio implements a pattern that has been in use at both Google and IBM for many years, which later became known as “service mesh”. Pairing client and server processes with proxy servers creates an application-aware data plane that does more than simply move packets around hosts, or pulses over wires.
This pattern helps the world come to terms with microservices: fine-grained, loosely-coupled services connected via lightweight protocols. The common cross-platform and cross-language standards like HTTP and gRPC that replace proprietary transports, and the widespread presence of the needed libraries, empower different teams to write different parts of an overall architecture in whatever language makes the most sense. Furthermore, each service can scale independently as needed. A desire to implement security, observability and traffic control for such a network powers Istio’s popularity.
Istio’s control plane is, itself, a modern, cloud-native application. Thus, it was built from the start as a set of microservices. Individual Istio components like service discovery (Pilot), configuration (Galley), certificate generation (Citadel) and extensibility (Mixer) were all written and deployed as separate microservices. The need for these components to communicate securely and be observable provided opportunities for Istio to eat its own dogfood (or “drink its own champagne”, to use a more French version of the metaphor!).
The cost of complexity
Good teams look back upon their choices and, with the benefit of hindsight, revisit them. Generally, when a team adopts microservices and their inherent complexity, they look for improvements in other areas to justify the tradeoffs. Let’s look at the Istio control plane through that lens.


Microservices empower you to write in different languages. The data plane (the Envoy proxy) is written in C++, and this boundary benefits from a clean separation in terms of the xDS APIs. However, all of the Istio control plane components are written in Go. We were able to choose the appropriate language for the appropriate job: highly performant C++ for the proxy, and an accessible language suited to fast development for everything else.


Microservices empower you to allow different teams to manage services individually. In the vast majority of Istio installations, all the components are installed and operated by a single team or individual. The componentization done within Istio is aligned along the boundaries of the development teams who build it. This would make sense if the Istio components were delivered as a managed service by the people who wrote them, but this is not the case! Making life simpler for the development teams had an outsized impact on usability for the orders-of-magnitude larger number of users.


Microservices empower you to decouple versions, and release different components at different times. All the components of the control plane have always been released at the same version, at the same time.  We have never tested or supported running different versions of (for example) Citadel and Pilot.


Microservices empower you to scale components independently. In Istio 1.5, control plane costs are dominated by a single feature: serving the Envoy xDS APIs that program the data plane. Every other feature has a marginal cost, which means there is very little value to having those features in separately-scalable microservices.


Microservices empower you to maintain security boundaries. Another good reason to separate an application into different microservices is if they have different security roles. Multiple Istio microservices like the sidecar injector, the Envoy bootstrap, Citadel, and Pilot hold nearly equivalent permissions to change the proxy configuration. Therefore, exploiting any of these services would cause near equivalent damage. When you deploy Istio, all the components are installed by default into the same Kubernetes namespace, offering limited security isolation.


The benefit of consolidation: introducing istiod
Having established that many of the common benefits of microservices didn’t apply to the Istio control plane, we decided to unify them into a single binary: istiod (the ’d’ is for daemon).
Let’s look at the benefits of the new packaging:


Installation becomes easier. Fewer Kubernetes deployments and associated configurations are required, so the set of configuration options and flags for Istio is reduced significantly. In the simplest case, you can start the Istio control plane, with all features enabled, by starting a single Pod.


Configuration becomes easier. Many of the configuration options that Istio has today are ways to orchestrate the control plane components, and so are no longer needed. You also no longer need to change cluster-wide PodSecurityPolicy to deploy Istio.


Using VMs becomes easier. To add a workload to a mesh, you now just need to install one agent and the generated certificates. That agent connects back to only a single service.


Maintenance becomes easier. Installing, upgrading, and removing Istio no longer require a complicated dance of version dependencies and startup orders. For example: To upgrade, you only need to start a new istiod version alongside your existing control plane, canary it, and then move all traffic over to it.


Scalability becomes easier. There is now only one component to scale.


Debugging becomes easier. Fewer components means less cross-component environmental debugging.


Startup time goes down. Components no longer need to wait for each other to start in a defined order.


Resource usage goes down and responsiveness goes up. Communication between components becomes guaranteed, and not subject to gRPC size limits. Caches can be shared safely, which decreases the resource footprint as a result.


istiod unifies functionality that Pilot, Galley, Citadel and the sidecar injector previously performed into a single binary.
A separate component, the istio-agent, helps each sidecar connect to the mesh by securely passing configuration and secrets to the Envoy proxies. While the agent, strictly speaking, is still part of the control plane, it runs on a per-pod basis. We’ve simplified further by rolling per-node functionality that used to run as a DaemonSet into that per-pod agent.
Extra for experts
There will still be some cases where you might want to run Istio components independently, or replace certain components.
Some users might want to use a Certificate Authority (CA) outside the mesh, and we have documentation on how to do that. If you do your certificate provisioning using a different tool, we can use that instead of the built-in CA.
Moving forward
At its heart, istiod is just a packaging and optimization change. It’s built on the same code and API contracts as the separate components, and remains covered by our comprehensive test suite. This gives us confidence in making it the default in Istio 1.5. The service is now called istiod; you’ll still see an istio-pilot service for existing proxies while the upgrade process completes.
While the move to istiod may seem like a big change, and is a huge improvement for the people who administer and maintain the mesh, it won’t make the day-to-day life of using Istio any different. istiod is not changing any of the APIs used to configure your mesh, so your existing processes will all stay the same.
Does this change imply that microservices are a mistake for all workloads and architectures? Of course not. They are a tool in a toolbelt, and they work best when they reflect your organizational reality. Instead, this change shows a willingness in the project to change based on user feedback, and a continued focus on simplification for all users. Microservices have to be right-sized, and we believe we have found the right size for Istio.



Declarative WebAssembly deployment for Istio
Mon, 16 Mar 2020 00:00:00 +0000
As outlined in the Istio 2020 trade winds blog and more recently announced with Istio 1.5, WebAssembly (Wasm) is now an (alpha) option for extending the functionality of the Istio service proxy (Envoy proxy). With Wasm, users can build support for new protocols, custom metrics, loggers, and other filters. Working closely with Google, we in the community (Solo.io) have focused on the user experience of building, socializing, and deploying Wasm extensions to Istio. We’ve announced WebAssembly Hub and associated tooling to build a “docker-like” experience for working with Wasm.
Background
With the WebAssembly Hub tooling, we can use the wasme CLI to easily bootstrap a Wasm project for Envoy, push it to a repository, and then pull/deploy it to Istio. For example, to deploy a Wasm extension to Istio with wasme we can run the following:
$  wasme deploy istio webassemblyhub.io/ceposta/demo-add-header:v0.2 \
  --id=myfilter \
  --namespace=bookinfo \
  --config 'tomorrow'
This will add the demo-add-header extension to all workloads running in the bookinfo namespace. We can get more fine-grained control over which workloads get the extension by using the --labels parameter:
$  wasme deploy istio webassemblyhub.io/ceposta/demo-add-header:v0.2 \
  --id=myfilter  \
  --namespace=bookinfo  \
  --config 'tomorrow' \
  --labels app=details
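The targeting described above comes down to a namespace match plus an equality-based label selector. The sketch below models that reading of `--namespace` and `--labels`; the `Workload` type and the helper are hypothetical and are not wasme’s implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Workload:
    name: str
    namespace: str
    labels: Dict[str, str] = field(default_factory=dict)


def select_workloads(workloads: List[Workload], namespace: str,
                     labels: Optional[Dict[str, str]] = None) -> List[str]:
    """Names of workloads in `namespace` whose labels include every requested pair."""
    wanted = labels or {}
    return [
        w.name for w in workloads
        if w.namespace == namespace
        and all(w.labels.get(k) == v for k, v in wanted.items())
    ]
```

With no labels, every workload in the namespace is selected, which matches the first `wasme deploy` example; adding `app=details` narrows the selection to the details workload.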
This is a much easier experience than manually creating EnvoyFilter resources and trying to get the Wasm module to each of the pods that are part of the workload you’re trying to target. However, this is a very imperative approach to interacting with Istio. Just like users typically don’t use kubectl directly in production and prefer a declarative, resource-based workflow, we want the same for making customizations to our Istio proxies.
A declarative approach
The WebAssembly Hub tooling also includes an operator for deploying Wasm extensions to Istio workloads. The operator allows users to define their WebAssembly extensions using a declarative format and leave it to the operator to reconcile the deployment. For example, we use a FilterDeployment resource to define what image and workloads need the extension:
apiVersion: wasme.io/v1
kind: FilterDeployment
metadata:
  name: bookinfo-custom-filter
  namespace: bookinfo
spec:
  deployment:
    istio:
      kind: Deployment
      labels:
        app: details
  filter:
    config: 'world'
    image: webassemblyhub.io/ceposta/demo-add-header:v0.2
We could then take this FilterDeployment document and version it with the rest of our Istio resources. You may be wondering why we need this Custom Resource to configure Istio’s service proxy to use a Wasm extension when Istio already has the EnvoyFilter resource.
Let’s take a look at exactly how all of this works under the covers.
How it works
Under the covers the operator is doing a few things that aid in deploying and configuring a Wasm extension into the Istio service proxy (Envoy Proxy).

Set up local cache of Wasm extensions
Pull desired Wasm extension into the local cache
Mount the wasm-cache into appropriate workloads
Configure Envoy with EnvoyFilter CRD to use the Wasm filter
[Figure: Understanding how wasme operator works]
At the moment, the Wasm image needs to be published to a registry for the operator to cache it correctly. The cache pods run as a DaemonSet on each node so that the cache can be mounted into the Envoy container. This is not the ideal mechanism and is being improved: ideally we wouldn't have to mount anything and could stream the module to the proxy directly over HTTP, so stay tuned for updates (which should land within the next few days). The mount is established using the sidecar.istio.io/userVolume and sidecar.istio.io/userVolumeMount annotations. See the docs on Istio Resource Annotations for more about how that works.
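As a rough illustration of the annotation mechanism, the Python sketch below builds the two annotation values for a pod template. It assumes the JSON shape used by the sidecar injector templates of this era (an object keyed by volume name); the exact format has varied across Istio releases, so check the Istio Resource Annotations docs for your version. The volume name `wasme-cache` and the node path are hypothetical; the mount path matches the filename prefix that appears in the EnvoyFilter below.

```python
import json

# Hypothetical names: the volume name and node path are illustrative only.
CACHE_VOLUME = "wasme-cache"
CACHE_PATH = "/var/local/lib/wasme-cache"


def cache_mount_annotations(volume_name: str, host_path: str, mount_path: str) -> dict:
    """Pod annotations asking the sidecar injector to add a hostPath volume
    and mount it read-only into the istio-proxy container.

    Assumes the object-keyed-by-volume-name JSON shape; check the format
    your Istio release expects before relying on this.
    """
    user_volume = {volume_name: {"hostPath": {"path": host_path}}}
    user_volume_mount = {volume_name: {"mountPath": mount_path, "readOnly": True}}
    return {
        "sidecar.istio.io/userVolume": json.dumps(user_volume),
        "sidecar.istio.io/userVolumeMount": json.dumps(user_volume_mount),
    }


if __name__ == "__main__":
    # Print the annotations as they would appear under
    # spec.template.metadata.annotations of a Deployment.
    for key, value in cache_mount_annotations(CACHE_VOLUME, CACHE_PATH, CACHE_PATH).items():
        print(f"{key}: '{value}'")
```

Because the injector reads these annotations only when a pod is created, they belong on the Deployment's pod template, and adding them there rolls out new pods.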
Once the Wasm module is cached correctly and mounted into the workload’s service proxy, the operator then configures the EnvoyFilter resources.
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: details-v1-myfilter
  namespace: bookinfo
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.http_connection_manager
            subFilter:
              name: envoy.router
    patch:
      operation: INSERT_BEFORE
      value:
        config:
          config:
            configuration: tomorrow
            name: myfilter
            rootId: add_header
            vmConfig:
              code:
                local:
                  filename: /var/local/lib/wasme-cache/44bf95b368e78fafb663020b43cf099b23fc6032814653f2f47e4d20643e7267
              runtime: envoy.wasm.runtime.v8
              vmId: myfilter
        name: envoy.filters.http.wasm
  workloadSelector:
    labels:
      app: details
      version: v1
You can see that the EnvoyFilter resource configures the proxy to add the envoy.filters.http.wasm filter and load the Wasm module from the wasme-cache.
Once the Wasm extension is loaded into the Istio service proxy, it will extend the capabilities of the proxy with whatever custom code you introduced.
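To make the mapping concrete, here is a Python sketch that produces the same EnvoyFilter as a plain dictionary from a handful of inputs. Two details are assumptions, not documented wasme behavior: that the cache file name is the hex SHA-256 digest of the module bytes (the 64-character name above is consistent with that), and that the resource name is the Deployment name followed by the filter id. The function names are hypothetical.

```python
import hashlib

CACHE_DIR = "/var/local/lib/wasme-cache"  # matches the filename prefix shown above


def cached_module_path(module_bytes: bytes, cache_dir: str = CACHE_DIR) -> str:
    # Assumption: the cache names each module by the hex SHA-256 digest of its bytes.
    return f"{cache_dir}/{hashlib.sha256(module_bytes).hexdigest()}"


def envoy_filter_for(deployment: str, namespace: str, pod_labels: dict, filter_id: str,
                     root_id: str, configuration: str, module_path: str) -> dict:
    """Return an EnvoyFilter manifest (as a dict) mirroring the one above."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "EnvoyFilter",
        # Assumed naming pattern: <deployment>-<filter id>, e.g. details-v1-myfilter.
        "metadata": {"name": f"{deployment}-{filter_id}", "namespace": namespace},
        "spec": {
            "configPatches": [{
                "applyTo": "HTTP_FILTER",
                "match": {
                    "context": "SIDECAR_INBOUND",
                    "listener": {"filterChain": {"filter": {
                        "name": "envoy.http_connection_manager",
                        "subFilter": {"name": "envoy.router"},
                    }}},
                },
                "patch": {
                    # Insert the Wasm filter just before the router filter.
                    "operation": "INSERT_BEFORE",
                    "value": {
                        "name": "envoy.filters.http.wasm",
                        "config": {"config": {
                            "name": filter_id,
                            "rootId": root_id,
                            "configuration": configuration,
                            "vmConfig": {
                                "vmId": filter_id,
                                "runtime": "envoy.wasm.runtime.v8",
                                # Load the module from the mounted cache.
                                "code": {"local": {"filename": module_path}},
                            },
                        }},
                    },
                },
            }],
            # Only proxies of pods carrying these labels receive the patch.
            "workloadSelector": {"labels": dict(pod_labels)},
        },
    }
```

Calling `envoy_filter_for("details-v1", "bookinfo", {"app": "details", "version": "v1"}, "myfilter", "add_header", "tomorrow", path)` yields a resource named `details-v1-myfilter`; `json.dumps` of the result is a document that `kubectl apply -f -` accepts, since kubectl reads JSON as well as YAML.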
Next Steps
In this blog we explored options for installing Wasm extensions into Istio workloads. The easiest way to get started with WebAssembly on Istio is to use the wasme tool to bootstrap a new Wasm project in C++ or AssemblyScript (Rust support is coming soon). For example, to set up a C++ Wasm module, you can run:
$ wasme init ./filter --language cpp --platform istio --platform-version 1.5.x
Without the extra flags, wasme init enters an interactive mode that walks you through choosing the correct values.
Take a look at the WebAssembly Hub wasme tooling to get started with Wasm on Istio.
Learn more
Redefine Extensibility with WebAssembly on Envoy and Istio
WebAssembly SF talk (video): Extensions for network proxies, by John Plevyak
Solo blog
Proxy-Wasm ABI specification
Proxy-Wasm C++ SDK and its developer documentation
Proxy-Wasm Rust SDK
Proxy-Wasm AssemblyScript SDK
Tutorials
Videos on the Solo.io YouTube Channel

Redefining extensibility in proxies - introducing WebAssembly to Envoy and Istio
Thu, 05 Mar 2020 00:00:00 +0000
Since adopting Envoy in 2016, the Istio project has always wanted to
provide a platform on top of which a rich set of extensions could be built, to meet the diverse
needs of our users. There are many reasons to add capability to the data plane of a service
mesh — to support newer protocols, integrate with proprietary security controls, or enhance
observability with custom metrics, to name a few.
Over the last year and a half our team here at Google has been working on adding dynamic
extensibility to the Envoy proxy using WebAssembly. We are delighted to
share that work with the world today, as well as
unveiling WebAssembly (Wasm) for Proxies (Proxy-Wasm): an ABI,
which we intend to standardize; SDKs; and its first major implementation, the new,
lower-latency Istio telemetry system.
We have also worked closely with the community to ensure that there is a great developer experience
for users to get started quickly. The Google team has been working closely with the team
at Solo.io who have built the WebAssembly Hub,
a service for building, sharing, discovering and deploying Wasm extensions.
With the WebAssembly Hub, Wasm extensions are as easy to manage, install and run as containers.
This work is being released today in Alpha and there is still lots
of work to be done, but we are excited to get this into the hands of developers
so they can start experimenting with the tremendous possibilities this opens up.
Background
The need for extensibility has been a founding tenet of both the Istio and Envoy projects,
but the two projects took different approaches. The Istio project focused on enabling a generic
out-of-process extension model called Mixer
with a lightweight developer experience, while Envoy focused on in-proxy extensions.
Each approach has its share of pros and cons. The Istio model led to significant resource
inefficiencies that impacted tail latencies and resource utilization. This model was also
intrinsically limited - for example, it was never going to provide support for
implementing custom protocol handling.
The Envoy model imposed a monolithic build process, and required extensions to be written in C++,
limiting the developer ecosystem. Rolling out a new extension to the fleet required pushing new
binaries and rolling restarts, which can be difficult to coordinate, and risk downtime. This also
incentivized developers to upstream extensions into Envoy that were used by only a small
percentage of deployments, just to piggyback on its release mechanisms.
Over time some of the most performance-sensitive features of Istio have been upstreamed
into Envoy - policy checks on traffic, and
JWT authentication, for example.
Still, we have always wanted to converge on a single stack for extensibility that imposes fewer
tradeoffs: something that decouples Envoy releases from its extension ecosystem, enables
developers to work in their languages of choice, and enables Istio to reliably roll out new
capability without downtime risk. Enter WebAssembly.
What is WebAssembly?
WebAssembly (Wasm) is a portable bytecode format for executing code
written in multiple languages at
near-native speed. Its initial design goals align
well with the challenges outlined above, and it has sizable industry support behind it. Wasm
is the fourth standard language (following HTML, CSS and JavaScript) to run natively in all
the major browsers, having become a W3C Recommendation in
December 2019. That gives us confidence in making a strategic bet on it.
While WebAssembly started life as a client-side technology, there are a number of advantages
to using it on the server. The runtime is memory-safe and sandboxed for security. There is a
large tooling ecosystem for compiling and debugging Wasm in its textual or binary format.
The W3C and the Bytecode Alliance have become
active hubs for other server-side efforts. For example, the Wasm community is standardizing
a “WebAssembly System Interface” (WASI)
at the W3C, with a sample implementation, which provides an OS-like abstraction to Wasm ‘programs’.
Bringing WebAssembly to Envoy
Over the past 18 months, we have been working
with the Envoy community to build Wasm extensibility into Envoy and contribute it upstream.
We’re pleased to announce it is available as Alpha in the Envoy build shipped
with Istio 1.5, with source in
the envoy-wasm development fork and work ongoing to
merge it into the main Envoy tree. The implementation uses the WebAssembly runtime built into
Google’s high performance V8 engine.
In addition to the underlying runtime, we have also built:


A generic Application Binary Interface (ABI) for embedding Wasm in proxies, which means compiled
extensions will work across different versions of Envoy - or even other proxies, should they choose
to implement the ABI


SDKs for easy extension development in C++,
Rust
and AssemblyScript, with more to follow


Comprehensive samples and instructions
on how to deploy in Istio and standalone Envoy


Abstractions to allow for other Wasm runtimes to be used, including a ‘null’ runtime that
simply compiles the extension natively into Envoy; this is very useful for testing and debugging


Using Wasm for extending Envoy brings us several key benefits:


Agility: Extensions can be delivered and reloaded at runtime using the Istio control plane.
This enables a fast develop → test → release cycle for extensions without
requiring Envoy rollouts.


Stock releases: Once merging into the main tree is complete, Istio and others will be able to
use stock releases of Envoy, instead of custom builds. This will also free the Envoy community
to move some of the built-in extensions to this model, thereby reducing their
supported footprint.


Reliability and isolation: Extensions are deployed inside a sandbox with resource constraints,
which means they can now crash, or leak memory, without bringing the whole Envoy process down.
CPU and memory usage can also be constrained.


Security: The sandbox has a clearly defined API for communicating with Envoy, so extensions
only have access to, and can modify, a limited number of properties of a connection or request.
Furthermore, because Envoy mediates this interaction, it can hide or sanitize sensitive
information from the extension (e.g. “Authorization” and “Cookie” HTTP headers, or
the client’s IP address).


Flexibility: over 30 programming languages can be compiled to WebAssembly,
allowing developers from all backgrounds - C++, Go, Rust, Java, TypeScript, etc. - to write
Envoy extensions in their language of choice.
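The Security and Reliability points above can be illustrated with a toy model in Python. It is not how Envoy implements the sandbox, and a Python exception handler is not memory isolation; it only shows the shape of the contract, in which the host hands the extension a filtered copy of the headers, refuses to let it set sensitive ones, and keeps serving the request if the extension fails. The header list and function names are hypothetical.

```python
from typing import Callable, Dict

# Headers the host withholds from extensions (an illustrative list).
SENSITIVE_HEADERS = frozenset({"authorization", "cookie"})

# An extension receives visible headers and returns headers it wants to set.
Extension = Callable[[Dict[str, str]], Dict[str, str]]


def sanitized_view(headers: Dict[str, str]) -> Dict[str, str]:
    """Copy of the request headers with sensitive ones removed (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}


def run_extension(ext: Extension, headers: Dict[str, str]) -> Dict[str, str]:
    """Run an extension against a sanitized copy of the headers.

    The host applies the returned headers except sensitive ones; if the
    extension raises, the request continues with its original headers.
    """
    try:
        updates = ext(sanitized_view(headers))
    except Exception:
        # Stand-in for sandbox containment: the extension fails, the host keeps serving.
        return dict(headers)
    result = dict(headers)
    for name, value in updates.items():
        if name.lower() not in SENSITIVE_HEADERS:
            result[name] = value
    return result
```

An extension such as `lambda h: {"x-hello": "world"}` adds its header, while one that tries to overwrite `Cookie` or raises an error leaves the sensitive values and the request intact.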


“I am extremely excited to see WASM support land in Envoy; this is the future of Envoy
extensibility, full stop. Envoy’s WASM support coupled with a community driven hub will unlock an
incredible amount of innovation in the networking space across both service mesh and API gateway
use cases. I can’t wait to see what the community builds moving forward.”
– Matt Klein, Envoy creator.
For technical details of the implementation, look out for an upcoming post
to the Envoy blog.
The Proxy-Wasm interface between host environment and extensions
is deliberately proxy agnostic. We’ve built it into Envoy, but it was designed to be adopted by
other proxy vendors. We want to see a world where you can take an extension written for Istio and
Envoy and run it in other infrastructure; you’ll hear more about that soon.
Building on WebAssembly in Istio
Istio moved several of its extensions into its build of Envoy as part of the 1.5 release, in order
to significantly improve performance. While doing that work we have been testing to ensure those
same extensions can compile and run as Proxy-Wasm modules with no variation in behavior. We’re not
quite ready to make this setup the default, given that we consider Wasm support to be Alpha;
however, this has given us a lot of confidence in our general approach and in the host
environment, ABI and SDKs that have been developed.
We have also been careful to ensure that the Istio control plane and
its Envoy configuration APIs are Wasm-ready.
We have samples showing how to perform several customizations that users commonly ask for, such as
custom header decoding or programmatic routing. As we move this support to
Beta, you will see documentation showing best practices for using Wasm with Istio.
Finally, we are working with the many vendors who have
written Mixer adapters
to help them migrate to Wasm, if that is the best path forward. Mixer will move to a
community project in a future release, where it will remain available for legacy use cases.
Developer Experience
Powerful tooling is nothing without a great developer experience. Solo.io
recently announced
the release of WebAssembly Hub, a set of tools and a repository for
building, deploying, sharing and discovering Proxy-Wasm extensions for Envoy and Istio.
The WebAssembly Hub fully automates many of the steps required for developing and deploying Wasm
extensions. Using WebAssembly Hub tooling, users can easily compile their code - in any supported
language - into Wasm extensions. The extensions can then be uploaded to the Hub registry, and be
deployed and undeployed to Istio with a single command.
Behind the scenes the Hub takes care of much of the nitty-gritty, such as pulling in the correct
toolchain, ABI version verification, permission control, and more. The workflow also eliminates
toil with configuration changes across Istio service proxies by automating the deployment of your
extensions. This tooling helps users and operators avoid unexpected behaviors due to
misconfiguration or version mismatches.
The WebAssembly Hub tools provide a powerful CLI as well as an elegant and easy-to-use graphical
user interface. An important goal of the WebAssembly Hub is to simplify the experience around
building Wasm modules and provide a place of collaboration for developers to share and discover
useful extensions.
Check out the getting started guide
to create your first Proxy-Wasm extension.
Next Steps
In addition to working towards a beta release, we are committed to making sure that there is a
durable community around Proxy-Wasm. The ABI needs to be finalized, and turning it into a standard
will be done with broader feedback within the appropriate standards body. Upstreaming Wasm
support into the Envoy mainline is still in progress. We are also seeking an appropriate
community home for the tooling and the WebAssembly Hub.
Learn more
WebAssembly SF talk (video): Extensions for network proxies, by John Plevyak
Solo blog
Proxy-Wasm ABI specification
Proxy-Wasm C++ SDK and its developer documentation
Proxy-Wasm Rust SDK
Proxy-Wasm AssemblyScript SDK
Tutorials
Videos on the Solo.io YouTube Channel

Istio in 2020 - Following the Trade Winds
Tue, 03 Mar 2020 00:00:00 +0000
Istio solves real problems that people encounter running microservices. Even
very early pre-release versions
helped users debug the latency in their architecture, increase the reliability
of services, and transparently secure traffic behind the firewall.
Last year, the Istio project experienced major growth. After a 9-month gestation
before the 1.1 release in Q1, we set a goal of having a quarterly release
cadence. We knew it was important to deliver value consistently and predictably.
With three releases landing in the successive quarters as planned, we are proud
to have reached that goal.
During that time, we improved our build and test infrastructure, resulting in
higher quality and easier release cycles. We doubled down on user experience,
adding many commands to make operating and debugging the mesh easier. We also
saw tremendous growth in the number of developers and companies contributing to
the product - culminating in us being
#4 on GitHub’s top ten list of fastest growing projects!
We have ambitious goals for Istio in 2020 and there are many major efforts
underway, but at the same time we strongly believe that good infrastructure
should be “boring.” Using Istio in production should be a seamless experience;
performance should not be a concern, upgrades should be a non-event and complex
tasks should be automated away. With our investment in a more powerful
extensibility story we think the pace of innovation in the service mesh space
can increase while Istio focuses on being gloriously dull.
More details on our major efforts in 2020 below.
Sleeker, smoother and faster
Istio provided for extensibility from day one, implemented by a component called
Mixer. Mixer is a platform that allows custom
adapters
to act as an intermediary between the data plane and the backends you use for
policy or telemetry. Mixer necessarily added overhead to requests because it
required extensions to be out-of-process. So, we’re moving to a model that
enables extension directly in the proxies instead.
Most of Mixer’s use cases for policy enforcement are already addressed with
Istio’s authentication
and authorization policies, which
allow you to control workload-to-workload and end-user-to-workload authorization
directly in the proxy. Common monitoring use cases have already moved into the
proxy too - we have
introduced in-proxy support
for sending telemetry to Prometheus and Stackdriver.
Our benchmarking shows that the new telemetry model reduces our latency
dramatically and gives us industry-leading performance, with 50% reductions in
both latency and CPU consumption.
A new model for Istio extensibility
The model that replaces Mixer uses extensions in Envoy to provide even more
capability. The Istio community is leading the implementation of a
WebAssembly (Wasm) runtime in Envoy, which lets us
implement extensions that are modular, sandboxed, and developed in one of
over 20 languages. Extensions
can be dynamically loaded and reloaded while the proxy continues serving
traffic. Wasm extensions will also be able to extend the platform in ways that
Mixer simply couldn’t. They can act as custom protocol handlers and transform
payloads as they pass through Envoy; in short, they can do the same things as
modules built into Envoy.
We’re working with the Envoy community on ways to discover and distribute these
extensions. We want to make WebAssembly extensions as easy to install and run as
containers. Many of our partners have written Mixer adapters, and together we
are getting them ported to Wasm. We are also developing guides and codelabs on
how to write your own extensions for custom integrations.
By changing the extension model, we were also able to remove dozens of CRDs.
You no longer need a unique CRD for every piece of software you integrate with
Istio.
Installing Istio 1.5 with the ‘preview’ configuration profile won’t install
Mixer. If you upgrade from a previous release, or install the ‘default’ profile,
we still keep Mixer around, to be safe. When using Prometheus or Stackdriver for
metrics, we recommend you try out the new mode and see how much your performance
improves.
You can keep Mixer installed and enabled if you need it. Eventually Mixer will
become a separately released add-on to Istio that is part of the
istio-ecosystem.
Fewer moving parts
We are also simplifying the deployment of the rest of the control plane. To
that end, we combined several of the control plane components into a single
component: Istiod. This binary includes the features of Pilot, Citadel, Galley,
and the sidecar injector. This approach improves many aspects of installing and
managing Istio – reducing installation and configuration complexity,
maintenance effort, and issue diagnosis time while increasing responsiveness.
Read more about Istiod in
this post from Christian Posta.
We are shipping Istiod as the default for all profiles in 1.5.
To reduce the per-node footprint, we are getting rid of the node-agent, used to
distribute certificates, and moving its functionality to the istio-agent, which
already runs in each Pod. For those of you who like pictures, we are moving from
this:

[Figure: The Istio architecture today]

to this:

[Figure: The Istio architecture in 2020]

In 2020, we will continue to invest in onboarding to achieve our goal of a
“zero config” default that doesn’t require you to change any of your
application’s configuration to take advantage of most Istio features.
Improved lifecycle management
To improve Istio’s life-cycle management, we moved to an
operator-based
installation. We introduced the
IstioOperator CRD and two installation modes:

Human-triggered: use istioctl to apply the settings to the cluster.
Machine-triggered: use a controller that continually watches for changes
in that CRD and applies them in real time.
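The difference between the two modes is mainly who triggers the reconciliation. The Python sketch below assumes a simplified spec shape (component name mapped to settings) rather than the real IstioOperator schema, and its function names are hypothetical: `apply_once` plays the role of a one-off istioctl run, and `controller` re-runs the same diff for every change it observes.

```python
from typing import Dict, Iterable, List

Spec = Dict[str, dict]  # component name -> settings (a simplified, hypothetical shape)


def plan(desired: Spec, installed: Spec) -> List[str]:
    """Diff the desired spec against the installed state and list the actions needed."""
    actions = []
    for name in sorted(desired.keys() | installed.keys()):
        if name not in installed:
            actions.append(f"create {name}")
        elif name not in desired:
            actions.append(f"delete {name}")
        elif desired[name] != installed[name]:
            actions.append(f"update {name}")
    return actions


def apply_once(desired: Spec, installed: Spec) -> List[str]:
    """Human-triggered mode: compute and apply the diff a single time."""
    actions = plan(desired, installed)
    installed.clear()
    installed.update({name: dict(settings) for name, settings in desired.items()})
    return actions


def controller(events: Iterable[Spec], installed: Spec) -> List[List[str]]:
    """Machine-triggered mode: apply the same diff every time the watched spec changes."""
    return [apply_once(spec, installed) for spec in events]
```

Because both modes share one diff function, a change applied by hand and a change picked up by the controller converge on the same installed state.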

In 2020, upgrades will get easier too. We will add support for “canarying”
new versions of the Istio control plane, which allows you to run a new version
alongside the existing version and gradually switch your data plane over to use
the new one.
Secure By Default
Istio already provides the fundamentals for strong service security: reliable
workload identity, robust access policies and comprehensive audit logging. We’re
stabilizing APIs for these features; many Alpha APIs are moving to Beta in 1.5,
and we expect them to all be v1 by the end of 2020. To learn more about the
status of our APIs, see our
features page.
Network traffic is also becoming more secure by default. After many users
enabled it in preview,
automated rollout of mutual TLS
is becoming the recommended practice in Istio 1.5.
In addition, we will make Istio require fewer privileges and simplify its
dependencies, which in turn makes it a more robust system. Historically, you had
to mount certificates into Envoy using Kubernetes Secrets, which were mounted as
files into each proxy. By leveraging the
Secret Discovery Service
we can distribute these certificates securely without concern of them being
intercepted by other workloads on the machine. This mode will become the default
in 1.5.
Getting rid of the node-agent not only simplifies the deployment, but also
removes the requirement for a cluster-wide PodSecurityPolicy, further
improving the security posture of your cluster.
Other features
Here’s a snapshot of some more exciting things you can expect from Istio in
2020:

Integration with more hosted Kubernetes environments - service meshes
powered by Istio are currently available from 15 vendors, including Google, IBM,
Red Hat, VMware, Alibaba and Huawei
More investment in istioctl and its ability to help diagnose problems
Better integration of VM-based workloads into meshes
Continued work towards making multi-cluster and multi-network meshes easier to
configure, maintain, and run
Integration with more service discovery systems, including
Functions-as-a-Service
Implementation of the new
Kubernetes service APIs,
which are currently in development
An enhancement repository,
to track feature development
Making it easier to run Istio without needing Kubernetes!

From the seas to the skies,
we’re excited to see where you take Istio next.



Remove cross-pod unix domain sockets
Thu, 20 Feb 2020 00:00:00 +0000
In Istio versions before 1.5, during secret discovery service (SDS) execution,
the SDS client and the SDS server communicate through a cross-pod Unix domain
socket (UDS), which needs to be protected by Kubernetes pod security policies.
With Istio 1.5, Pilot Agent, Envoy, and Citadel Agent will be running in
the same container (the architecture is shown in the following diagram).
To defend against attackers eavesdropping on the cross-pod UDS between Envoy (SDS client)
and Citadel Agent (SDS server), Istio 1.5 merges Pilot Agent and Citadel Agent
into a single Istio Agent and makes the UDS between Envoy and Citadel Agent
private to the Istio Agent container.
The Istio Agent container is deployed as the sidecar of the application service container.
[Figure: The architecture of Istio Agent]



Multicluster Istio configuration and service discovery using Admiral
Sun, 05 Jan 2020 00:00:00 +0000
At Intuit, we read the blog post Multi-Mesh Deployments for Isolation and Boundary Protection and immediately related to some of the problems mentioned.
We realized that even though we wanted to configure a single multi-cluster mesh, instead of a federation of multiple meshes
as described in the blog post, the same non-uniform naming issues also applied in our environment.
This blog post explains how we solved these problems using Admiral, an open source project under istio-ecosystem in GitHub.
Background
Using Istio, we realized the configuration for multi-cluster was complex and challenging to maintain over time. As a result, we chose the model described in Multi-Cluster Istio Service Mesh with replicated control planes for scalability and other operational reasons. Following this model, we had to solve these key requirements before widely adopting an Istio service mesh:

Creation of service DNS entries decoupled from the namespace, as described in Features of multi-mesh deployments.
Service discovery across many clusters.
Supporting active-active & HA/DR deployments. We also had to support these crucial resiliency patterns with services being deployed in globally unique namespaces across discrete clusters.

We have over 160 Kubernetes clusters with a globally unique namespace name across all clusters. In this configuration, we can have the same service workload deployed in different regions running in namespaces with different names. As a result, following the routing strategy mentioned in Multicluster version routing, the example name foo.namespace.global wouldn’t work across clusters. We needed a globally unique and discoverable service DNS that resolves service instances in multiple clusters, each instance running/addressable with its own unique Kubernetes FQDN. For example, foo.global should resolve to both foo.uswest2.svc.cluster.local & foo.useast2.svc.cluster.local if foo is running in two Kubernetes clusters with different names.
Also, our services need additional DNS names with different resolution and global routing properties. For example, foo.global should resolve locally first, then route to a remote instance using topology routing, while foo-west.global and foo-east.global (names used for testing) should always resolve to the respective regions.
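Here is a minimal Python sketch of the naming scheme just described. It assumes a hypothetical inventory that records which namespace the foo identity uses in each cluster and region. The list order stands in for the "resolve locally first" preference; in a real mesh that preference comes from locality-aware routing, not from the order of DNS answers.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory: (cluster, region) -> namespace where the service runs.
Placement = Dict[Tuple[str, str], str]


def local_fqdn(service: str, namespace: str) -> str:
    return f"{service}.{namespace}.svc.cluster.local"


def global_names(service: str, placements: Placement, caller_cluster: str) -> Dict[str, List[str]]:
    """Map global DNS names to the cluster-local FQDNs they should resolve to.

    <service>.global lists every instance, with the caller's own cluster first;
    <service>-<region>.global pins resolution to a single region.
    """
    # False sorts before True, so instances in the caller's cluster come first.
    ordered = sorted(placements.items(), key=lambda item: item[0][0] != caller_cluster)
    names = {f"{service}.global": [local_fqdn(service, ns) for _, ns in ordered]}
    for (_, region), ns in placements.items():
        names.setdefault(f"{service}-{region}.global", []).append(local_fqdn(service, ns))
    return names
```

For example, with foo in namespace `uswest2` on one cluster and `useast2` on another, a caller in the east cluster gets `foo.useast2.svc.cluster.local` first for `foo.global`, while `foo-west.global` always returns the west instance.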
Contextual Configuration
After further investigation, it was apparent that configuration needed to be contextual: each cluster needs a configuration specifically tailored for its view of the world.
For example, we have a payments service consumed by orders and reports. The payments service has an HA/DR deployment across us-east (cluster 3) and us-west (cluster 2), and it is deployed in namespaces with different names in each region. The orders service is deployed in a different cluster from payments, in us-west (cluster 1). The reports service is deployed in the same cluster as payments in us-west (cluster 2).
[Figure: Cross cluster workload communication with Istio]
The Istio ServiceEntry YAML below, for the payments service in Cluster 1 and Cluster 2, illustrates the contextual configuration that other services need in order to use the payments service:
Cluster 1 Service Entry
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: payments.global-se
spec:
  addresses:
  - 240.0.0.10
  endpoints:
  - address: ef394f...us-east-2.elb.amazonaws.com
    locality: us-east-2
    ports:
      http: 15443
  - address: ad38bc...us-west-2.elb.amazonaws.com
    locality: us-west-2
    ports:
      http: 15443
  hosts:
  - payments.global
  location: MESH_INTERNAL
  ports:
  - name: http
    number: 80
    protocol: http
  resolution: DNS
Cluster 2 Service Entry
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: payments.global-se
spec:
  addresses:
  - 240.0.0.10
  endpoints:
  - address: ef39xf...us-east-2.elb.amazonaws.com
    locality: us-east-2
    ports:
      http: 15443
  - address: payments.default.svc.cluster.local
    locality: us-west-2
    ports:
      http: 80
  hosts:
  - payments.global
  location: MESH_INTERNAL
  ports:
  - name: http
    number: 80
    protocol: http
  resolution: DNS
From the point of view of the reports service in Cluster 2, the payments ServiceEntry (an Istio CRD) sets locality us-west to point to the local Kubernetes FQDN and locality us-east to point to the istio-ingressgateway (load balancer) for Cluster 3.
From the point of view of the orders service in Cluster 1, the payments ServiceEntry sets locality us-west to point to the Cluster 2 istio-ingressgateway and locality us-east to point to the istio-ingressgateway for Cluster 3.
But wait, there’s even more complexity: What if the payments service wants to move traffic to the us-east region for planned maintenance in us-west? This would require the payments service to change the Istio configuration in all of its clients’ clusters. This would be nearly impossible to do without automation.
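The per-cluster differences above follow a simple rule, sketched here in Python under stated assumptions: a consumer in the same cluster as a payments instance reaches it through the local Kubernetes FQDN on port 80, and every other instance is reached through its cluster's ingress gateway on port 15443. The gateway hostnames, the cluster keys, and the us-east namespace are hypothetical placeholders (the real addresses are elided above).

```python
from typing import Dict, List

# Hypothetical inventory of where the payments identity runs.
PAYMENTS = {
    "cluster-2": {"locality": "us-west-2", "fqdn": "payments.default.svc.cluster.local"},
    "cluster-3": {"locality": "us-east-2", "fqdn": "payments.payments-east.svc.cluster.local"},
}
# Hypothetical ingress gateway load balancer hostnames per cluster.
GATEWAYS = {"cluster-2": "cluster2-ingress.example.com", "cluster-3": "cluster3-ingress.example.com"}

GATEWAY_PORT = 15443  # cross-cluster port used in the ServiceEntries above
LOCAL_PORT = 80       # service port used for the in-cluster endpoint


def endpoints_for(consumer_cluster: str, instances: Dict[str, dict],
                  gateways: Dict[str, str]) -> List[dict]:
    """Endpoints of the payments.global ServiceEntry as seen from consumer_cluster."""
    endpoints = []
    for cluster, inst in sorted(instances.items()):
        if cluster == consumer_cluster:
            # Same cluster: reach the local Kubernetes service directly.
            endpoints.append({"address": inst["fqdn"], "locality": inst["locality"],
                              "ports": {"http": LOCAL_PORT}})
        else:
            # Remote cluster: go through that cluster's ingress gateway.
            endpoints.append({"address": gateways[cluster], "locality": inst["locality"],
                              "ports": {"http": GATEWAY_PORT}})
    return endpoints
```

Everything else in the ServiceEntry (hosts, addresses, ports) is identical across clusters; only this endpoint list depends on where the consumer runs.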
Admiral to the Rescue: Admiral is that Automation
Admiral is a controller of Istio control planes.
[Figure: Cross cluster workload communication with Istio and Admiral]
Admiral provides automatic configuration for an Istio mesh spanning multiple clusters to work as a single mesh based on a unique service identifier that associates workloads running on multiple clusters to a service. It also provides automatic provisioning and syncing of Istio configuration across clusters. This removes the burden on developers and mesh operators, which helps scale beyond a few clusters.
Admiral CRDs
Global Traffic Routing
With Admiral’s global traffic policy CRD, the payments service can update regional traffic weights and Admiral updates the Istio configuration in all clusters that consume the payments service.
apiVersion: admiral.io/v1alpha1
kind: GlobalTrafficPolicy
metadata:
  name: payments-gtp
spec:
  selector:
    identity: payments
  policy:
  - dns: default.payments.global
    lbType: 1
    target:
    - region: us-west-2/*
      weight: 10
    - region: us-east-2/*
      weight: 90
In the example above, 90% of the payments service traffic is routed to the us-east region. This Global Traffic Configuration is automatically converted into Istio configuration and contextually mapped into Kubernetes clusters to enable multi-cluster global routing for the payments service for its clients within the Mesh.
This Global Traffic Routing feature relies on Istio’s locality load-balancing per service available in Istio 1.5 or later.
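Below is a Python sketch of one plausible way to translate the policy above into Istio's locality weighted distribution. It is an assumption about the approach, not Admiral's code: every client region receives the same target weights, emitted in the shape of a DestinationRule's `trafficPolicy.loadBalancer.localityLbSetting.distribute` list, whose weights Istio expects to total 100 per source locality. The `lbType` field is not modeled.

```python
from typing import List


def to_locality_distribute(targets: List[dict], client_regions: List[str]) -> List[dict]:
    """Translate GlobalTrafficPolicy targets into localityLbSetting.distribute entries.

    Every client region gets the same region weights, so traffic is split the
    same way no matter where the caller runs.
    """
    total = sum(t["weight"] for t in targets)
    if total != 100:
        raise ValueError(f"weights must sum to 100, got {total}")
    weights = {t["region"]: t["weight"] for t in targets}
    # One "from" entry per client locality, each with its own copy of the weights.
    return [{"from": region, "to": dict(weights)} for region in client_regions]
```

With the `payments-gtp` targets (`us-west-2/*` at 10, `us-east-2/*` at 90), each client region maps to `{"us-west-2/*": 10, "us-east-2/*": 90}`, which sends 90% of requests to us-east.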
Dependency
The Admiral Dependency CRD allows us to specify a service’s dependencies based on a service identifier. This optimizes the delivery of Admiral-generated configuration to only the clusters where the dependent clients of a service are running (instead of writing it to all clusters). Admiral also configures and/or updates the Sidecar Istio CRD in the client’s workload namespace to limit the Istio configuration to only its dependencies. We use service-to-service authorization information recorded elsewhere to generate these dependency records for Admiral to use.
An example dependency for the orders service:
apiVersion: admiral.io/v1alpha1
kind: Dependency
metadata:
  name: dependency
  namespace: admiral
spec:
  source: orders
  identityLabel: identity
  destinations:
  - payments
Dependency is optional; a missing dependency for a service results in the Istio configuration for that service being pushed to all clusters.
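The following Python sketch shows the targeting rule described above, using a hypothetical inventory of where each identity runs. Config for a service goes only to the clusters that run its recorded clients, and falls back to every cluster when no Dependency mentions the service. The function name is illustrative.

```python
from typing import Dict, List, Set


def clusters_needing_config(service: str, dependencies: List[dict],
                            workload_clusters: Dict[str, Set[str]],
                            all_clusters: Set[str]) -> Set[str]:
    """Clusters that should receive the generated config for `service`.

    dependencies: Dependency specs, e.g. {"source": "orders", "destinations": ["payments"]}.
    workload_clusters: identity -> clusters where that identity runs (hypothetical inventory).
    """
    clients = {d["source"] for d in dependencies if service in d.get("destinations", [])}
    if not clients:
        # No recorded dependency: push the service's config everywhere.
        return set(all_clusters)
    targets: Set[str] = set()
    for client in clients:
        targets |= workload_clusters.get(client, set())
    return targets
```

With only the orders Dependency shown above, payments config lands on Cluster 1 alone, while a service no Dependency mentions is pushed to every cluster.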
Summary
Admiral provides new Global Traffic Routing and unique service naming functionality to address some challenges posed by the Istio model described in multi-cluster deployment with replicated control planes. It removes the need for manual configuration synchronization between clusters and generates contextual configuration for each cluster. This makes it possible to operate a Service Mesh composed of many Kubernetes clusters.
We think the Istio/Service Mesh community would benefit from this approach, so we open sourced Admiral and would love your feedback and support!



Secure Webhook Management
Thu, 14 Nov 2019 00:00:00 +0000
Istio has two webhooks: Galley and the sidecar injector.
Galley validates Kubernetes resources and the sidecar injector injects sidecar
containers into workload pods.
By default, Galley and the sidecar injector manage their own webhook configurations.
This can pose a security risk if they are compromised, for example, through buffer overflow attacks.
Configuring a webhook is a highly privileged operation as a webhook may monitor and mutate all
Kubernetes resources.
In the following example, the attacker compromises
Galley and modifies the webhook configuration of Galley to eavesdrop on all Kubernetes secrets
(the clientConfig is modified by the attacker to direct the secrets resources to
a service owned by the attacker).

[Figure: An example attack]

To protect against this kind of attack, Istio 1.4 introduces a new feature to securely manage
webhooks using istioctl:


istioctl, instead of Galley and the sidecar injector, manages the webhook configurations.
Galley and the sidecar injector are de-privileged so even if they are compromised, they
will not be able to alter the webhook configurations.


Before configuring a webhook, istioctl will verify the webhook server is up
and that the certificate chain used by the webhook server is valid. This reduces the errors
that can occur before a server is ready or if a server has invalid certificates.
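The certificate-chain check can be sketched with Go's standard crypto/x509 package. This is an assumption about the kind of verification involved, not istioctl's implementation. verifyServingCert and newDemoPKI are hypothetical helpers; the latter only generates a throwaway CA and server certificate so the example runs on its own. Checking that the server is up (for example, by dialing it) is not shown.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// verifyServingCert checks that certPEM (a webhook server's certificate)
// chains to a CA in caBundlePEM, is currently valid, is usable for server
// authentication, and covers dnsName.
func verifyServingCert(caBundlePEM, certPEM []byte, dnsName string) error {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caBundlePEM) {
		return fmt.Errorf("no CA certificates found in bundle")
	}
	block, _ := pem.Decode(certPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return fmt.Errorf("server certificate is not a PEM CERTIFICATE block")
	}
	leaf, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return fmt.Errorf("parsing server certificate: %w", err)
	}
	_, err = leaf.Verify(x509.VerifyOptions{
		Roots:     roots,
		DNSName:   dnsName,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	})
	return err
}

// newDemoPKI creates a throwaway CA and a leaf certificate for dnsName signed
// by it, both PEM-encoded. Demonstration only.
func newDemoPKI(dnsName string) (caPEM, leafPEM []byte, err error) {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	now := time.Now()
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-ca"},
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              now.Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	caCert, err := x509.ParseCertificate(caDER)
	if err != nil {
		return nil, nil, err
	}
	leafTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: dnsName},
		DNSNames:     []string{dnsName},
		NotBefore:    now.Add(-time.Hour),
		NotAfter:     now.Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	leafDER, err := x509.CreateCertificate(rand.Reader, leafTmpl, caCert, &leafKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	caPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: caDER})
	leafPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: leafDER})
	return caPEM, leafPEM, nil
}

func main() {
	const name = "webhook.istio-system.svc" // hypothetical service DNS name
	caPEM, leafPEM, err := newDemoPKI(name)
	if err != nil {
		panic(err)
	}
	fmt.Println("valid name:", verifyServingCert(caPEM, leafPEM, name))
	fmt.Println("wrong name:", verifyServingCert(caPEM, leafPEM, "other.svc"))
}
```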


To try this new feature, refer to the Istio webhook management task.



Introducing the Istio v1beta1 Authorization Policy
Thu, 14 Nov 2019 00:00:00 +0000
Istio 1.4 introduces the
v1beta1 authorization policy,
which is a major update to the previous v1alpha1 role-based access control
(RBAC) policy. The new policy provides these improvements:

Aligns with Istio configuration model.
Improves the user experience by simplifying the API.
Supports more use cases (e.g. Ingress/Egress gateway support) without
added complexity.

The v1beta1 policy is not backward compatible and requires a one time
conversion. A tool is provided to automate this process. The previous
configuration resources ClusterRbacConfig, ServiceRole, and
ServiceRoleBinding will not be supported from Istio 1.6 onwards.
This post describes the new v1beta1 authorization policy model, its
design goals and the migration from v1alpha1 RBAC policies. See the
authorization concept page
for an in-depth explanation of the v1beta1 authorization policy.
We welcome your feedback about the v1beta1 authorization policy at
discuss.istio.io.
Background
Until now, Istio has provided RBAC policies to enforce access control on
services using three configuration
resources: ClusterRbacConfig, ServiceRole and ServiceRoleBinding.
With this API, users have been able to enforce access control at mesh-level,
namespace-level and service-level. Like other RBAC policies, Istio RBAC uses
the same concept of role and binding for granting permissions to identities.
Although Istio RBAC has been working reliably, we’ve found that many
improvements were possible.
For example, users have mistakenly assumed that access control enforcement
happens at service-level because ServiceRole uses service to specify where
to apply the policy. However, the policy is actually applied on
workloads; the service is only used to
find the corresponding workload. This nuance is significant when multiple
services refer to the same workload. A ServiceRole for service A
will also affect service B if the two services refer to the same
workload, which can cause confusion and incorrect configuration.
Another example is that it has proven difficult for users to maintain and
manage the Istio RBAC configurations because of the need to deeply understand
three related resources.
Design goals
The new v1beta1 authorization policy had several design goals:


Align with Istio Configuration Model for better
clarity on the policy target. The configuration model provides a unified
configuration hierarchy, resolution and target selection.


Improve the user experience by simplifying the API. It’s easier to manage
one custom resource definition (CRD) that includes all access control
specifications, instead of multiple CRDs.


Support more use cases without added complexity. For example, allow the
policy to be applied on Ingress/Egress gateway to enforce access control
for traffic entering/exiting the mesh.


AuthorizationPolicy
An AuthorizationPolicy custom resource
enables access control on workloads. This section gives an overview of the
changes in the v1beta1 authorization policy.
An AuthorizationPolicy includes a selector and a list of rules.
The selector specifies which workloads the policy applies to, and the
list of rules specifies the detailed access control rules for those workloads.
Rules are additive, which means a request is allowed if any rule
allows the request. Each rule includes lists of from, to and
when, which specify who is allowed to do what under which
conditions.
The selector replaces the functionality provided by ClusterRbacConfig
and the services field in ServiceRole. The rule replaces the other
fields in the ServiceRole and ServiceRoleBinding.
Example
The following authorization policy applies to workloads with the app: httpbin
and version: v1 labels in the foo namespace:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: foo
spec:
 selector:
   matchLabels:
     app: httpbin
     version: v1
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/default/sa/sleep"]
   to:
   - operation:
       methods: ["GET"]
   when:
   - key: request.headers[version]
     values: ["v1", "v2"]
The policy allows principal cluster.local/ns/default/sa/sleep to access the
workload using the GET method when the request includes a version header
with value v1 or v2. Any request not matched by the policy will be denied
by default.
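To make these evaluation rules concrete, the following Go sketch models the example policy under heavy simplification: only principals, methods and header conditions are represented, and all types and functions are hypothetical rather than Istio's or Envoy's. Within a rule, at least one source and one operation must match (an empty from or to list matches anything) and every when condition must hold; rules are additive; and once a policy selects a workload, anything no rule allows is denied.

```go
package main

import "fmt"

// Hypothetical, heavily simplified model of a v1beta1 rule; real policies
// support many more fields (namespaces, ipBlocks, paths, ports, ...).
type Source struct{ Principals []string }
type Operation struct{ Methods []string }

// Condition stands in for a when entry such as key: request.headers[version];
// Header holds just the header name.
type Condition struct {
	Header string
	Values []string
}

type Rule struct {
	From []Source
	To   []Operation
	When []Condition
}

type Policy struct{ Rules []Rule }

type Request struct {
	Principal string
	Method    string
	Headers   map[string]string
}

// listMatches reports whether v is in list; an empty list places no constraint.
func listMatches(list []string, v string) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return len(list) == 0
}

// ruleMatches: at least one source and at least one operation must match
// (an empty from or to list matches anything), and every condition must hold.
func ruleMatches(r Rule, req Request) bool {
	fromOK := len(r.From) == 0
	for _, s := range r.From {
		fromOK = fromOK || listMatches(s.Principals, req.Principal)
	}
	toOK := len(r.To) == 0
	for _, o := range r.To {
		toOK = toOK || listMatches(o.Methods, req.Method)
	}
	if !fromOK || !toOK {
		return false
	}
	for _, c := range r.When {
		v, ok := req.Headers[c.Header]
		if !ok || !listMatches(c.Values, v) {
			return false
		}
	}
	return true
}

// allowed: with no policy selecting the workload, the request is allowed;
// otherwise it must match at least one rule and is denied by default.
func allowed(policies []Policy, req Request) bool {
	if len(policies) == 0 {
		return true
	}
	for _, p := range policies {
		for _, r := range p.Rules {
			if ruleMatches(r, req) {
				return true
			}
		}
	}
	return false
}

func main() {
	httpbin := Policy{Rules: []Rule{{
		From: []Source{{Principals: []string{"cluster.local/ns/default/sa/sleep"}}},
		To:   []Operation{{Methods: []string{"GET"}}},
		When: []Condition{{Header: "version", Values: []string{"v1", "v2"}}},
	}}}
	req := Request{"cluster.local/ns/default/sa/sleep", "GET", map[string]string{"version": "v1"}}
	fmt.Println(allowed([]Policy{httpbin}, req)) // true
	req.Method = "POST"
	fmt.Println(allowed([]Policy{httpbin}, req)) // false: no rule allows POST
}
```

Note that a Policy with no rules denies every request in this model, which matches the behavior of the deny-all policy used later in the migration example.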
Assuming the httpbin service is defined as:
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  namespace: foo
spec:
  selector:
    app: httpbin
    version: v1
  ports:
    # omitted
You would need to configure three resources to achieve the same result in
v1alpha1:
apiVersion: "rbac.istio.io/v1alpha1"
kind: ClusterRbacConfig
metadata:
  name: default
spec:
  mode: 'ON_WITH_INCLUSION'
  inclusion:
    services: ["httpbin.foo.svc.cluster.local"]
---
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRole
metadata:
  name: httpbin
  namespace: foo
spec:
  rules:
  - services: ["httpbin.foo.svc.cluster.local"]
    methods: ["GET"]
    constraints:
    - key: request.headers[version]
      values: ["v1", "v2"]
---
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
  name: httpbin
  namespace: foo
spec:
  subjects:
  - user: "cluster.local/ns/default/sa/sleep"
  roleRef:
    kind: ServiceRole
    name: "httpbin"
Workload selector
A major change in the v1beta1 authorization policy is that it now uses
a workload selector to specify where to apply the policy. This is the same
workload selector used in the Gateway, Sidecar and EnvoyFilter
configurations.
The workload selector makes it clear that the policy is applied and enforced
on workloads instead of services. If a policy applies to a workload that is
used by multiple different services, the same policy will affect the traffic
to all the different services.
You can simply leave the selector empty to apply the policy to all
workloads in a namespace. The following policy applies to all workloads in
the namespace bar:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: policy
 namespace: bar
spec:
 rules:
 # omitted
Root namespace
A policy in the root namespace applies to all workloads in the mesh, in every
namespace. The root namespace is configurable in the
MeshConfig
and has the default value of istio-system.
For example, suppose you installed Istio in the istio-system namespace, deployed
workloads in the default and bookinfo namespaces, and changed the root namespace
from its default value to istio-config. The following policy will
apply to workloads in every namespace, including default, bookinfo and
istio-system:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: policy
 namespace: istio-config
spec:
 rules:
 # omitted
Ingress/Egress Gateway support
The v1beta1 authorization policy can also be applied on ingress/egress
gateways to enforce access control on traffic entering/leaving the mesh;
you only need to change the selector to select the ingress/egress
gateway workload.
The following policy applies to workloads with the
app: istio-ingressgateway label:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: ingress
 namespace: istio-system
spec:
 selector:
   matchLabels:
     app: istio-ingressgateway
 rules:
 # omitted
Remember the authorization policy only applies to workloads in the same
namespace as the policy, unless the policy is applied in the root namespace:


If you don’t change the default root namespace value (i.e. istio-system),
the above policy will apply to workloads with the app: istio-ingressgateway
label in every namespace.


If you have changed the root namespace to a different value, the above
policy will apply to workloads with the app: istio-ingressgateway
label only in the istio-system namespace.
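The two cases above reduce to a small scoping rule, sketched here in Go. policyApplies is a hypothetical helper and gateways is a made-up namespace; the sketch models only namespace scope and label matching, not the rest of Istio's policy selection.

```go
package main

import "fmt"

// policyApplies reports whether an AuthorizationPolicy stored in policyNS
// with the given matchLabels selects a workload. rootNS is the MeshConfig
// root namespace. Hypothetical helper modeling only the scoping rules.
func policyApplies(policyNS string, matchLabels map[string]string, rootNS, workloadNS string, workloadLabels map[string]string) bool {
	// Scope: the policy's own namespace, or every namespace for the root namespace.
	if policyNS != rootNS && policyNS != workloadNS {
		return false
	}
	// An empty selector selects every workload in scope; otherwise every
	// matchLabels entry must be present on the workload with the same value.
	for k, v := range matchLabels {
		if got, ok := workloadLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	sel := map[string]string{"app": "istio-ingressgateway"}
	gw := map[string]string{"app": "istio-ingressgateway"}
	// Default root namespace: the istio-system policy reaches other namespaces.
	fmt.Println(policyApplies("istio-system", sel, "istio-system", "gateways", gw)) // true
	// Root namespace changed: the same policy stays within istio-system.
	fmt.Println(policyApplies("istio-system", sel, "istio-config", "gateways", gw))     // false
	fmt.Println(policyApplies("istio-system", sel, "istio-config", "istio-system", gw)) // true
}
```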


Comparison
The following table highlights the key differences between the old v1alpha1
RBAC policies and the new v1beta1 authorization policy.

| Feature | v1alpha1 RBAC policy | v1beta1 Authorization Policy |
| --- | --- | --- |
| API stability | alpha: not backward compatible | beta: backward compatibility guaranteed |
| Number of CRDs | Three: ClusterRbacConfig, ServiceRole and ServiceRoleBinding | One: AuthorizationPolicy |
| Policy target | service | workload |
| Deny-by-default behavior | Enabled explicitly by configuring ClusterRbacConfig | Enabled implicitly with AuthorizationPolicy |
| Ingress/Egress gateway support | Not supported | Supported |
| The "*" value in policy | Matches all contents (empty and non-empty) | Matches non-empty contents only |

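The last row of the comparison is easy to misread, so here is a minimal Go sketch of the two behaviors for a bare "*" value, with an absent value represented as the empty string. The function names are hypothetical and other pattern forms are not modeled.

```go
package main

import "fmt"

// matchV1alpha1 models the old semantics: "*" matches any value, including an
// empty or absent one (represented here as "").
func matchV1alpha1(pattern, value string) bool {
	return pattern == "*" || pattern == value
}

// matchV1beta1 models the new semantics: "*" matches only non-empty values.
func matchV1beta1(pattern, value string) bool {
	if pattern == "*" {
		return value != ""
	}
	return pattern == value
}

func main() {
	// A request that carries no authenticated principal.
	principal := ""
	fmt.Println(matchV1alpha1("*", principal)) // true
	fmt.Println(matchV1beta1("*", principal))  // false
}
```

In practice this means a v1beta1 rule such as principals: ["*"] admits only requests that carry some authenticated principal.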
The following tables show the relationship between the v1alpha1 and v1beta1 API.
ClusterRbacConfig

| ClusterRbacConfig.Mode | AuthorizationPolicy |
| --- | --- |
| OFF | No policy applied |
| ON | A deny-all policy applied in the root namespace |
| ON_WITH_INCLUSION | Policies should be applied to namespaces or workloads included by ClusterRbacConfig |
| ON_WITH_EXCLUSION | Policies should be applied to all namespaces or workloads except those excluded by ClusterRbacConfig |

ServiceRole

| ServiceRole | AuthorizationPolicy |
| --- | --- |
| services | selector |
| paths | paths in to |
| methods | methods in to |
| destination.ip in constraint | Not supported |
| destination.port in constraint | ports in to |
| destination.labels in constraint | selector |
| destination.namespace in constraint | Replaced by the namespace of the policy, i.e. the namespace in metadata |
| destination.user in constraint | Not supported |
| experimental.envoy.filters in constraint | experimental.envoy.filters in when |
| request.headers in constraint | request.headers in when |

ServiceRoleBinding

| ServiceRoleBinding | AuthorizationPolicy |
| --- | --- |
| user | principals in from |
| group | request.auth.claims[group] in when |
| source.ip in property | ipBlocks in from |
| source.namespace in property | namespaces in from |
| source.principal in property | principals in from |
| request.headers in property | request.headers in when |
| request.auth.principal in property | requestPrincipals in from or request.auth.principal in when |
| request.auth.audiences in property | request.auth.audiences in when |
| request.auth.presenter in property | request.auth.presenter in when |
| request.auth.claims in property | request.auth.claims in when |

Beyond these differences, the v1beta1 policy is enforced by the same
engine in Envoy and supports the same authenticated identity (mutual TLS or
JWT), conditions and other primitives (e.g. IP and port) as the
v1alpha1 policy.
Future of the v1alpha1 policy
The v1alpha1 RBAC policy (ClusterRbacConfig, ServiceRole, and
ServiceRoleBinding) is deprecated in favor of the v1beta1 authorization policy.
Istio 1.4 continues to support the v1alpha1 RBAC policy to give you
enough time to move away from the alpha policies.
Migration from the v1alpha1 policy
Istio only supports one of the two versions for a given workload:

If there is only v1beta1 policy for a workload, the v1beta1 policy
will be used.
If there is only v1alpha1 policy for a workload, the v1alpha1 policy
will be used.
If there are both v1beta1 and v1alpha1 policies for a workload,
only the v1beta1 policy will be used and the v1alpha1 policy
will be ignored.

General Guideline

When migrating to the v1beta1 policy for a given workload, make sure the
new v1beta1 policy covers all the existing v1alpha1 policies applied
to the workload, because the v1alpha1 policies applied to the workload
will be ignored once you apply the v1beta1 policies.


The typical flow of migrating to v1beta1 policy is to start by checking the
ClusterRbacConfig to decide which namespace or service is enabled with RBAC.
For each service enabled with RBAC:

Get the workload selector from the service definition.
Create a v1beta1 policy with the workload selector.
Update the v1beta1 policy for each ServiceRole and ServiceRoleBinding
applied to the service.
Apply the v1beta1 policy and monitor the traffic to make sure the
policy is working as expected.
Repeat the process for the next service enabled with RBAC.

For each namespace enabled with RBAC:

Apply a v1beta1 policy that denies all traffic to the given namespace.

Migration Example
Assume you have the following v1alpha1 policies for the httpbin service
in the foo namespace:
apiVersion: "rbac.istio.io/v1alpha1"
kind: ClusterRbacConfig
metadata:
  name: default
spec:
  mode: 'ON_WITH_INCLUSION'
  inclusion:
    namespaces: ["foo"]
---
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRole
metadata:
  name: httpbin
  namespace: foo
spec:
  rules:
  - services: ["httpbin.foo.svc.cluster.local"]
    methods: ["GET"]
---
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
  name: httpbin
  namespace: foo
spec:
  subjects:
  - user: "cluster.local/ns/default/sa/sleep"
  roleRef:
    kind: ServiceRole
    name: "httpbin"
Migrate the above policies to v1beta1 as follows:


Assume the httpbin service has the following workload selector:
selector:
  app: httpbin
  version: v1


Create a v1beta1 policy with the workload selector:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: foo
spec:
 selector:
   matchLabels:
     app: httpbin
     version: v1


Update the v1beta1 policy with each ServiceRole and ServiceRoleBinding
applied to the service:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: foo
spec:
 selector:
   matchLabels:
     app: httpbin
     version: v1
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/default/sa/sleep"]
   to:
   - operation:
       methods: ["GET"]


Apply the v1beta1 policy and monitor the traffic to make sure it works
as expected.


Apply the following v1beta1 policy that denies all traffic to the
foo namespace because the foo namespace is enabled with RBAC:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: deny-all
 namespace: foo
spec:
 {}


Make sure the v1beta1 policy is working as expected and then you can delete
the v1alpha1 policies from the cluster.
Automation of the Migration
To help ease the migration, the istioctl experimental authz convert
command is provided to automatically convert the v1alpha1 policies to
the v1beta1 policy.
You can evaluate the command but it is experimental in Istio 1.4 and doesn’t
support the full v1alpha1 semantics as of the date of this blog post.
The command to support the full v1alpha1 semantics is expected in a patch
release following Istio 1.4.



Introducing the Istio Operator
Thu, 14 Nov 2019 00:00:00 +0000
Kubernetes operators provide
a pattern for encoding human operational knowledge in software and are a popular way to simplify
the administration of software infrastructure components. Istio is a natural candidate for an automated
operator as it is challenging to administer.
Up until now, Helm has been the primary tool to install and upgrade Istio.
Istio 1.4 introduces a new method of installation using istioctl.
This new installation method builds on the strengths of Helm with the addition of the
following:

Users only need to install one tool: istioctl
All API fields are validated
Small customizations not in the API don’t require chart or API changes
Version specific upgrade hooks can be easily and robustly implemented

The Helm installation method is in the process of deprecation. Upgrades from Istio
1.4 onward, for installations not initially performed with Helm, will also be handled by a new
istioctl upgrade feature.
The new istioctl installation commands use a
custom resource
to configure the installation. The custom resource is part of a new Istio operator
implementation intended to simplify the common administrative tasks of installation, upgrade,
and complex configuration changes for Istio. Validation and checking for installation and upgrade
is tightly integrated with the tools to prevent common errors and simplify troubleshooting.
The Operator API
Every operator implementation requires a
custom resource definition (CRD)
to define its custom resource, that is, its API. Istio’s operator API is defined by the
IstioControlPlane CRD,
which is generated from an
IstioControlPlane proto.
The API supports all of Istio’s current configuration profiles
using a single field to select the profile. For example, the following IstioControlPlane resource
configures Istio using the demo profile:
apiVersion: install.istio.io/v1alpha2
kind: IstioControlPlane
metadata:
  namespace: istio-operator
  name: example-istiocontrolplane
spec:
  profile: demo
You can then customize the configuration with additional settings. For example, to disable telemetry:
apiVersion: install.istio.io/v1alpha2
kind: IstioControlPlane
metadata:
  namespace: istio-operator
  name: example-istiocontrolplane
spec:
  profile: demo
  telemetry:
    enabled: false
Installing with istioctl
The recommended way to use the Istio operator API is through a new set of istioctl commands.
For example, to install Istio into a cluster:
$ istioctl manifest apply -f 
Make changes to the installation configuration by editing the configuration file and executing
istioctl manifest apply again.
To upgrade to a new version of Istio:
$ istioctl x upgrade -f 
In addition to specifying the complete configuration in an IstioControlPlane resource,
the istioctl commands can also be passed individual settings using a --set flag:
$ istioctl manifest apply --set telemetry.enabled=false
There are also a number of other istioctl commands that, for example, help you list, display,
and compare configuration profiles and manifests.
Refer to the Istio install instructions for more details.
Istio Controller (alpha)
Operator implementations use a Kubernetes controller to continuously monitor their custom resource
and apply the corresponding configuration changes. The Istio controller monitors an IstioControlPlane
resource and reacts to changes by updating the Istio installation configuration in the corresponding cluster.
In the 1.4 release, the Istio controller is in the alpha phase of development and not fully
integrated with istioctl. It is, however,
available for experimentation using kubectl commands.
For example, to install the controller and a default version of Istio into your cluster,
run the following command:
$ kubectl apply -f https:///operator.yaml
$ kubectl apply -f https:///default-cr.yaml
You can then make changes to the Istio installation configuration:
$ kubectl edit istiocontrolplane example-istiocontrolplane -n istio-system
As soon as the resource is updated, the controller will detect the changes and respond by updating
the Istio installation correspondingly.
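This detect-and-update behavior follows the usual level-triggered reconcile pattern. The Go sketch below shows its core under stated assumptions: the desired state is reduced to a hypothetical map from component names to an enabled flag (pilot and telemetry are illustrative names), and applying the returned changes to the cluster is left out. It is not the Istio controller's code.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcile compares the components a (hypothetical) desired spec enables
// with those currently installed and returns the changes needed to converge.
// A controller would call it again on every observed change to the resource.
func reconcile(desired, installed map[string]bool) (toInstall, toRemove []string) {
	for component, enabled := range desired {
		switch {
		case enabled && !installed[component]:
			toInstall = append(toInstall, component)
		case !enabled && installed[component]:
			toRemove = append(toRemove, component)
		}
	}
	sort.Strings(toInstall)
	sort.Strings(toRemove)
	return toInstall, toRemove
}

func main() {
	installed := map[string]bool{"pilot": true, "telemetry": true}
	// Editing the resource to disable telemetry changes the desired state.
	desired := map[string]bool{"pilot": true, "telemetry": false}
	add, remove := reconcile(desired, installed)
	fmt.Println(add, remove) // [] [telemetry]
}
```

Because the function compares whole states rather than reacting to individual events, running it repeatedly is harmless: once the cluster matches the resource, it returns no changes.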
Both the operator controller and istioctl commands share the same implementation. The significant
difference is the execution context. In the istioctl case, the operation runs in the admin user’s
command execution and security context. In the controller case, a pod in the cluster runs the code
in its security context. In both cases, configuration is validated against a schema and the same correctness
checks are performed.
Migration from Helm
To help ease the transition from previous configurations using Helm,
istioctl and the controller support pass-through access for the full Helm installation API.
You can pass Helm configuration options using istioctl --set by prepending the string values. to the option name.
For example, instead of this Helm command:
$ helm template ... --set global.mtls.enabled=true
You can use this istioctl command:
$ istioctl manifest generate ... --set values.global.mtls.enabled=true
You can also set Helm configuration values in an IstioControlPlane custom resource.
See Customize Istio settings using Helm
for details.
Another feature to help with the transition from Helm is the alpha
istioctl manifest migrate command.
This command can be used to automatically convert a Helm values.yaml file to a corresponding
IstioControlPlane configuration.
Implementation
Several frameworks have been created to help implement operators by generating stubs for some or all of
the components. The Istio operator was created with the help of a combination of
kubebuilder and
operator framework. Istio’s installation now uses a proto to
describe the API such that runtime validation can be executed against a schema.
More information about the implementation can be found in the README and ARCHITECTURE documents
in the Istio operator repository.
Summary
Starting in Istio 1.4, Helm installation is being replaced by new istioctl commands using
a new operator custom resource definition, IstioControlPlane, for the configuration API.
An alpha controller is also available for early experimentation with the operator.
The new istioctl commands and operator controller both validate configuration schemas and perform a range of
checks for installation change or upgrade. These checks are tightly integrated with the tools to prevent
common errors and simplify troubleshooting.
The Istio maintainers expect that this new approach will improve the user experience during Istio
installation and upgrade, better stabilize the installation API, and help users better manage and
monitor their Istio installations.
We welcome your feedback about the new installation approach at discuss.istio.io.



Introducing istioctl analyze
Thu, 14 Nov 2019 00:00:00 +0000
Istio 1.4 introduces an experimental new tool to help you analyze and debug your clusters running Istio.
istioctl analyze is a diagnostic tool that detects potential issues with your
Istio configuration, as well as gives general insights to improve your configuration.
It can run against a live cluster or a set of local configuration files.
It can also run against a combination of the two, allowing you to catch problems before you
apply changes to a cluster.
To get started with it in just minutes, head over to the documentation.
Designed to be approachable for novice users
One of the key design goals that we followed for this feature is to make it extremely approachable.
This is achieved by making the command useful without having to pass any required complex parameters.
In practice, here are some of the scenarios that it goes after:

“There is some problem with my cluster, but I have no idea where to start”
“Things are generally working, but I’m wondering if there is anything I could improve”

In that sense, it is very different from some of the more advanced diagnostic tools, which go
after scenarios along the lines of (taking istioctl proxy-config as an example):

“Show me the Envoy configuration for this specific pod so I can see if anything looks wrong”

This can be very useful for advanced debugging, but it requires a lot of expertise before you
can figure out that you need to run this specific command, and which pod to run it on.
So really, the one-line pitch for analyze is: just run it! It’s completely safe, it takes no thinking,
it might help you, and at worst, you’ll have wasted a minute!
Improving this tool over time
In Istio 1.4, analyze comes with a nice set of analyzers that can detect a number of common issues.
But this is just the beginning, and we are planning to keep growing and fine tuning the analyzers with
each release.
In fact, we would welcome suggestions from Istio users. Specifically, if you encounter a situation
where you think an issue could be detected via configuration analysis, but is not currently flagged
by analyze, please do let us know. The best way to do this is to open an issue on GitHub.



DNS Certificate Management
Thu, 14 Nov 2019 00:00:00 +0000
By default, Citadel manages the DNS certificates of the Istio control plane.
Citadel is a large component that maintains its own private signing key, and acts as a Certificate Authority (CA).
New in Istio 1.4, we introduce a feature to securely provision and manage DNS certificates
signed by the Kubernetes CA, which has the following advantages.


Lighter weight DNS certificate management with no dependency on Citadel.


Unlike Citadel, this feature doesn’t maintain a private signing key, which enhances security.


Simplified root certificate distribution to TLS clients.
Clients no longer need to wait for Citadel to generate and distribute its CA certificate.


The following diagram shows the architecture of provisioning and managing DNS certificates in Istio.
Chiron is the component provisioning and managing DNS certificates in Istio.

[Figure: The architecture of provisioning and managing DNS certificates in Istio]

To try this new feature, refer to the DNS certificate management task.



Announcing Istio client-go
Thu, 14 Nov 2019 00:00:00 +0000
We are pleased to announce the initial release of the Istio
client go repository which enables developers
to gain programmatic access to Istio APIs in a Kubernetes environment. The
generated Kubernetes informers and client set in this repository make it easy
for developers to create controllers and perform Create, Read, Update and Delete
(CRUD) operations for all Istio Custom Resource Definitions (CRDs).
This was highly requested functionality among Istio users, as is evident
from the feature requests on the clients generated by Aspen Mesh
and the Knative project.
If you’re currently using one of the above mentioned clients, you can easily
switch to using Istio client go like
this:
import (
  ...
  - versionedclient "github.com/aspenmesh/istio-client-go/pkg/client/clientset/versioned"
  + versionedclient "istio.io/client-go/pkg/clientset/versioned"
)
As the generated client sets are functionally equivalent, switching the imported
client libraries should be sufficient in order to consume the newly
generated library.
How to use client-go
The Istio client go repository follows the
same branching strategy as the Istio API
repository, as the client repository depends on the API definitions. If you want
to use a stable client set, you can use the release branches or tagged versions
in the client go repository.
Using the client set is very similar to using the Kubernetes client
go. Here’s a quick example of using
the client to list all Istio
virtual services
in the namespace named by the NAMESPACE environment variable:
package main

import (
  "log"
  "os"

  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  "k8s.io/client-go/tools/clientcmd"

  versionedclient "istio.io/client-go/pkg/clientset/versioned"
)

func main() {
  kubeconfig := os.Getenv("KUBECONFIG")
  namespace := os.Getenv("NAMESPACE")
  if len(kubeconfig) == 0 || len(namespace) == 0 {
    log.Fatalf("Environment variables KUBECONFIG and NAMESPACE need to be set")
  }
  restConfig, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
  if err != nil {
    log.Fatalf("Failed to create k8s rest client: %s", err)
  }

  ic, err := versionedclient.NewForConfig(restConfig)
  if err != nil {
    log.Fatalf("Failed to create istio client: %s", err)
  }
  // Print all VirtualServices
  vsList, err := ic.NetworkingV1alpha3().VirtualServices(namespace).List(metav1.ListOptions{})
  if err != nil {
    log.Fatalf("Failed to get VirtualService in %s namespace: %s", namespace, err)
  }
  for i := range vsList.Items {
    vs := vsList.Items[i]
    log.Printf("Index: %d VirtualService Hosts: %+v\n", i, vs.Spec.GetHosts())
  }
}
You can find a more in-depth example here.
Useful tools created for generating Istio client-go
If you’re wondering why it took so long or why it was difficult to generate
this client set, this section is for you. In Istio, we use
protobuf specifications to
write APIs, which are then converted to Go definitions
using the protobuf tool chain. There are three major challenges you might
face if you’re trying to generate a Kubernetes client set from a protobuf-generated API:


Creating Kubernetes Wrapper Types - Kubernetes client generation
library only works for Go objects which follow the Kubernetes object
specification, e.g. Authentication Policy Kubernetes Wrappers.
This means for every API which needs programmatic access, you need to create
these wrappers. Additionally, there is a fair amount of boilerplate needed for
every CRD group, version and kind that needs client code generation.
To automate this process, we created a Kubernetes type
generator tool
which can automatically create the Kubernetes types based on annotations.
The annotations parsed by this tool and the various available options
are explained in the README.
Note that if you’re using protobuf tools to generate Go types, you would need to
add these annotations as comments in the proto files, so that the comments are
present in the generated Go files which are then used by this tool.


Generating deep copy methods - In Kubernetes client machinery, if you want to
mutate any object returned from the client set, you are required to make a copy
of the object to prevent modifying the object in-place in the cache store. The
canonical way to do this is to create a deepcopy method on all nested types.
We created a tool protoc deep copy
generator
which is a protoc plugin and can automatically create deepcopy methods
based on annotations using the Proto library utility Proto
Clone. Here’s an
example
of the generated deepcopy method.


Marshaling and Unmarshaling types to/from JSON - For the types generated
from proto definitions, it is often problematic to use the default Go JSON
encoder/decoder, as some fields, like protobuf’s oneof, require
special handling. Additionally, any Proto fields with underscores in their
names might serialize/deserialize to different field names depending on the
encoder/decoder, because the Go struct tags are generated
differently.
It is always recommended to use protobuf primitives for
serializing/deserializing to JSON instead of relying on default Go
library. We created a tool protoc JSON
shim which
is a protoc plugin and can automatically create Marshalers/Unmarshalers for
all Go types generated from Proto definitions. Here’s an
example
of the code generated by this tool.
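The cache-mutation hazard behind the deepcopy requirement is not specific to Go. The following Python sketch is only an analogy: it is not the generated Go code, `SharedCache` is a hypothetical stand-in for an informer cache, and the nested fields merely resemble an authentication policy. It shows that mutating a returned object changes the cached one, that a shallow copy still shares nested data, and that only a deep copy of every nested level keeps the cache intact:

```python
import copy


class SharedCache:
    """Hypothetical stand-in for an informer cache: get() hands out the stored object itself."""

    def __init__(self, objects):
        self._store = objects

    def get(self, key):
        return self._store[key]


def make_cache():
    # Nested object loosely shaped like an authentication policy (illustrative only).
    return SharedCache({"default/policy": {"spec": {"peers": [{"mtls": {"mode": "STRICT"}}]}}})


def set_mode(obj, mode):
    obj["spec"]["peers"][0]["mtls"]["mode"] = mode
    return obj


def cached_mode(cache):
    return cache.get("default/policy")["spec"]["peers"][0]["mtls"]["mode"]


# 1. No copy: the edit lands directly in the cache.
no_copy = make_cache()
set_mode(no_copy.get("default/policy"), "PERMISSIVE")

# 2. Shallow copy: only the top-level dict is new; "spec" and below are still shared.
shallow = make_cache()
set_mode(copy.copy(shallow.get("default/policy")), "PERMISSIVE")

# 3. Deep copy: every nested level is duplicated, so the cache is untouched.
deep = make_cache()
set_mode(copy.deepcopy(deep.get("default/policy")), "PERMISSIVE")

print(cached_mode(no_copy), cached_mode(shallow), cached_mode(deep))  # PERMISSIVE PERMISSIVE STRICT
```

This is why a deepcopy method has to exist for all nested types rather than only for the top-level struct.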

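The field-name mismatch is easy to reproduce on paper. protoc-gen-go writes the original proto field name into the Go `json` struct tag (for example `json:"max_connections,omitempty"`), while the proto3 JSON mapping uses the lowerCamelCase `json_name` that protoc derives from the same field. The Python sketch below mimics protoc's default derivation as we understand it (drop each underscore and upper-case the next character); treat it as an approximation for illustration, not a reference implementation:

```python
def proto_json_name(field_name):
    """Approximate protoc's default json_name: drop '_' and upper-case the next character."""
    result = []
    capitalize_next = False
    for ch in field_name:
        if ch == "_":
            capitalize_next = True
        elif capitalize_next:
            result.append(ch.upper())
            capitalize_next = False
        else:
            result.append(ch)
    return "".join(result)


# protoc-gen-go keeps the proto name in the `json:"..."` tag used by encoding/json,
# while the protobuf JSON mapping emits the derived json_name.
for name in ("max_connections", "outlier_detection", "hosts"):
    print(f"{name}: encoding/json key {name!r}, protobuf JSON key {proto_json_name(name)!r}")
```

Fields without underscores come out identical either way; the others do not. Go's encoding/json, for instance, would not map a `maxConnections` key onto a field tagged `max_connections` and ignores unknown keys by default, so such a field would be silently dropped.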

I’m hoping that the newly released client library enables users to create more
integrations and controllers for the Istio APIs, and that developers can use the tools mentioned above
to generate Kubernetes client sets from Proto APIs.



Istio as a Proxy for External Services
Tue, 15 Oct 2019 00:00:00 +0000
The Control Ingress Traffic and the
Ingress Gateway without TLS Termination tasks describe
how to configure an ingress gateway to expose services inside the mesh to external traffic. The services can be HTTP or
HTTPS. In the case of HTTPS, the gateway passes the traffic through, without terminating TLS.
This blog post describes how to use the same ingress gateway mechanism of Istio to enable access to external services and
not to applications inside the mesh. This way Istio as a whole can serve just as a proxy server, with the added value of
observability, traffic management and policy enforcement.
The blog post shows configuring access to an HTTP and an HTTPS external service, namely httpbin.org and
edition.cnn.com.
Configure an ingress gateway


Define an ingress gateway with a servers: section configuring ports 80 and 443.
Ensure mode: is set to PASSTHROUGH for tls: on port 443, which instructs the gateway to pass the
ingress traffic AS IS, without terminating TLS.
$ kubectl apply -f - <



Create service entries for the httpbin.org and edition.cnn.com services to make them accessible from the ingress
gateway:
$ kubectl apply -f - <



Create a service entry and configure a destination rule for the localhost service.
You need this service entry in the next step as the destination for traffic that applications inside the mesh
send to the external services, so that such traffic is blocked. In this example you use Istio only as a proxy between
external applications and external services.
$ kubectl apply -f - <



Create a virtual service for each external service to configure routing to it. Both virtual services include the
proxy gateway in the gateways: section and in the match: section for HTTP and HTTPS traffic, respectively.
Notice the route: section for the mesh gateway, the gateway that represents the applications inside
the mesh. The route: for the mesh gateway shows how the traffic is directed to the localhost.local service,
effectively blocking the traffic.
$ kubectl apply -f - <



Enable Envoy’s access logging.


Follow the instructions in
Determining the ingress IP and ports
to define the INGRESS_HOST, INGRESS_PORT and SECURE_INGRESS_PORT environment variables.


Access the httpbin.org service through your ingress IP and port which you stored in the
$INGRESS_HOST and $INGRESS_PORT environment variables, respectively, during the previous step.
Access the /status/418 path of the httpbin.org service, which returns the HTTP status
418 I’m a teapot.
$ curl $INGRESS_HOST:$INGRESS_PORT/status/418 -Hhost:httpbin.org

-=[ teapot ]=-

   _...._
 .'  _ _ `.
| ."` ^ `". _,
\_;`"---"`|//
  |       ;/
  \_     _/
    `"""`


If the Istio ingress gateway is deployed in the istio-system namespace, print the gateway’s log with the following command:
$ kubectl logs -l istio=ingressgateway -c istio-proxy -n istio-system | grep 'httpbin.org'


Search the log for an entry similar to:
[2019-01-31T14:40:18.645Z] "GET /status/418 HTTP/1.1" 418 - 0 135 187 186 "10.127.220.75" "curl/7.54.0" "28255618-6ca5-9d91-9634-c562694a3625" "httpbin.org" "34.232.181.106:80" outbound|80||httpbin.org - 172.30.230.33:80 10.127.220.75:52077 -


Access the edition.cnn.com service through your ingress gateway:
$ curl -s --resolve edition.cnn.com:$SECURE_INGRESS_PORT:$INGRESS_HOST https://edition.cnn.com:$SECURE_INGRESS_PORT | grep -o "<title>.*</title>"
<title>CNN International - Breaking News, US News, World News and Video</title>


If the Istio ingress gateway is deployed in the istio-system namespace, print the gateway’s log with the following command:
$ kubectl logs -l istio=ingressgateway -c istio-proxy -n istio-system | grep 'edition.cnn.com'


Search the log for an entry similar to:
[2019-01-31T13:40:11.076Z] "- - -" 0 - 589 17798 1644 - "-" "-" "-" "-" "172.217.31.132:443" outbound|443||edition.cnn.com 172.30.230.33:54508 172.30.230.33:443 10.127.220.75:49467 edition.cnn.com
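To inspect entries like the two above programmatically, you can split each line into its fields. The following Python sketch assumes the 18-field layout of these sample lines; the field names are our own labels, inferred from the samples rather than read from the log format your installation configures, so check that format before relying on them. It also splits the upstream cluster name, which in these samples has the shape `outbound|<port>|<subset>|<host>`:

```python
import re

# Labels for the 18 fields of the sample lines above (assumed layout;
# verify against the access-log format of your installation).
FIELDS = [
    "start_time", "request", "response_code", "response_flags",
    "bytes_received", "bytes_sent", "duration_ms", "upstream_service_time",
    "x_forwarded_for", "user_agent", "request_id", "authority",
    "upstream_host", "upstream_cluster", "upstream_local_address",
    "downstream_local_address", "downstream_remote_address", "requested_server_name",
]

# A token is either a double-quoted string or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def parse_access_log(line):
    tokens = []
    for tok in TOKEN.findall(line):
        if len(tok) >= 2 and tok[0] == tok[-1] == '"':
            tok = tok[1:-1]  # drop the surrounding quotes
        tokens.append(tok)
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    entry = dict(zip(FIELDS, tokens))
    entry["start_time"] = entry["start_time"].strip("[]")
    return entry


def split_cluster(cluster):
    """Split a name like 'outbound|80||httpbin.org' into (direction, port, subset, host)."""
    direction, port, subset, host = cluster.split("|")
    return direction, port, subset, host


httpbin_line = ('[2019-01-31T14:40:18.645Z] "GET /status/418 HTTP/1.1" 418 - 0 135 187 186 '
                '"10.127.220.75" "curl/7.54.0" "28255618-6ca5-9d91-9634-c562694a3625" "httpbin.org" '
                '"34.232.181.106:80" outbound|80||httpbin.org - 172.30.230.33:80 10.127.220.75:52077 -')
cnn_line = ('[2019-01-31T13:40:11.076Z] "- - -" 0 - 589 17798 1644 - "-" "-" "-" "-" '
            '"172.217.31.132:443" outbound|443||edition.cnn.com 172.30.230.33:54508 '
            '172.30.230.33:443 10.127.220.75:49467 edition.cnn.com')

for line in (httpbin_line, cnn_line):
    entry = parse_access_log(line)
    print(entry["response_code"], entry["request"], split_cluster(entry["upstream_cluster"]))
```

The TCP entry has `- - -` in place of an HTTP request line and a response code of `0`, because the gateway only sees an opaque TLS stream; the SNI value in the last field is what identifies edition.cnn.com.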


Cleanup
Remove the gateway, the virtual services, the service entries and the destination rule:
$ kubectl delete gateway proxy
$ kubectl delete virtualservice cnn httpbin
$ kubectl delete serviceentry cnn httpbin-ext localhost
$ kubectl delete destinationrule localhost



Multi-Mesh Deployments for Isolation and Boundary Protection
Wed, 02 Oct 2019 00:00:00 +0000
Various compliance standards require protection of sensitive data environments. Some of the important standards and the
types of sensitive data they protect appear in the following table:

Standard    Sensitive data
PCI DSS     payment card data
FedRAMP     federal information, data and metadata
HIPAA       personal health data
GDPR        personal data

PCI DSS, for example, recommends putting the cardholder data
environment on a network separate from the rest of the system. It also requires using a DMZ
and setting up firewalls between the public Internet and the DMZ, and between the DMZ and the internal network.
Isolation of sensitive data environments from other information systems can reduce the scope of the compliance checks
and improve the security of the sensitive data. Reducing the scope reduces the risk of failing a compliance check and
reduces the cost of compliance, since there are fewer components to check and secure according to compliance
requirements.
You can achieve isolation of sensitive data by separating the parts of the application that process that data
into a separate service mesh, preferably on a separate network, and then connect the meshes with different
compliance requirements together in a multi-mesh deployment.
The process of connecting inter-mesh
applications is called mesh federation.
Note that using mesh federation to create a multi-mesh deployment is very different from creating a
multicluster deployment, which defines a single service mesh composed of services spanning more than one cluster. Unlike multi-mesh, a multicluster deployment is not suitable for
applications that require isolation and boundary protection.
In this blog post I describe the requirements for isolation and boundary protection, and outline the principles of
multi-mesh deployments. Finally, I touch on the current state of mesh-federation support and automation work under way for
Istio.
Isolation and boundary protection
Isolation and boundary protection mechanisms are explained in the
NIST Special Publication 800-53, Revision 4, Security and Privacy Controls for Federal Information Systems and Organizations,
Appendix F, Security Control Catalog, SC-7 Boundary Protection.
In particular, the Boundary protection, isolation of information system components control enhancement:

        Organizations can isolate information system components performing different missions and/or business functions.
Such isolation limits unauthorized information flows among system components and also provides the opportunity to deploy
greater levels of protection for selected components. Separating system components with boundary protection mechanisms
provides the capability for increased protection of individual components and to more effectively control information
flows between those components. This type of enhanced protection limits the potential harm from cyber attacks and
errors. The degree of separation provided varies depending upon the mechanisms chosen. Boundary protection mechanisms
include, for example, routers, gateways, and firewalls separating system components into physically separate networks or
subnetworks, cross-domain devices separating subnetworks, virtualization techniques, and encrypting information flows
among system components using distinct encryption keys.



Various compliance standards recommend isolating environments that process sensitive data from the rest of the
organization.
The Payment Card Industry (PCI) Data Security Standard
recommends implementing network isolation for the cardholder data environment and requires isolating this environment from
the DMZ.
FedRAMP Authorization Boundary Guidance
describes the authorization boundary for federal information and data, while
NIST Special Publication 800-37, Revision 2, Risk Management Framework for Information Systems and Organizations: A System Life Cycle Approach for Security and Privacy
recommends protecting such a boundary in Appendix G, Authorization Boundary Considerations:

        Dividing a system into subsystems (i.e., divide and conquer) facilitates a targeted application of controls to achieve
adequate security, protection of individual privacy, and a cost-effective risk management process. Dividing complex
systems into subsystems also supports the important security concepts of domain separation and network segmentation,
which can be significant when dealing with high value assets. When systems are divided into subsystems, organizations
may choose to develop individual subsystem security and privacy plans or address the system and subsystems in the same
security and privacy plans.
Information security and privacy architectures play a key part in the process of dividing complex systems into
subsystems. This includes monitoring and controlling communications at internal boundaries among subsystems and
selecting, allocating, and implementing controls that meet or exceed the security and privacy requirements of the
constituent subsystems.



Boundary protection, in particular, means:

put an access control mechanism at the boundary (firewall, gateway, etc.)
monitor the incoming/outgoing traffic at the boundary
all the access control mechanisms must be deny-all by default
do not expose private IP addresses from the boundary
do not let components outside the boundary impact security inside the boundary

Multi-mesh deployments facilitate division of a system into subsystems with different
security and compliance requirements, and facilitate the boundary protection.
You put each subsystem into a separate service mesh, preferably on a separate network.
You connect the Istio meshes using gateways. The gateways monitor and control cross-mesh traffic at the boundary of
each mesh.
Features of multi-mesh deployments

non-uniform naming. The withdraw service in the accounts namespace in one mesh might have
different functionality and API than the withdraw services in the accounts namespace in other meshes.
Such a situation could happen in an organization that has no uniform policy for naming namespaces and services, or
when the meshes belong to different organizations.
expose-nothing by default. None of the services in a mesh are exposed by default; the mesh owners must
explicitly specify which services are exposed.
boundary protection. The access control of the traffic must be enforced at the ingress gateway, which stops
forbidden traffic from entering the mesh. This requirement implements the
Defense-in-depth principle and is part of some compliance
standards, such as the
Payment Card Industry (PCI) Data Security Standard.
common trust may not exist. The Istio sidecars in one mesh may not trust the Citadel certificates in other
meshes, due to some security requirement or due to the fact that the mesh owners did not initially plan to federate
the meshes.

While expose-nothing by default and boundary protection are required to facilitate compliance and improve
security, non-uniform naming and common trust may not exist must be supported when connecting
meshes of different organizations, or meshes of an organization that cannot enforce uniform naming or cannot or may not
establish common trust between the meshes.
An optional feature that you may want to use is service location transparency: consuming services send requests
to the exposed services in remote meshes using local service names. The consuming services are oblivious to the fact
that some of the destinations are in remote meshes and some are local services. The access is uniform, using the local
service names, for example, in Kubernetes, reviews.default.svc.cluster.local.
Service location transparency is useful when you want to be able to change the location of the
consumed services, for example when some service is migrated from private cloud to public cloud, without changing the
code of your applications.
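Expose-nothing by default, non-uniform naming and service location transparency can be illustrated together with a small data model. The Python sketch below is purely hypothetical: it is not an Istio API, and `Mesh`, `Consumer` and the service names are invented for illustration. A mesh exposes nothing until its owner opts a service in, and a consuming mesh refers to an imported service by a local alias, so the local and the remote `accounts/withdraw` services can coexist:

```python
class Mesh:
    """Hypothetical mesh: owns some services and exposes none of them by default."""

    def __init__(self, name, services):
        self.name = name
        self.services = set(services)
        self.exposed = set()  # expose-nothing by default

    def expose(self, service):
        if service not in self.services:
            raise KeyError(f"{self.name} has no service {service}")
        self.exposed.add(service)


class Consumer:
    """Hypothetical consuming side: maps local aliases to services exposed by remote meshes."""

    def __init__(self, mesh):
        self.mesh = mesh
        self.imports = {}  # local alias -> (remote mesh, remote service name)

    def import_service(self, alias, remote, service):
        if service not in remote.exposed:
            raise PermissionError(f"{remote.name} does not expose {service}")
        if alias in self.mesh.services:
            raise ValueError(f"alias {alias} collides with a local service")
        self.imports[alias] = (remote, service)

    def resolve(self, name):
        """Callers use one naming scheme and cannot tell local from remote targets."""
        if name in self.imports:
            remote, service = self.imports[name]
            return remote.name, service
        if name in self.mesh.services:
            return self.mesh.name, name
        raise LookupError(name)


# Both meshes have an accounts/withdraw service with different meanings (non-uniform naming).
bank = Mesh("bank", {"accounts/withdraw", "accounts/audit"})
shop = Mesh("shop", {"accounts/withdraw", "reviews/reviews"})

bank.expose("accounts/withdraw")  # the bank's owners explicitly opt in one service
shop_client = Consumer(shop)
shop_client.import_service("payments/withdraw", bank, "accounts/withdraw")

print(shop_client.resolve("payments/withdraw"))  # ('bank', 'accounts/withdraw')
print(shop_client.resolve("accounts/withdraw"))  # ('shop', 'accounts/withdraw')
```

If the bank later moved its service elsewhere, only the import would change; callers that use `payments/withdraw` would not.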
The current mesh-federation work
While you can perform mesh federation using standard Istio configurations already today,
it requires writing a lot of boilerplate YAML files and is error-prone. There is an effort under way to automate
the mesh federation process. In the meantime, you can look at these
multi-mesh deployment examples
to get an idea of what a generated federation might include.
Summary
In this blog post I described the requirements for isolation and boundary protection of sensitive data environments by
using Istio multi-mesh deployments. I outlined the principles of Istio
multi-mesh deployments and reported on the current work on
mesh federation in Istio.
I will be happy to hear your opinion about multi-mesh and
multicluster at discuss.istio.io.



Monitoring Blocked and Passthrough External Service Traffic
Sat, 28 Sep 2019 00:00:00 +0000
Understanding, controlling and securing your external service access is one
of the key benefits that you get from a service mesh like Istio. From a security
and operations point of view, it is critical to monitor what external service traffic
is getting blocked, as blocked requests might surface possible misconfigurations or a
security vulnerability if an application is attempting to communicate with a
service that it should not be allowed to. Similarly, if you currently have a
policy of allowing any external service access, it is beneficial to monitor
the traffic so you can incrementally add explicit Istio configuration to allow
access and better secure your cluster. In either case, having visibility into this
traffic via telemetry is quite helpful as it enables you to create alerts and
dashboards, and better reason about your security posture. This was a highly
requested feature by production users of Istio and we are excited that the
support for this was added in release 1.3.
To implement this, the Istio default
metrics are augmented with
explicit labels to capture blocked and passthrough external service traffic.
This blog will cover how you can use these augmented metrics to monitor all
external service traffic.
The Istio control plane configures the sidecar proxy with
predefined clusters called BlackHoleCluster and PassthroughCluster which block or
allow all traffic respectively. To understand these clusters, let’s start with
what external and internal services mean in the context of Istio service mesh.
External and internal services
Internal services are defined as services which are part of your platform
and are considered to be in the mesh. For internal services, the Istio control
plane provides all the required configuration to the sidecars by default.
For example, in Kubernetes clusters, Istio configures the sidecars for all
Kubernetes services to preserve the default Kubernetes behavior of all
services being able to communicate with each other.
External services are services which are not part of your platform, i.e., services
which are outside of the mesh. For external services, Istio provides two
options: the first blocks all external service access (enabled by setting
global.outboundTrafficPolicy.mode to REGISTRY_ONLY) and
the second allows all access to external services (enabled by setting
global.outboundTrafficPolicy.mode to ALLOW_ANY). The default option for this
setting (as of Istio 1.3) is to allow all external service access. This
option can be configured via mesh configuration.
This is where the BlackHole and Passthrough clusters are used.
What are BlackHole and Passthrough clusters?


BlackHoleCluster - The BlackHoleCluster is a virtual cluster created
in the Envoy configuration when global.outboundTrafficPolicy.mode is set to
REGISTRY_ONLY. In this mode, all traffic to external services is blocked unless
service entries
are explicitly added for each service. To implement this, the default virtual
outbound listener at 0.0.0.0:15001, which uses
original destination,
is set up as a TCP Proxy with the BlackHoleCluster as the static cluster.
The configuration for the BlackHoleCluster looks like this:
{
  "name": "BlackHoleCluster",
  "type": "STATIC",
  "connectTimeout": "10s"
}
As you can see, this cluster is static with no endpoints, so all the traffic
will be dropped. Additionally, Istio creates unique listeners for every
port/protocol combination of platform services; such a listener is hit instead of the
virtual listener if the request is made to an external service on the same port.
In that case, the route configuration of every virtual route in Envoy is augmented
with a catch-all block_all route like this:
{
  "name": "block_all",
  "domains": [
    "*"
  ],
  "routes": [
    {
      "match": {
        "prefix": "/"
      },
      "directResponse": {
        "status": 502
      }
    }
  ]
}
The route is set up as a direct response
with a 502 response code, which means that if no other route matches, the Envoy proxy
directly returns a 502 HTTP status code.


PassthroughCluster - The PassthroughCluster is a virtual cluster created
in the Envoy configuration when global.outboundTrafficPolicy.mode is set to
ALLOW_ANY. In this mode, all traffic to any external service is allowed.
To implement this, the default virtual outbound listener at 0.0.0.0:15001,
which uses SO_ORIGINAL_DST, is set up as a TCP Proxy with the PassthroughCluster
as the static cluster.
The configuration for the PassthroughCluster looks like this:
{
  "name": "PassthroughCluster",
  "type": "ORIGINAL_DST",
  "connectTimeout": "10s",
  "lbPolicy": "ORIGINAL_DST_LB",
  "circuitBreakers": {
    "thresholds": [
      {
        "maxConnections": 102400,
        "maxRetries": 1024
      }
    ]
  }
}
This cluster uses the original destination load balancing
policy, which configures Envoy to send the traffic to its
original destination, i.e., to pass it through.
Similar to the BlackHoleCluster, for every port/protocol based listener the
virtual route configuration is augmented to add the PassthroughCluster as the
default route:
{
  "name": "allow_any",
  "domains": [
    "*"
  ],
  "routes": [
    {
      "match": {
        "prefix": "/"
      },
      "route": {
        "cluster": "PassthroughCluster"
      }
    }
  ]
}
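The behavior of the two clusters can be summarized as a decision function. The following Python sketch is a simplified model, not Istio or Envoy code: `registry` stands for the hosts known to the mesh (platform services plus service entries), `http_ports` for the ports that have a dedicated port/protocol listener, and the `outbound|<port>||<host>` cluster name mirrors the naming seen in Istio access logs:

```python
from typing import NamedTuple


class Decision(NamedTuple):
    listener: str  # "http" (port/protocol listener) or "tcp-virtual" (0.0.0.0:15001)
    cluster: str   # the cluster the traffic is attributed to
    result: str    # what the calling application observes


def outbound_decision(host, port, mode, registry, http_ports):
    """Simplified model of how a sidecar treats an outbound connection (not Istio code)."""
    listener = "http" if port in http_ports else "tcp-virtual"
    if host in registry:
        # Known service or service entry: normal routing to its own cluster.
        return Decision(listener, f"outbound|{port}||{host}", "forwarded")
    if mode == "REGISTRY_ONLY":
        if listener == "http":
            # Catch-all block_all route: Envoy answers with a direct 502 itself.
            return Decision(listener, "BlackHoleCluster", "502 direct response")
        # Static cluster without endpoints: nothing is ever connected.
        return Decision(listener, "BlackHoleCluster", "connection dropped")
    if mode == "ALLOW_ANY":
        # allow_any route or ORIGINAL_DST cluster: forward to where the app was going.
        return Decision(listener, "PassthroughCluster", "forwarded to original destination")
    raise ValueError(f"unknown outboundTrafficPolicy mode: {mode}")


registry = {"reviews.default.svc.cluster.local"}
http_ports = {80, 9080}  # assume some in-mesh HTTP services listen on these ports

for mode in ("REGISTRY_ONLY", "ALLOW_ANY"):
    for host, port in (("httpbin.org", 80), ("httpbin.org", 443)):
        print(mode, host, port, outbound_decision(host, port, mode, registry, http_ports))
```

The listener type matters as much as the mode: it decides whether a blocked call surfaces as a 502 response or as a dropped connection, and, as the following sections show, which metric records it.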


Prior to Istio 1.3, either no metrics were reported when traffic hit these clusters, or the
metrics that were reported carried no explicit labels, resulting in
a lack of visibility into traffic flowing through the mesh.
The next section covers how to take advantage of this enhancement; note that the metrics
and labels emitted depend on whether the virtual outbound listener or an explicit port/protocol
listener is hit.
Using the augmented metrics
To capture all external service traffic in either case (BlackHole or
Passthrough), you will need to monitor the istio_requests_total and
istio_tcp_connections_closed_total metrics. Depending on which Envoy listener
type is invoked, TCP proxy or HTTP proxy, one of these metrics
is incremented.
Additionally, in the case of a TCP proxy listener, to see the IP address of
the external service that is blocked or allowed via the BlackHole or Passthrough
cluster, you will need to add the destination_ip label to the
istio_tcp_connections_closed_total metric. In this scenario, the host name of
the external service is not captured. This label is not added by default and can
be easily added by augmenting the Istio configuration for attribute generation
and Prometheus handler. You should be careful about cardinality explosion in
time series if you have many services with non-stable IP addresses.
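As a starting point for dashboards or alerts, the following Python sketch builds PromQL queries that break down BlackHole and Passthrough traffic and turns them into Prometheus HTTP API (`/api/v1/query`) URLs. The Prometheus address is an assumption about your installation, the 5-minute window is arbitrary, and the TCP query groups by `destination_ip`, which only works once you have added that label as described above:

```python
from urllib.parse import urlencode

# Assumption: the Prometheus service address of your installation.
PROMETHEUS = "http://prometheus.istio-system:9090"

EXTERNAL_CLUSTERS = ("BlackHoleCluster", "PassthroughCluster")


def external_traffic_queries(cluster, window="5m"):
    """Return (http_query, tcp_query) PromQL strings for one of the two special clusters."""
    if cluster not in EXTERNAL_CLUSTERS:
        raise ValueError(f"expected one of {EXTERNAL_CLUSTERS}, got {cluster!r}")
    selector = f'{{destination_service_name="{cluster}"}}'
    # HTTP proxy listener: the external host is in destination_service.
    http_query = ("sum by (source_workload, destination_service) "
                  f"(increase(istio_requests_total{selector}[{window}]))")
    # TCP proxy listener: only destination_ip identifies the peer (label must be added first).
    tcp_query = ("sum by (source_workload, destination_ip) "
                 f"(increase(istio_tcp_connections_closed_total{selector}[{window}]))")
    return http_query, tcp_query


def instant_query_url(promql):
    """URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{PROMETHEUS}/api/v1/query?{urlencode({'query': promql})}"


for cluster in EXTERNAL_CLUSTERS:
    for query in external_traffic_queries(cluster):
        print(query)
        print(instant_query_url(query))
```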
PassthroughCluster metrics
This section explains the metrics and the labels emitted based on the listener
type invoked in Envoy.


HTTP proxy listener: This happens when the port of the external service is
the same as one of the service ports defined in the cluster. In this scenario,
when the PassthroughCluster is hit, istio_requests_total is incremented
like this:
{
  "metric": {
    "__name__": "istio_requests_total",
    "connection_security_policy": "unknown",
    "destination_app": "unknown",
    "destination_principal": "unknown",
    "destination_service": "httpbin.org",
    "destination_service_name": "PassthroughCluster",
    "destination_service_namespace": "unknown",
    "destination_version": "unknown",
    "destination_workload": "unknown",
    "destination_workload_namespace": "unknown",
    "instance": "100.96.2.183:42422",
    "job": "istio-mesh",
    "permissive_response_code": "none",
    "permissive_response_policyid": "none",
    "reporter": "source",
    "request_protocol": "http",
    "response_code": "200",
    "response_flags": "-",
    "source_app": "sleep",
    "source_principal": "unknown",
    "source_version": "unknown",
    "source_workload": "sleep",
    "source_workload_namespace": "default"
  },
  "value": [
    1567033080.282,
    "1"
  ]
}
Note that the destination_service_name label is set to PassthroughCluster to
indicate that this cluster was hit and the destination_service is set to the
host of the external service.


TCP proxy virtual listener - If the external service port doesn’t map to any
HTTP based service ports within the cluster, this listener is invoked and
istio_tcp_connections_closed_total is the metric that will be increased:
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "istio_tcp_connections_closed_total",
          "connection_security_policy": "unknown",
          "destination_app": "unknown",
          "destination_ip": "52.22.188.80",
          "destination_principal": "unknown",
          "destination_service": "unknown",
          "destination_service_name": "PassthroughCluster",
          "destination_service_namespace": "unknown",
          "destination_version": "unknown",
          "destination_workload": "unknown",
          "destination_workload_namespace": "unknown",
          "instance": "100.96.2.183:42422",
          "job": "istio-mesh",
          "reporter": "source",
          "response_flags": "-",
          "source_app": "sleep",
          "source_principal": "unknown",
          "source_version": "unknown",
          "source_workload": "sleep",
          "source_workload_namespace": "default"
        },
        "value": [
          1567033761.879,
          "1"
        ]
      }
    ]
  }
}
In this case, destination_service_name is set to PassthroughCluster and
the destination_ip is set to the IP address of the external service.
The destination_ip label can be used to do a reverse DNS lookup and
get the host name of the external service. As this cluster is passthrough,
other TCP related metrics like istio_tcp_connections_opened_total,
istio_tcp_received_bytes_total and istio_tcp_sent_bytes_total are also
updated.
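A best-effort reverse lookup of a `destination_ip` value can be done with the Python standard library, as in the sketch below. Keep in mind that a PTR record, when one exists at all, often names the hosting or CDN provider rather than the hostname the application actually dialed, so treat the result as a hint rather than an answer:

```python
import ipaddress
import socket


def reverse_lookup(destination_ip):
    """Best-effort PTR lookup for a destination_ip label value; returns None on failure.

    Values that are not IP addresses (for example "unknown") are rejected
    without touching the network.
    """
    try:
        ipaddress.ip_address(destination_ip)
    except ValueError:
        return None
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(destination_ip)
    except OSError:  # socket.herror / socket.gaierror: no PTR record or resolver failure
        return None
    return hostname


# Usage (performs a live DNS query):
# print(reverse_lookup("52.22.188.80"))
```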


BlackHoleCluster metrics
Similar to the PassthroughCluster, this section explains the metrics and the
labels emitted based on the listener type invoked in Envoy.


HTTP proxy listener: This happens when the port of the external service is the same
as one of the service ports defined in the cluster.
In this scenario, when the BlackHoleCluster is hit,
istio_requests_total is incremented like this:
{
  "metric": {
    "__name__": "istio_requests_total",
    "connection_security_policy": "unknown",
    "destination_app": "unknown",
    "destination_principal": "unknown",
    "destination_service": "httpbin.org",
    "destination_service_name": "BlackHoleCluster",
    "destination_service_namespace": "unknown",
    "destination_version": "unknown",
    "destination_workload": "unknown",
    "destination_workload_namespace": "unknown",
    "instance": "100.96.2.183:42422",
    "job": "istio-mesh",
    "permissive_response_code": "none",
    "permissive_response_policyid": "none",
    "reporter": "source",
    "request_protocol": "http",
    "response_code": "502",
    "response_flags": "-",
    "source_app": "sleep",
    "source_principal": "unknown",
    "source_version": "unknown",
    "source_workload": "sleep",
    "source_workload_namespace": "default"
  },
  "value": [
    1567034251.717,
    "1"
  ]
}
Note the destination_service_name label is set to BlackHoleCluster and the
destination_service to the host name of the external service. The response
code should always be 502 in this case.


TCP proxy virtual listener - If the external service port doesn’t map to any
HTTP based service ports within the cluster, this listener is invoked and
istio_tcp_connections_closed_total is the metric that will be increased:
{
  "metric": {
    "__name__": "istio_tcp_connections_closed_total",
    "connection_security_policy": "unknown",
    "destination_app": "unknown",
    "destination_ip": "52.22.188.80",
    "destination_principal": "unknown",
    "destination_service": "unknown",
    "destination_service_name": "BlackHoleCluster",
    "destination_service_namespace": "unknown",
    "destination_version": "unknown",
    "destination_workload": "unknown",
    "destination_workload_namespace": "unknown",
    "instance": "100.96.2.183:42422",
    "job": "istio-mesh",
    "reporter": "source",
    "response_flags": "-",
    "source_app": "sleep",
    "source_principal": "unknown",
    "source_version": "unknown",
    "source_workload": "sleep",
    "source_workload_namespace": "default"
  },
  "value": [
    1567034481.03,
    "1"
  ]
}
Note the destination_ip label represents the IP address of the external
service and the destination_service_name is set to BlackHoleCluster
to indicate that this traffic was blocked by the mesh. It is interesting to
note that for the BlackHole cluster case, other TCP related metrics like
istio_tcp_connections_opened_total are not increased, as no
connection is ever established.
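To turn samples like the ones above into a per-workload view of blocked and allowed destinations, a small aggregation is enough. The Python sketch below assumes you have the `data.result` list of an instant query (for example from `/api/v1/query`); it prefers `destination_service` (set for HTTP listeners) and falls back to `destination_ip` (set for TCP listeners once the label is added). Summing raw counter values only makes sense within a single query; for alerting you would use `increase()` or `rate()` in PromQL instead:

```python
from collections import Counter

VERDICTS = {"BlackHoleCluster": "blocked", "PassthroughCluster": "allowed"}


def summarize_external_traffic(results):
    """Tally instant-query samples by (verdict, source workload, destination)."""
    tally = Counter()
    for sample in results:
        labels = sample["metric"]
        verdict = VERDICTS.get(labels.get("destination_service_name"))
        if verdict is None:
            continue  # traffic to a registered service, not external traffic
        destination = labels.get("destination_service", "unknown")
        if destination == "unknown":
            # TCP listener samples only identify the peer by IP (if the label was added).
            destination = labels.get("destination_ip", "unknown")
        key = (verdict, labels.get("source_workload", "unknown"), destination)
        tally[key] += float(sample["value"][1])  # value is [timestamp, "number as string"]
    return tally


# Trimmed-down samples in the shape shown above.
results = [
    {"metric": {"destination_service_name": "PassthroughCluster", "destination_service": "httpbin.org",
                "source_workload": "sleep"}, "value": [1567033080.282, "1"]},
    {"metric": {"destination_service_name": "BlackHoleCluster", "destination_service": "httpbin.org",
                "source_workload": "sleep"}, "value": [1567034251.717, "1"]},
    {"metric": {"destination_service_name": "BlackHoleCluster", "destination_service": "unknown",
                "destination_ip": "52.22.188.80", "source_workload": "sleep"}, "value": [1567034481.03, "1"]},
]

for (verdict, source, destination), count in sorted(summarize_external_traffic(results).items()):
    print(f"{verdict:8} {source} -> {destination}: {count:g}")
```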


Monitoring these metrics can help operators easily understand all the external
services consumed by the applications in their cluster.



Mixer Adapter for Knative
Wed, 18 Sep 2019 00:00:00 +0000
This post demonstrates how you can use Mixer to push application logic
into Istio. It describes a Mixer adapter which implements the Knative scale-from-zero logic
with simple code and similar performance to the original implementation.
Knative serving
Knative Serving builds on Kubernetes to support deploying
and serving serverless applications. A core capability of serverless platforms is scale-to-zero
functionality which reduces resource usage and cost of inactive workloads.
A new mechanism is required to scale from zero when an idle application receives a new request.
The following diagram represents the current Knative architecture for scale-from-zero.

    Knative scale-from-zero

The traffic for an idle application is redirected to the Activator component by programming Istio with VirtualServices
and DestinationRules. When Activator receives a new request, it:

buffers incoming requests
triggers the Autoscaler
redirects requests to the application after it has been scaled up, including retries and load-balancing (if needed)

Once the application is up and running again, Knative restores the routing from Activator to the running application.
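The three Activator duties listed above can be modeled in a few lines. The Python sketch below is hypothetical (the class and method names are ours, not Knative's) and deliberately single-threaded: it buffers requests, triggers the Autoscaler once per cold start, and replays the buffer in arrival order when the application is ready. Retries and load balancing are left out:

```python
from collections import deque


class Activator:
    """Hypothetical single-threaded model of the Activator's scale-from-zero duties."""

    def __init__(self, trigger_autoscaler):
        self._trigger_autoscaler = trigger_autoscaler
        self._buffer = deque()
        self._scale_requested = False

    def on_request(self, request):
        self._buffer.append(request)      # buffer while no instance is running
        if not self._scale_requested:     # trigger the Autoscaler once per cold start
            self._scale_requested = True
            self._trigger_autoscaler()

    def on_ready(self, send):
        """Replay buffered requests in arrival order once the application is up."""
        delivered = []
        while self._buffer:
            delivered.append(send(self._buffer.popleft()))
        self._scale_requested = False
        return delivered


scale_calls = []
activator = Activator(lambda: scale_calls.append("scale up"))
for request in ("req-1", "req-2", "req-3"):
    activator.on_request(request)

print(scale_calls)  # ['scale up']
print(activator.on_ready(lambda r: f"200 OK for {r}"))
# ['200 OK for req-1', '200 OK for req-2', '200 OK for req-3']
```

In the Mixer adapter design described below, this buffering and replay is the part that the Istio proxy takes over.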
Mixer adapter
Mixer provides a rich intermediation layer between the Istio components and infrastructure backends.
It is designed as a stand-alone component, separate from Envoy, and has a simple extensibility model
to enable Istio to interoperate with a wide breadth of backends. Mixer is inherently easier to extend
than Envoy is.
Mixer is an attribute processing engine that uses operator-supplied configuration to map request attributes from the Istio proxy into calls
to the infrastructure backend systems via a pluggable set of adapters. Adapters enable Mixer to expose a single consistent API, independent of the
infrastructure backends in use. The exact set of adapters used at runtime is determined through operator configuration and can easily
be extended to target new or custom infrastructure backends.
In order to achieve Knative scale-from-zero, we use a Mixer out-of-process adapter
to call the Autoscaler. Out-of-process adapters for Mixer allow developers to use any
programming language and to build and maintain their extension as a stand-alone program
without the need to build the Istio proxy.
The following diagram represents the Knative design using the Mixer adapter.

    Knative scale-from-zero

In this design, there is no need to change the routing to and from the Activator for an idle application, as the original Knative setup does.
When the Istio proxy represented by the ingress gateway component receives a new request for an idle application, it informs Mixer, including all the
relevant metadata information.
Mixer then calls your adapter which triggers the Knative Autoscaler using the original Knative protocol.

        By using this design you do not need to deal with buffering, retries and load-balancing because it is already handled by the Istio proxy.


Istio’s use of Mixer adapters makes it possible to replace otherwise complex networking-based application logic with a more
straightforward implementation, as demonstrated in the Knative adapter.
When the adapter receives a message from Mixer, it sends a StatMessage directly to the Autoscaler
component using the Knative protocol.
The metadata information (namespace and service name) required by the Autoscaler is transferred by the Istio proxy to
Mixer and from there to the adapter.
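The adapter's job therefore reduces to extracting two attributes and forwarding them. In the following Python sketch, `destination.service.namespace` and `destination.service.name` are standard Istio attribute names, but the handler signature, `StatKey` and the `request_count` field are hypothetical placeholders and do not reproduce Knative's actual `StatMessage` definition:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatKey:
    """Hypothetical key identifying the idle Knative service."""
    namespace: str
    name: str


def handle_mixer_request(attributes, send_to_autoscaler):
    """Hypothetical adapter handler: forward one request signal to the Autoscaler.

    Returns False when the attributes lack the metadata needed to identify the service.
    """
    namespace = attributes.get("destination.service.namespace")
    name = attributes.get("destination.service.name")
    if not namespace or not name:
        return False
    # Placeholder payload; Knative's real StatMessage has its own Go definition.
    send_to_autoscaler({"key": StatKey(namespace, name), "request_count": 1})
    return True


sent = []
handle_mixer_request(
    {"destination.service.namespace": "default", "destination.service.name": "helloworld"},
    sent.append,
)
print(sent)  # [{'key': StatKey(namespace='default', name='helloworld'), 'request_count': 1}]
```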
Summary
I compared the cold-start time of the original Knative reference architecture to the new Istio Mixer adapter reference architecture.
The results show similar cold-start times.
The implementation using the Mixer adapter has greater simplicity. It is not necessary to handle low-level network-based mechanisms as these are handled by Envoy.
The next step is converting this Mixer adapter into an Envoy-specific filter running inside an ingress gateway.
This will further reduce the latency overhead (no more calls to Mixer and the adapter) and
remove the dependency on the Istio Mixer.



App Identity and Access Adapter
Wed, 18 Sep 2019 00:00:00 +0000
If you are running your containerized applications on Kubernetes, you can benefit from using the App Identity and Access Adapter for an abstracted level of security with zero code changes or redeploys.
Whether your computing environment is based on a single cloud provider, a combination of multiple cloud providers, or following a hybrid cloud approach, having a centralized identity management can help you to preserve existing infrastructure and avoid vendor lock-in.
With the App Identity and Access Adapter, you can use any OAuth2/OIDC provider: IBM Cloud App ID, Auth0, Okta, Ping Identity, AWS Cognito, Azure AD B2C and more. Authentication and authorization policies can be applied in a streamlined way in all environments — including frontend and backend applications — all without code changes or redeploys.
Understanding Istio and the adapter
Istio is an open source service mesh that
transparently layers onto distributed applications and seamlessly integrates
with Kubernetes. To reduce the complexity of deployments, Istio provides
behavioral insights and operational control over the service mesh as a whole.
See the Istio Architecture for more details.
Istio uses Envoy proxy sidecars to mediate inbound and outbound traffic for all pods in the service mesh. Istio extracts telemetry from the Envoy sidecars and sends it to Mixer, the Istio component responsible for collecting telemetry and enforcing policy.
The App Identity and Access adapter extends Mixer's functionality by analyzing the telemetry (attributes) against various access control policies across the service mesh. The access control policies can be linked to particular Kubernetes services and finely tuned to specific service endpoints. For more information about policies and telemetry, see the Istio documentation.
When the App Identity and Access adapter is combined with Istio, it provides a scalable, integrated identity and access solution for multicloud architectures that does not require any custom application code changes.
Installation
The App Identity and Access adapter can be installed with Helm directly from the GitHub repository:
$ helm repo add appidentityandaccessadapter https://raw.githubusercontent.com/ibm-cloud-security/app-identity-and-access-adapter/master/helm/appidentityandaccessadapter
$ helm install --name appidentityandaccessadapter appidentityandaccessadapter/appidentityandaccessadapter
Alternatively, you can clone the repository and install the Helm chart locally:
$ git clone git@github.com:ibm-cloud-security/app-identity-and-access-adapter.git
$ cd app-identity-and-access-adapter
$ helm install ./helm/appidentityandaccessadapter --name appidentityandaccessadapter
Protecting web applications
Web applications are most commonly protected by the OpenID Connect (OIDC) workflow called authorization_code. When an unauthenticated/unauthorized user is detected, they are automatically redirected to the identity service of your choice and presented with the authentication page. When authentication completes, the browser is redirected back to an implicit /oidc/callback endpoint intercepted by the adapter. At this point, the adapter obtains access and identity tokens from the identity service and then redirects users back to their originally requested URL in the web app.
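The redirect at the start of this flow is a standard OpenID Connect authorization request. The minimal Python sketch below shows how such a URL can be assembled. The authorization endpoint and client ID are placeholders we made up (in practice the endpoint comes from the provider's discovery document), and `state` is a random value that lets the callback be matched to the original request. The callback path is the /oidc/callback endpoint described above.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(authorization_endpoint: str, client_id: str,
                            app_base_url: str, state: str) -> str:
    """Build an OIDC authorization_code request URL (placeholder values throughout)."""
    params = {
        "response_type": "code",  # selects the authorization_code flow
        "client_id": client_id,
        # The provider sends the browser back here; the adapter intercepts this path.
        "redirect_uri": app_base_url.rstrip("/") + "/oidc/callback",
        "scope": "openid",        # required to receive an identity token
        "state": state,           # echoed back by the provider, checked on callback
    }
    return authorization_endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_authorization_url(
        "https://idp.example.com/authorize",  # hypothetical endpoint
        "my-client-id",                       # hypothetical client ID
        "https://example.com/myapp",
        secrets.token_urlsafe(16),
    ))
```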
Authentication state and tokens are maintained by the adapter. Each request processed by the adapter includes an Authorization header bearing both the access and identity tokens in the following format: Authorization: Bearer  
Developers can leverage the tokens to adjust the application experience, e.g. displaying the user name or adjusting the UI based on the user role.
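For example, an application behind the adapter could read the user's display name from the identity token. This Python sketch assumes the header carries the access token followed by the identity token, separated by a single space; that layout is our assumption, so check the adapter's documentation for the exact format. The sketch only decodes the JWT payload and does not verify the signature, relying on the adapter having already validated the tokens. `name` is a standard OIDC claim, but not every provider populates it.

```python
import base64
import json
from typing import Any, Dict, Tuple


def split_tokens(authorization: str) -> Tuple[str, str]:
    """Split 'Bearer <access_token> <id_token>' (assumed layout) into its two tokens."""
    scheme, access_token, id_token = authorization.split(" ")
    if scheme != "Bearer":
        raise ValueError("expected a Bearer authorization header")
    return access_token, id_token


def jwt_claims(token: str) -> Dict[str, Any]:
    """Decode a JWT payload WITHOUT verifying its signature (the adapter already did)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs omit
    return json.loads(base64.urlsafe_b64decode(payload))


def display_name(authorization: str) -> str:
    """Pick a user-facing name from the identity token, falling back to the subject."""
    _, id_token = split_tokens(authorization)
    claims = jwt_claims(id_token)
    return claims.get("name", claims.get("sub", "unknown user"))
```

Never use unverified claims like these for authorization decisions inside the app; they are suitable only for presentation, because enforcement stays with the adapter.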
To terminate the authenticated session and wipe the tokens (that is, to log the user out), redirect the browser to the /oidc/logout endpoint under the protected service. For example, if you serve your app from https://example.com/myapp, redirect users to https://example.com/myapp/oidc/logout.
Whenever the access token expires, a refresh token is used to acquire new access and identity tokens automatically, without your users needing to re-authenticate. If the configured identity provider returns a refresh token, the adapter persists it and uses it to retrieve new access and identity tokens when the old ones expire.
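The refresh decision itself is simple. Below is a minimal Python sketch under our own assumptions: expiry is read from the standard `exp` claim (seconds since the epoch), a small clock-skew margin triggers the refresh slightly early, and the request uses the standard OAuth2 `refresh_token` grant. The 30-second margin and the function names are illustrative, not the adapter's actual values. Client authentication (for example, with the client secret) is omitted.

```python
import time
from typing import Dict, Optional


def needs_refresh(exp: int, now: Optional[float] = None, skew_seconds: int = 30) -> bool:
    """Return True when the access token expires within the skew margin."""
    if now is None:
        now = time.time()
    return now >= exp - skew_seconds


def refresh_request_body(refresh_token: str, client_id: str) -> Dict[str, str]:
    """Form parameters for an OAuth2 refresh_token grant sent to the token endpoint."""
    return {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    }
```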
Applying web application protection
Protecting web applications requires creating two types of resources: OidcConfig resources define the OIDC providers, and Policy resources define the web application protection policies.
apiVersion: "security.cloud.ibm.com/v1"
kind: OidcConfig
metadata:
    name: my-oidc-provider-config
    namespace: sample-namespace
spec:
    discoveryUrl: 
    clientId: 
    clientSecretRef:
        name: 
        key: 
---
apiVersion: "security.cloud.ibm.com/v1"
kind: Policy
metadata:
    name: my-sample-web-policy
    namespace: sample-namespace
spec:
    targets:
    - serviceName: 
      paths:
      - prefix: /webapp
        method: ALL
        policies:
        - policyType: oidc
          config: my-oidc-provider-config
          rules: # optional
          - claim: iss
            match: ALL
            source: access_token
            values:
            - 
          - claim: scope
            match: ALL
            source: access_token
            values:
            - openid
Read more about protecting web applications
Protecting backend application and APIs
Backend applications and APIs are protected using the Bearer Token flow, where an incoming token is validated against a particular policy. The Bearer Token authorization flow expects a request to contain the Authorization header with a valid access token in JWT format. The expected header structure is Authorization: Bearer {access_token}. If the token is validated successfully, the request is forwarded to the requested service. If token validation fails, HTTP 401 is returned to the client with a list of the scopes that are required to access the API.
Applying backend application and APIs protection
Protecting backend applications and APIs requires creating two types of resources: JwtConfig resources define the JWT providers, and Policy resources define the backend application protection policies.
apiVersion: "security.cloud.ibm.com/v1"
kind: JwtConfig
metadata:
    name: my-jwt-config
    namespace: sample-namespace
spec:
    jwksUrl: 
---
apiVersion: "security.cloud.ibm.com/v1"
kind: Policy
metadata:
    name: my-sample-backend-policy
    namespace: sample-namespace
spec:
    targets:
    - serviceName: 
      paths:
      - prefix: /api/files
        method: ALL
        policies:
        - policyType: jwt
          config: my-jwt-config
          rules: # optional
          - claim: iss
            match: ALL
            source: access_token
            values:
            - 
          - claim: scope
            match: ALL
            source: access_token
            values:
            - files.read
            - files.write
Read more about protecting backend applications
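To illustrate the per-request decision for backend APIs, here is a simplified Python sketch. It assumes the token's signature has already been checked against the keys published at the JwtConfig `jwksUrl` (omitted here). It also interprets `match: ALL` as "every listed value must be present in the claim" and treats the `scope` claim as a space-separated string, as is common for OAuth2 access tokens. The exact shape of the 401 response, here a `WWW-Authenticate` header listing the required scopes, is our assumption.

```python
from typing import Any, Dict, List, Mapping, Sequence, Tuple


def claim_values(claims: Mapping[str, Any], name: str) -> List[str]:
    """Normalize a claim into a list of strings (scope is a space-separated string)."""
    value = claims.get(name)
    if value is None:
        return []
    if isinstance(value, str):
        return value.split()
    return [str(v) for v in value]


def rule_matches_all(claims: Mapping[str, Any], claim: str, values: Sequence[str]) -> bool:
    """match: ALL means every configured value must appear in the token's claim."""
    present = set(claim_values(claims, claim))
    return all(v in present for v in values)


def authorize(claims: Mapping[str, Any],
              rules: Sequence[Tuple[str, Sequence[str]]]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers): 200 to forward the request, 401 otherwise."""
    for claim, values in rules:
        if not rule_matches_all(claims, claim, values):
            # Tell the client which scopes the API requires.
            required = " ".join(v for c, vs in rules if c == "scope" for v in vs)
            return 401, {"WWW-Authenticate": f'Bearer scope="{required}"'}
    return 200, {}
```

With the rules from the backend Policy above, a token whose `scope` claim contains both `files.read` and `files.write` is forwarded, while a token carrying only `files.read` is rejected with the required scopes listed.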
Known limitations
At the time of writing, there are two known limitations of the App Identity and Access adapter:


If you use the App Identity and Access adapter for web applications, you should not create more than a single replica of the adapter. Due to the way Envoy Proxy handled HTTP headers, it was impossible to return multiple Set-Cookie headers from Mixer back to Envoy, so we couldn’t set all the cookies required for web application scenarios. The issue was recently addressed in Envoy and Mixer, and we’re planning to address it in future versions of our adapter. Note that this only affects web applications and doesn’t affect backend apps and APIs in any way.


As a general best practice, you should always consider using mutual TLS for any in-cluster communication. At the moment, the communication channel between Mixer and the App Identity and Access adapter does not use mutual TLS. In the future, we plan to address this by implementing the approach described in the Mixer Adapter developer guide.


Summary
When a multicloud strategy is in place, security can become complicated as the environment grows and diversifies. While cloud providers supply protocols and tools to ensure their offerings are safe, the development teams are still responsible for the application-level security, such as API access control with OAuth2, defending against man-in-the-middle attacks with traffic encryption, and providing mutual TLS for service access control. However, this becomes complex in a multicloud environment since you might need to define those security details for each service separately. With proper security protocols in place, those external and internal threats can be mitigated.
Development teams have spent time making their services portable to different cloud providers, and in the same regard, the security in place should be flexible and not infrastructure-dependent.
Istio and the App Identity and Access Adapter allow you to secure your Kubernetes apps with absolutely zero code changes or redeployments, regardless of which programming language and frameworks you use. Following this approach ensures maximum portability of your apps and the ability to easily enforce the same security policies across multiple environments.
You can read more about the App Identity and Access Adapter in the release blog.



Change in Secret Discovery Service in Istio 1.3
Tue, 10 Sep 2019 00:00:00 +0000
In Istio 1.3, we are taking advantage of improvements in Kubernetes to issue certificates for workload instances more securely.
When a Citadel Agent sends a certificate signing request to Citadel to get a certificate for a workload instance,
it includes the JWT that the Kubernetes API server issued representing the service account of the workload instance.
If Citadel can authenticate the JWT, it extracts the service account name needed to issue the certificate for the workload instance.
Before Kubernetes 1.12, the Kubernetes API server issued JWTs with the following problems:

The tokens don’t have important fields to limit their scope of usage, such as aud or exp. See Bound Service Tokens for more info.
The tokens are mounted onto all the pods without a way to opt-out. See Service Account Token Volumes for motivation.

Kubernetes 1.12 introduces trustworthy JWTs to solve these issues.
However, support for the aud field to have a different value than the API server audience didn’t become available until Kubernetes 1.13.
To better secure the mesh, Istio 1.3 only supports trustworthy JWTs and requires the value of the aud field to be istio-ca when you enable SDS.
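The two properties that make these tokens trustworthy can be checked on a decoded token payload. The Python sketch below is illustrative only. It assumes the claims have already been decoded and the signature verified, and it checks the two fields named in this post: an `aud` that includes `istio-ca` and an `exp` that is still in the future. Because the JWT specification allows `aud` to be either a single string or a list of strings, the sketch accepts both.

```python
import time
from typing import Any, Mapping, Optional

ISTIO_CA_AUDIENCE = "istio-ca"


def is_trustworthy_for_citadel(claims: Mapping[str, Any], now: Optional[float] = None) -> bool:
    """Check the audience and expiry fields of an already-verified JWT payload."""
    if now is None:
        now = time.time()
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    exp = claims.get("exp")
    # Legacy service account tokens lack both fields, so they fail this check.
    return ISTIO_CA_AUDIENCE in audiences and exp is not None and now < exp
```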
Before upgrading your Istio deployment to 1.3 with SDS enabled, verify that you use Kubernetes 1.13 or later.
Consider the following, depending on your platform:

GKE: Upgrade your cluster version to at least 1.13.
On-prem Kubernetes and GKE on-prem: Add extra configuration to your Kubernetes API server. You may
also want to refer to the api-server page for the most up-to-date flag names.
For other platforms, check with your provider. If your vendor does not support trustworthy JWTs, you will need to fall back to the file-mount approach to propagate the workload keys and certificates in Istio 1.3.
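Before upgrading, you can check the version your API server reports (for example, the `gitVersion` field shown by `kubectl version`). The following Python sketch assumes a version string such as `v1.13.7-gke.8` and compares it numerically against the 1.13 minimum stated above. The parsing rules are our own and may need adjusting for unusual vendor formats. Passing this check is necessary but not sufficient on on-prem clusters, which also need the extra API server configuration.

```python
import re
from typing import Tuple

MIN_VERSION: Tuple[int, int] = (1, 13)  # minimum for trustworthy JWTs with aud=istio-ca


def parse_major_minor(git_version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like 'v1.13.7-gke.8' or '1.12.10'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_istio_13_sds(git_version: str) -> bool:
    """True if the cluster meets the Kubernetes 1.13 minimum for SDS in Istio 1.3."""
    # Compare integer tuples, so 1.9 correctly sorts below 1.13.
    return parse_major_minor(git_version) >= MIN_VERSION
```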




The Evolution of Istio's APIs
Mon, 05 Aug 2019 00:00:00 +0000
One of Istio’s main goals has always been, and continues to be, enabling teams to develop abstractions that work best for their specific organization and workloads. Istio provides robust and powerful building blocks for service-to-service networking. Since Istio 0.1, the Istio team has been learning from production users about how they map their own architectures, workloads, and constraints to Istio’s capabilities, and we’ve been evolving Istio’s APIs to make them work better for you.
Evolving Istio’s APIs
The next step in Istio’s evolution is to sharpen our focus and align with the roles of Istio’s users. A security admin should be able to interact with an API that logically groups and simplifies security operations within an Istio mesh; the same goes for service operators and traffic management operations.
Taking it a step further, there’s an opportunity to provide improved experiences for beginning, intermediate, and advanced use cases for each role. There are many common use cases that can be addressed with obvious default settings and a better defined initial experience that requires little to no configuration. For intermediate use cases, the Istio team wants to leverage contextual cues from the environment and provide you with a simpler configuration experience. Finally, for advanced scenarios, our goal is to make easy things easy and hard things possible.
To provide these sorts of role-centric abstractions, however, the APIs underneath them must be able to describe all of Istio’s power and capabilities. Historically, Istio’s approach to API design followed paths similar to those of other infrastructure APIs. Istio follows these design principles:

The Istio APIs should seek to:

Properly represent the underlying resources to which they are mapped
Not hide any of the underlying resource’s useful capabilities


The Istio APIs should also be composable, so end users can combine infrastructure APIs in a way that makes sense for their own needs.
The Istio APIs should be flexible: Within an organization, it should be possible to have different representations of the underlying resources and surface the ones that make sense for each individual team.

Over the course of the next several releases we will share our progress as we strengthen the alignment between Istio’s APIs and the roles of Istio users.
Composability and abstractions
Istio and Kubernetes often go together, but Istio is much more than an add-on to Kubernetes – it is as much a platform as Kubernetes is. Istio aims to provide infrastructure, and surface the capabilities you need in a powerful service mesh. For example, there are platform-as-a-service offerings that use Kubernetes as their foundation, and build on Kubernetes’ composability to provide a subset of APIs to application developers.
The number of objects that must be configured to deploy applications is a concrete example of Kubernetes’ composability. By our count, at least 10 objects need to be configured: Namespace, Service, Ingress, Deployment, HorizontalPodAutoscaler, Secret, ConfigMap, RBAC, PodDisruptionBudget, and NetworkPolicy.
It sounds complicated, but not everyone needs to interact with those concepts. Some are the responsibility of different teams like the cluster, network, or security admin teams, and many provide sensible defaults. A great benefit of cloud native platforms and deployment tools is that they can hide that complexity by taking in a small amount of information and configuring those objects for you.
Another example of composability in the networking space can be found in the Google Cloud HTTP(S) Load Balancer (GCLB). To correctly use an instance of the GCLB, six different infrastructure objects need to be created and configured. This design is the result of our 20 years of experience in operating distributed systems and there is a reason why each one is separate from the others. But the steps are simplified when you’re creating an instance via the Google Cloud console. We provide the more common end-user/role-specific configurations, and you can configure less common settings later. Ultimately, the goals of infrastructure APIs are to offer the most flexibility without sacrificing functionality.
Knative is a platform for building, running, and operating serverless workloads that provides a great real-world example of role-centric,
higher-level APIs. Knative Serving, a component of Knative that builds on Kubernetes and Istio to support deploying and
serving serverless applications and functions, provides an opinionated workflow for application developers to manage routes and revisions of their services.
Thanks to that opinionated approach, Knative Serving exposes a subset of Istio’s networking APIs that are most relevant to application developers via a simplified
Routes object that supports revisions and traffic routing,
abstracting Istio’s VirtualService and DestinationRule
resources.
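To make the idea of a role-centric abstraction concrete, here is a hypothetical Python sketch of a platform layer that accepts only what an application developer cares about (a host and a percentage split across revisions) and expands it into an Istio VirtualService manifest. This is not how Knative Serving is implemented. The fields under `spec` follow the VirtualService API, while the function, the host, and the revision names are illustrative. It also assumes that each revision is reachable through its own Service host.

```python
import json
from typing import Any, Dict, Mapping


def virtual_service_for(name: str, host: str, split: Mapping[str, int]) -> Dict[str, Any]:
    """Expand a {revision: percent} traffic split into a VirtualService manifest."""
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    # Assumption: each revision has its own Service host.
                    {"destination": {"host": revision}, "weight": percent}
                    for revision, percent in split.items()
                ],
            }],
        },
    }


if __name__ == "__main__":
    # The developer only states the split; the platform emits the Istio object.
    print(json.dumps(virtual_service_for("reviews", "reviews.example.com",
                                         {"reviews-v1": 90, "reviews-v2": 10}), indent=2))
```

The developer-facing input has two fields; everything else is a platform decision that can change without touching application teams.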
As Istio has matured, we’ve also seen production users develop workload- and organization-specific abstractions on top of Istio’s infrastructure APIs.
AutoTrader UK has one of our favorite examples of a custom platform built on Istio. In an interview with the Kubernetes Podcast from Google, Russel Warman and Karl Stoney describe their Kubernetes-based delivery platform, with cost dashboards using Prometheus and Grafana. With minimal effort, they added configuration options to determine what their developers want configured on the network, and it now manages the Istio objects required to make that happen. There are countless other platforms being built in enterprise and cloud-native companies: some designed to replace a web of company-specific custom scripts, and some aimed to be a general-purpose public tool. As more companies start to talk about their tooling publicly, we’ll bring their stories to this blog.
What’s coming next
Some areas of improvement that we’re working on for upcoming releases include:

Installation profiles to set up standard patterns for ingress and egress, with the Istio operator
Automatic inference of container ports and protocols for telemetry
Support for routing all traffic by default to constrain routing incrementally
A single global flag to enable mutual TLS and encrypt all inter-pod traffic

Oh, and if for some reason you judge a toolbox by the list of CRDs it installs, in Istio 1.2 we cut the number from 54 down to 23. Why? It turns out that if you have a bunch of features, you need to have a way to configure them all. With the improvements we’ve made to our installer, you can now install Istio using a configuration that works with your adapters.
All service meshes, and by extension Istio, seek to automate complex infrastructure operations, like networking and security. That means there will always be complexity in Istio’s APIs, but Istio will always aim to solve the needs of operators, while continuing to evolve the API to provide robust building blocks and prioritize flexibility through role-centric abstractions.
We can’t wait for you to join our community to see what you build with Istio next!



Secure Control of Egress Traffic in Istio, part 3
Mon, 22 Jul 2019 00:00:00 +0000
Welcome to part 3 in our series about secure control of egress traffic in Istio.
In the first part in the series, I presented the attacks involving
egress traffic and the requirements we collected for a secure control system for egress traffic.
In the second part in the series, I presented the Istio way of
securing egress traffic and showed how you can prevent the attacks using Istio.
In this installment, I compare secure control of egress traffic in Istio with alternative solutions such as using Kubernetes
network policies and legacy egress proxies and firewalls. Finally, I describe the performance considerations regarding the
secure control of egress traffic in Istio.
Alternative solutions for egress traffic control
First, let’s recall the requirements for egress traffic control we previously collected:

Support for TLS with
SNI or for TLS origination.
Monitor SNI and the source workload of every egress access.
Define and enforce policies per cluster.
Define and enforce policies per source, Kubernetes-aware.
Prevent tampering.
Traffic control is transparent to the applications.

Next, I’m going to cover two alternative solutions for egress traffic control: the Kubernetes network policies and
egress proxies and firewalls. I show the requirements they satisfy, and, more importantly, the requirements they can’t satisfy.
Kubernetes provides a native solution for traffic control, and in particular, for control of egress traffic, through the network policies.
Using these network policies, cluster operators can configure which pods can access specific external services.
Cluster operators can identify pods by pod labels, namespace labels, or by IP ranges. To specify the external services, cluster operators can use IP ranges, but cannot use domain names like cnn.com. This is because Kubernetes network policies are not DNS-aware.
Network policies satisfy the first requirement since they can control any TCP traffic.
Network policies only partially satisfy the third and the fourth requirements because cluster operators can specify policies
per cluster or per pod but operators can’t identify external services by domain names.
Network policies only satisfy the fifth requirement if the attackers are not able to break from a malicious container into the Kubernetes
node and interfere with the implementation of the policies inside said node.
Lastly, network policies do satisfy the sixth requirement: Operators don’t need to change the code or the
container environment. In summary, we can say that Kubernetes Network Policies provide transparent, Kubernetes-aware egress traffic
control, which is not DNS-aware.
The second alternative predates the Kubernetes network policies. With a DNS-aware egress proxy or firewall, you
configure applications to direct their traffic to the proxy using a proxy protocol, for example,
SOCKS.
Since operators must configure the applications, this solution is not transparent. Moreover, operators can’t use
pod labels or pod service accounts to configure the proxies because the egress proxies don’t know about them. Therefore, the egress proxies are not Kubernetes-aware and can’t fulfill the fourth requirement because
egress proxies cannot enforce policies by source if a Kubernetes artifact specifies the source.
In summary, egress proxies can fulfill the first, second, third and fifth requirements, but can’t satisfy the fourth and
the sixth requirements because they are not transparent and not Kubernetes-aware.
Advantages of Istio egress traffic control
Istio egress traffic control is DNS-aware: you can define policies based on URLs or on wildcard domains like
*.ibm.com. In this sense, it is better than Kubernetes network policies, which are not DNS-aware.
Istio egress traffic control is transparent with regard to TLS traffic, since Istio is transparent:
you don’t need to change the applications or configure their containers.
For HTTP traffic with TLS origination, you must configure the applications in the mesh to use HTTP instead of HTTPS.
Istio egress traffic control is Kubernetes-aware: the identity of the source of egress traffic is based on
Kubernetes service accounts. Istio egress traffic control is better than the legacy DNS-aware proxies or firewalls which
are not transparent and not Kubernetes-aware.
Istio egress traffic control is secure: it is based on the strong identity of Istio and, when you
apply
additional security measures,
Istio’s traffic control is resilient to tampering.
Additionally, Istio’s egress traffic control provides the following advantages:

Define access policies in the same language for ingress, egress, and in-cluster traffic. You
need to learn a single policy and configuration language for all types of traffic.
Out-of-the-Box integration of Istio’s egress traffic control with Istio’s policy and observability adapters.
Write the adapters to use external monitoring or access control systems with Istio only once and
apply them for all types of traffic: ingress, egress, and in-cluster.
Use Istio’s traffic management features for egress traffic:
load balancing, passive and active health checking, circuit breaker, timeouts, retries, fault injection, and others.

We refer to a system with the advantages above as Istio-aware.
The following table summarizes the egress traffic control features that Istio and the alternative solutions provide:

Feature            Istio Egress Traffic Control   Kubernetes Network Policies   Legacy Egress Proxy or Firewall
DNS-aware          Yes                            No                            Yes
Kubernetes-aware   Yes                            Yes                           No
Transparent        Yes                            Yes                           No
Istio-aware        Yes                            No                            No

Performance considerations
Controlling egress traffic using Istio has a price: increased latency of calls to external services and
increased CPU usage by the cluster’s pods.
Traffic passes through two proxies:

The application’s sidecar proxy
The egress gateway’s proxy

If you send TLS egress traffic to wildcard domains,
you must add
an additional proxy
between the application and the external service. Since the traffic between the egress gateway’s proxy and
the proxy needed for arbitrary wildcard domains stays on the pod’s local
network, that traffic shouldn’t have a significant impact on latency.
See a performance evaluation of different Istio configurations set to control egress
traffic. I encourage you to carefully measure different configurations with your own applications and your own
external services, before you decide whether you can afford the performance overhead for your use cases. You should weigh the
required level of security versus your performance requirements and compare the performance overhead of all
alternative solutions.
Let me share my thoughts on the performance overhead that controlling egress traffic using Istio adds:
Accessing external services may already involve high latency, so the overhead added
by two or three proxies inside the cluster is likely to be insignificant by comparison.
After all, applications with a microservice architecture can have chains of dozens of calls between microservices.
Therefore, an additional hop with one or two proxies in the egress gateway should not have a large impact.
Moreover, we continue to work towards reducing Istio’s performance overhead.
Possible optimizations include:

Extending Envoy to handle wildcard domains: This would eliminate the need for a third proxy between
the application and the external services for that use case.
Using mutual TLS for authentication only without encrypting the TLS traffic, since the traffic is already
encrypted.
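The overhead argument above is simple arithmetic: the share of end-to-end latency attributable to the in-cluster proxies is the added proxy time divided by the total. The Python sketch below computes that share. The external latency and per-proxy latency are inputs you would take from your own measurements; the values in the example are made-up placeholders, not benchmark results.

```python
def proxy_overhead_fraction(external_ms: float, proxies: int, per_proxy_ms: float) -> float:
    """Fraction of total request latency added by in-cluster proxies."""
    added = proxies * per_proxy_ms
    return added / (external_ms + added)


if __name__ == "__main__":
    # Placeholder inputs: two proxies (sidecar + egress gateway) vs. three (wildcard domains).
    for proxies in (2, 3):
        share = proxy_overhead_fraction(external_ms=100.0, proxies=proxies, per_proxy_ms=1.0)
        print(f"{proxies} proxies: {share:.1%} of total latency")
```

The larger the external service's own latency, the smaller the proxies' share, which is why the overhead matters most for fast external services.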

Summary
I hope that after reading this series you are convinced that controlling egress traffic is very important for the
security of your cluster.
Hopefully, I also managed to convince you that Istio is an effective tool to control egress traffic
securely, and that Istio has multiple advantages over the alternative solutions.
Istio is the only solution I’m aware of that lets you:

Control egress traffic in a secure and transparent way
Specify external services as domain names
Use Kubernetes artifacts to specify the traffic source

In my opinion, secure control of egress traffic is a great choice if you are looking for your first Istio use case.
In this case, Istio already provides you some benefits even before you start using all other Istio features:
traffic management, security,
policies and observability, applied to traffic between
microservices inside the cluster.
So, if you haven’t had the chance to work with Istio yet, install Istio on your cluster
and check our egress traffic control tasks and the tasks for the other
Istio features. We also want to hear from you, so please join us at discuss.istio.io.



Secure Control of Egress Traffic in Istio, part 2
Wed, 10 Jul 2019 00:00:00 +0000
Welcome to part 2 in our new series about secure control of egress traffic in Istio.
In the first part in the series, I presented the attacks involving
egress traffic and the requirements we collected for a secure control system for egress traffic.
In this installment, I describe the Istio way to securely control the egress traffic, and show how Istio can help you
prevent the attacks.
Secure control of egress traffic in Istio
To implement secure control of egress traffic in Istio, you must
direct TLS traffic to external services through an egress gateway.
Alternatively, you
can direct HTTP traffic through an egress gateway
and let the egress gateway perform TLS origination.
Both alternatives have their pros and cons; choose between them according to your circumstances.
The choice mainly depends on whether your application can send unencrypted HTTP requests and whether your
organization’s security policies allow sending unencrypted HTTP requests.
For example, if your application uses a client library that encrypts the traffic and provides no way to disable the
encryption, you cannot use the option of sending unencrypted HTTP traffic.
The same applies if your organization’s security policies do not allow sending unencrypted HTTP requests
inside the pod (outside the pod, Istio encrypts the traffic).
If the application sends HTTP requests and the egress gateway performs TLS origination, you can monitor HTTP
information like HTTP methods, headers, and URL paths. You can also
define policies based on said HTTP information. If the application
performs TLS origination, you can
monitor SNI and the service account of the
source pod’s TLS traffic, and define policies based on SNI and service accounts.
You must ensure that traffic from your cluster to the outside cannot bypass the egress gateway. Istio cannot enforce it
for you, so you must apply some
additional security mechanisms,
for example,
the Kubernetes network policies or an L3
firewall. See an example of the
Kubernetes network policies configuration.
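As an illustration, the following Python sketch emits a NetworkPolicy that lets pods in an application namespace send traffic only to pods in the same namespace, to the `istio-system` namespace (where the control plane and egress gateway run), and to DNS in `kube-system`. It assumes you have labeled those namespaces `istio=system` and `kube-system=true` yourself; the policy name is hypothetical. The policy is enforced only if your cluster's network plugin supports NetworkPolicy, and calls to services in other namespaces would need additional rules.

```python
import json
from typing import Any, Dict


def egress_lockdown_policy(namespace: str) -> Dict[str, Any]:
    """NetworkPolicy allowing egress only to the same namespace, istio-system, and DNS."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-egress-via-istio-only", "namespace": namespace},
        "spec": {
            "podSelector": {},          # applies to every pod in the namespace
            "policyTypes": ["Egress"],  # ingress is left unrestricted
            "egress": [
                # Pods in the same namespace (in-cluster calls between services).
                {"to": [{"podSelector": {}}]},
                # Istio control plane and egress gateway (assumed label istio=system).
                {"to": [{"namespaceSelector": {"matchLabels": {"istio": "system"}}}]},
                # DNS served from kube-system (assumed label kube-system=true).
                {
                    "to": [{"namespaceSelector": {"matchLabels": {"kube-system": "true"}}}],
                    "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
                },
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g. by piping this into `kubectl apply -f -`.
    print(json.dumps(egress_lockdown_policy("my-app-namespace"), indent=2))
```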
According to the Defense in depth concept, the more
security mechanisms you apply for the same goal, the better.
You must also ensure that the Istio control plane and the egress gateway cannot be compromised. While you may have hundreds
or thousands of application pods in your cluster, there are only a dozen or so Istio control plane and gateway pods.
You can and should focus on protecting the control plane pods and the gateways: it is easy (there are only a few
pods to protect) and it is most crucial for the security of your cluster.
If attackers compromise the control plane or the egress gateway, they could violate any policy.
You might have multiple tools to protect the control plane pods, depending on your environment.
The reasonable security measures are:

Run the control plane pods on nodes separate from the application nodes.
Run the control plane pods in their own separate namespace.
Apply the Kubernetes RBAC and network policies to protect the control plane pods.
Monitor the control plane pods more closely than you do the application pods.

Once you direct egress traffic through an egress gateway and apply the additional security mechanisms,
you can securely monitor and enforce security policies for the traffic.
The following diagram shows Istio’s security architecture, augmented with an L3 firewall which is part of the
additional security mechanisms
that should be provided outside of Istio.

Istio Security Architecture with Egress Gateway and L3 Firewall

You can easily configure the L3 firewall to allow incoming traffic only through the Istio ingress gateway and
outgoing traffic only through the Istio egress gateway. The Istio proxies of the gateways enforce
policies and report telemetry just as all other proxies in the mesh do.
Now let’s examine possible attacks and see how secure control of egress traffic in Istio prevents them.
Preventing possible attacks
Consider the following security policies for egress traffic:

Application A is allowed to access *.ibm.com, which includes all the external services with URLs matching
*.ibm.com.
Application B is allowed to access mongo1.composedb.com.
All egress traffic is monitored.
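These policies amount to an allowlist keyed by the source workload's identity, with every decision recorded. Here is a minimal Python sketch of that check. It assumes the identity is the Kubernetes service account in Istio's SPIFFE form (`spiffe://cluster.local/ns/<namespace>/sa/<name>`) and uses hypothetical service accounts `app-a` and `app-b` in the `default` namespace. It treats `*.ibm.com` as matching any subdomain of ibm.com but not ibm.com itself. In Istio, the proxies make this decision from your configuration, not from code like this.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical identities; Istio derives these from Kubernetes service accounts.
APP_A = "spiffe://cluster.local/ns/default/sa/app-a"
APP_B = "spiffe://cluster.local/ns/default/sa/app-b"

EGRESS_POLICY: Dict[str, Set[str]] = {
    APP_A: {"*.ibm.com"},
    APP_B: {"mongo1.composedb.com"},
}


def host_matches(pattern: str, host: str) -> bool:
    """Exact match, or '*.suffix' matching any subdomain (but not the bare suffix)."""
    host = host.lower().rstrip(".")
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])  # pattern[1:] is '.ibm.com'
    return host == pattern


def check_egress(source: str, host: str, audit_log: List[Tuple[str, str, bool]]) -> bool:
    """Allow or deny one egress request, recording every decision for monitoring."""
    allowed = any(host_matches(p, host) for p in EGRESS_POLICY.get(source, set()))
    audit_log.append((source, host, allowed))  # all egress traffic is monitored
    return allowed
```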

Suppose the attackers have the following goals:

Access *.ibm.com from your cluster.
Access *.ibm.com from your cluster, unmonitored. The attackers want their traffic to be unmonitored to prevent a
possibility that you will detect the forbidden access.
Access mongo1.composedb.com from your cluster.

Now suppose that the attackers manage to break into one of the pods of application A, and try to use the compromised
pod to perform the forbidden access. The attackers may try their luck and access the external services in a
straightforward way. You will react to the straightforward attempts as follows:

Initially, there is no way to prevent a compromised application A from accessing *.ibm.com, because the compromised
pod is indistinguishable from the original pod.
Fortunately, you can monitor all access to external services, detect suspicious traffic, and thwart attackers from
gaining unmonitored access to *.ibm.com. For example, you could apply anomaly detection tools on the
egress traffic logs.
To stop attackers from accessing mongo1.composedb.com from your cluster, Istio will correctly detect the source of
the traffic, application A in this case, and verify that it is not allowed to access mongo1.composedb.com
according to the security policies mentioned above.

Having failed to achieve their goals in a straightforward way, the malicious actors may resort to advanced attacks:

Bypass the container’s sidecar proxy to be able to access any external service directly, without the sidecar’s
policy enforcement and reporting. This attack is prevented by a Kubernetes Network Policy or by an L3 firewall that
allows egress traffic to exit the mesh only from the egress gateway.
Compromise the egress gateway in order to force it to send fake information to the monitoring system or to
disable enforcement of the security policies. This attack is prevented by applying special security measures to
the egress gateway pods.
Impersonate application B, since application B is allowed to access mongo1.composedb.com. This attack,
fortunately, is prevented by Istio’s strong identity support.

As far as we can see, all the forbidden access is prevented, or at least monitored so that it can be prevented later.
If you see other attacks that involve egress traffic or security holes in the current design, we would be happy
to hear about them.
Summary
Hopefully, I managed to convince you that Istio is an effective tool to prevent attacks involving egress
traffic. In the next part of this series, I compare secure control of egress traffic in Istio with alternative
solutions such as
Kubernetes Network Policies and legacy
egress proxies/firewalls.



Best Practices: Benchmarking Service Mesh Performance
Tue, 09 Jul 2019 00:00:00 +0000
Service meshes add a lot of functionality to application deployments, including traffic policies, observability, and secure communication. But adding a service mesh to your environment comes at a cost, whether that’s time (added latency) or resources (CPU cycles). To make an informed decision on whether a service mesh is right for your use case, it’s important to evaluate how your application performs when deployed with a service mesh.
Earlier this year, we published a blog post on Istio’s performance improvements in version 1.1. Following the release of Istio 1.2, we want to provide guidance and tools to help you benchmark Istio’s data plane performance in a production-ready Kubernetes environment.
Overall, we found that Istio’s sidecar proxy latency scales with the number of concurrent connections. At 1000 requests per second (RPS), across 16 connections, Istio adds 3 milliseconds per request in the 50th percentile, and 10 milliseconds in the 99th percentile.
In the Istio Tools repository, you’ll find scripts and instructions for measuring Istio’s data plane performance, with additional instructions on how to run the scripts with Linkerd, another service mesh implementation. Follow along as we detail some best practices for each step of the performance test framework.
1. Use a production-ready Istio installation
To accurately measure the performance of a service mesh at scale, it’s important to use an adequately sized Kubernetes cluster. We test using three worker nodes, each with at least 4 vCPUs and 15 GB of memory.
Then, it’s important to use a production-ready Istio installation profile on that cluster. This gives us performance-oriented settings such as control plane pod autoscaling, and ensures that resource limits are appropriate for heavy traffic load. The default Istio installation is suitable for most benchmarking use cases. For extensive performance benchmarking, with thousands of proxy-injected services, we also provide a tuned Istio install that allocates extra memory and CPU to the Istio control plane.
Istio’s demo installation is not suitable for performance testing, because it is designed to be deployed on a small trial cluster, and has full tracing and access logs enabled to showcase Istio’s features.
2. Focus on the data plane
Our benchmarking scripts focus on evaluating the Istio data plane: the Envoy proxies that mediate traffic between application containers. Why focus on the data plane? Because at scale, with lots of application containers, the data plane’s memory and CPU usage quickly eclipses that of the Istio control plane. Let’s look at an example of how this happens:
Say you run 2,000 Envoy-injected pods, each handling 1,000 requests per second. Each proxy is using 50 MB of memory, and to configure all these proxies, Pilot is using 1 vCPU and 1.5 GB of memory. All together, the Istio data plane (the sum of all the Envoy proxies) is using 100 GB of memory, compared to Pilot’s 1.5 GB.
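The arithmetic behind this comparison is easy to check in a few lines of Python. The figures come from the example above. The use of decimal units (1 GB = 1,000 MB) is our assumption, chosen to match the round 100 GB total.

```python
PODS = 2_000
ENVOY_MEMORY_MB = 50    # memory per sidecar proxy in the example
PILOT_MEMORY_GB = 1.5   # memory used by Pilot in the example


def data_plane_memory_gb(pods: int, per_proxy_mb: float) -> float:
    """Total memory of all sidecar proxies, in decimal gigabytes."""
    return pods * per_proxy_mb / 1_000


total_gb = data_plane_memory_gb(PODS, ENVOY_MEMORY_MB)
print(f"data plane: {total_gb:.0f} GB, Pilot: {PILOT_MEMORY_GB} GB")
print(f"data plane uses ~{total_gb / PILOT_MEMORY_GB:.0f}x the memory of Pilot")
```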
It is also important to focus on data plane performance for latency reasons. This is because most application requests move through the Istio data plane, not the control plane. There are two exceptions:

Telemetry reporting: Each proxy sends raw telemetry data to Mixer, which Mixer processes into metrics, traces, and other telemetry. The raw telemetry data is similar to access logs, and therefore comes at a cost. Access log processing consumes CPU and keeps a worker thread from picking up the next unit of work. At higher throughput, it is more likely that the next unit of work is waiting in the queue to be picked up by the worker. This can lead to long-tail (99th percentile) latency for Envoy.
Custom policy checks: When using custom Istio policy adapters, policy checks are on the request path. This means that request headers and metadata on the data path will be sent to the control plane (Mixer), resulting in higher request latency. Note: These policy checks are disabled by default, as the most common policy use case (RBAC) is performed entirely by the Envoy proxies.

Both of these exceptions will go away in a future Istio release, when Mixer V2 moves all policy and telemetry features directly into the proxies.
Next, when testing Istio’s data plane performance at scale, it’s important to test not only at increasing requests per second, but also against an increasing number of concurrent connections. This is because real-world, high-throughput traffic comes from multiple clients. The provided scripts allow you to perform the same load test with any number of concurrent connections, at increasing RPS.
Lastly, our test environment measures requests between two pods, not many. The client pod is Fortio, which sends traffic to the server pod.
Why test with only two pods? Because scaling up throughput (RPS) and connections (threads) has a greater effect on Envoy’s performance than increasing the total size of the service registry, that is, the total number of pods and services in the Kubernetes cluster. When the size of the service registry grows, Envoy does have to keep track of more endpoints, and lookup time per request does increase, but only by a tiny constant. If you have many services, and this constant becomes a latency concern, Istio provides a Sidecar resource, which allows you to limit which services each Envoy knows about.
3. Measure with and without proxies
While many Istio features, such as mutual TLS authentication, rely on an Envoy proxy next to an application pod, you can selectively disable sidecar proxy injection for some of your mesh services. As you scale up Istio for production, you may want to incrementally add the sidecar proxy to your workloads.
To that end, the test scripts provide three different modes. These modes analyze Istio’s performance when a request goes through both the client and server proxies (both), just the server proxy (serveronly), and neither proxy (baseline).
You can also disable Mixer to stop Istio’s telemetry during the performance tests, which provides results in line with the performance we expect when the Mixer V2 work is completed. Istio also supports Envoy native telemetry, which performs similarly to having Istio’s telemetry disabled.
Istio 1.2 Performance
Let’s see how to use this test environment to analyze the data plane performance of Istio 1.2. We also provide instructions to run the same performance tests for the Linkerd data plane. Currently, only latency benchmarking is supported for Linkerd.
For measuring Istio’s sidecar proxy latency, we look at the 50th, 90th, and 99th percentiles for an increasing number of concurrent connections, keeping request throughput (RPS) constant.
We found that with 16 concurrent connections and 1000 RPS, Istio adds 3ms over the baseline (P50) when a request travels through both a client and server proxy. (Subtract the pink line, base, from the green line, both.) At 64 concurrent connections, Istio adds 12ms over the baseline, but with Mixer disabled (nomixer_both), Istio only adds 7ms.

In the 90th percentile, with 16 concurrent connections, Istio adds 6ms; with 64 connections, Istio adds 20ms.

Finally, in the 99th percentile, with 16 connections, Istio adds 10ms over the baseline. At 64 connections, Istio adds 25ms with Mixer, or 10ms without Mixer.
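Each overhead figure above is the difference between the same percentile in the both run and in the baseline run. The sketch below shows that subtraction on raw latency samples. It assumes a nearest-rank percentile, whereas Fortio computes its own percentiles from histograms, so its values can differ slightly. The sample lists are synthetic placeholders, not Istio measurements.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty sample, for 0 < p <= 100."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def added_latency(both_ms: Sequence[float], baseline_ms: Sequence[float], p: float) -> float:
    """Latency the proxies add at percentile p: P(both) - P(baseline)."""
    return percentile(both_ms, p) - percentile(baseline_ms, p)


# Synthetic samples only: a baseline of 1..100 ms and a run that is 3 ms slower.
baseline = [float(ms) for ms in range(1, 101)]
both = [ms + 3.0 for ms in baseline]
for p in (50, 90, 99):
    print(f"P{p}: +{added_latency(both, baseline, p):.1f} ms")
```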

For CPU usage, we measured with an increasing request throughput (RPS), and a constant number of concurrent connections. We found that Envoy’s maximum CPU usage at 3000 RPS, with Mixer enabled, was 1.2 vCPUs. At 1000 RPS, one Envoy uses approximately half of a CPU.

Summary
In the process of benchmarking Istio’s performance, we learned several key lessons:

Use an environment that mimics production.
Focus on data plane traffic.
Measure against a baseline.
Increase concurrent connections as well as total throughput.

For a mesh with 1000 RPS across 16 connections, Istio 1.2 adds just 3 milliseconds of latency over the baseline, in the 50th percentile.

Istio’s performance depends on your specific setup and traffic load. Because of this variance, make sure your test setup accurately reflects your production workloads. To try out the benchmarking scripts, head over to the Istio Tools repository.


Also check out the Istio Performance and Scalability guide for the most up-to-date performance data.
Thank you for reading, and happy benchmarking!



Extending Istio Self-Signed Root Certificate Lifetime
Fri, 07 Jun 2019 00:00:00 +0000
Istio self-signed certificates have historically had a 1-year default lifetime.
If you are using Istio self-signed certificates,
you need to schedule regular root transitions before they expire.
The expiration of a root certificate may lead to an unexpected cluster-wide outage.
The issue affects new clusters created with versions up to 1.0.7 and 1.1.7.
See Extending Self-Signed Certificate Lifetime for
information on how to gauge the age of your certificates and how to perform rotation.
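One rough way to gauge how much lifetime a root certificate has left is to parse its notAfter date. The Python sketch below does this for a date in the form printed by `openssl x509 -noout -enddate` (for example `notAfter=Jun  7 00:00:00 2020 GMT`), using the standard library's `ssl.cert_time_to_seconds`. Extracting the root certificate from your cluster is outside this sketch. The 30-day rotation threshold is a hypothetical choice, not an Istio default.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def days_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Days from `now` (epoch seconds; defaults to the current time) until notAfter."""
    # Drop the "notAfter=" prefix that openssl prints, if present.
    _, _, cert_time = enddate_line.strip().rpartition("=")
    expires_at = ssl.cert_time_to_seconds(cert_time)
    if now is None:
        now = time.time()
    return (expires_at - now) / SECONDS_PER_DAY


def needs_rotation(enddate_line: str, now: Optional[float] = None,
                   threshold_days: float = 30.0) -> bool:
    """Flag a certificate whose remaining lifetime is below a (hypothetical) threshold."""
    return days_until_expiry(enddate_line, now) < threshold_days


# A certificate valid for one year from Jan 15 2020, checked on the day of issue.
issued = ssl.cert_time_to_seconds("Jan 15 00:00:00 2020 GMT")
print(days_until_expiry("notAfter=Jan 15 00:00:00 2021 GMT", issued))  # 366.0, as 2020 is a leap year
```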

We strongly recommend you rotate root keys and root certificates annually as a security best practice.
We will send out instructions for root key/cert rotation soon.





Secure Control of Egress Traffic in Istio, part 1
Wed, 22 May 2019 00:00:00 +0000
This is part 1 of a new series I am publishing about secure control of egress traffic in Istio.
In this installment, I explain why you should apply egress traffic control to your cluster, the attacks
involving egress traffic you want to prevent, and the requirements for a system for egress traffic control
to do so.
Once you agree that you should control the egress traffic coming from your cluster, the following questions arise:
What is required from a system for secure control of egress traffic? Which is the best solution to fulfill
these requirements? (spoiler: Istio in my opinion)
Future installments will describe
the implementation of the secure control of egress traffic in Istio
and compare it with other solutions.
The most important security aspect for a service mesh is probably ingress traffic. You definitely must prevent attackers
from penetrating the cluster through ingress APIs. Having said that, securing
the traffic leaving the mesh is also very important. Once your cluster is compromised, and you must be
prepared for that scenario, you want to reduce the damage as much as possible and prevent the attackers from using the
cluster for further attacks on external services and legacy systems outside of the cluster. To achieve that goal,
you need secure control of egress traffic.
Compliance requirements are another reason to implement secure control of egress traffic. For example, the Payment Card
Industry (PCI) Data Security Standard requires that inbound
and outbound traffic must be restricted to that which is necessary:

1.2.1 Restrict inbound and outbound traffic to that which is necessary for the cardholder data environment, and specifically deny all other traffic.


And specifically regarding outbound traffic:

1.3.4 Do not allow unauthorized outbound traffic from the cardholder data environment to the Internet… All traffic outbound from the cardholder data environment should be evaluated to ensure that it follows established, authorized rules. Connections should be inspected to restrict traffic to only authorized communications (for example by restricting source/destination addresses/ports, and/or blocking of content).


Let’s start with the attacks that involve egress traffic.
The attacks
An IT organization must assume it will be attacked if it hasn’t been attacked already, and that
part of its infrastructure could already be compromised or become compromised in the future.
Once attackers are able to penetrate an application in a cluster, they can proceed to attack external services:
legacy systems, external web services and databases. The attackers may want to steal the data of the application and to
transfer it to their external servers. Attackers’ malware may require access to attackers’ servers to download
updates. The attackers may use pods in the cluster to perform DDoS attacks or to break into external systems.
Even though you cannot know all the possible types of
attacks, you want to reduce the possibility of any attack, known or unknown.
External attackers may gain access to the application’s container from outside the mesh through a
bug in the application, but attackers can also be internal, for example, malicious DevOps people inside the
organization.
To prevent the attacks described above, some form of egress traffic control must be applied. Let me present egress
traffic control in the following section.
The solution: secure control of egress traffic
Secure control of egress traffic means monitoring the egress traffic and enforcing all the security policies regarding
the egress traffic.
Monitoring the egress traffic enables you to analyze it, possibly offline, and detect the attacks even if
you were unable to prevent them in real time.
Another good practice to reduce the possibility of attacks is to specify policies that limit access following the
need-to-know principle: only the applications that
need external services should be allowed to access the external services they need.
Let me now turn to the requirements for egress traffic control we collected.
Requirements for egress traffic control
My colleagues at IBM and I collected requirements for secure control of egress traffic from several customers, and
combined them with the
egress traffic control requirements from Kubernetes Network Special Interest Group.
Istio 1.1 satisfies all gathered requirements:


Support for TLS with
SNI or for TLS origination by Istio.


Monitor SNI and the source workload of every egress access.


Define and enforce policies per cluster, e.g.:


all applications in the cluster may access service1.foo.com (a specific host)


all applications in the cluster may access any host of the form *.bar.com (a wildcarded domain)


All unspecified access must be blocked.


Define and enforce policies per source, Kubernetes-aware:


application A may access *.foo.com.


application B may access *.bar.com.


All other access must be blocked, in particular access of application A to service1.bar.com.


Prevent tampering. In case an application pod is compromised, prevent the compromised pod from escaping
monitoring, from sending fake information to the monitoring system, and from breaking the egress policies.


Nice to have: traffic control is transparent to the applications.


Let me explain each requirement in more detail. The first requirement states that only TLS traffic to the external
services must be supported.
The requirement emerged from the observation that all the traffic that leaves the cluster must be encrypted.
This means that either the applications perform TLS origination or Istio must perform TLS origination
for them.
Note that when an application performs TLS origination, the Istio proxies cannot see the original traffic,
only the encrypted traffic, so the proxies see only the TLS protocol. For the proxies it does not matter whether the original
protocol is HTTP or MongoDB; all the Istio proxies can see is TLS traffic.
The second requirement states that SNI and the source of the traffic must be monitored. Monitoring is the first step to
prevent attacks. Even if attackers are able to access external services from the cluster, if the access is
monitored, there is a chance to discover the suspicious traffic and take corrective action.
Note that in the case of TLS originated by an application, the Istio sidecar proxies can only see TCP traffic and a
TLS handshake that includes SNI.
A label of the source pod could identify the source of the traffic, but a service account of the pod or some
other source identifier could also be used. We call this property of an egress control system being Kubernetes-aware:
the system must understand Kubernetes artifacts like pods and service accounts. If the system is not Kubernetes-aware,
it can only monitor the IP address as the identifier of the source.
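When the application originates TLS, the sidecar sees only TCP traffic and a TLS handshake carrying SNI. The Python sketch below illustrates what can still be observed in that case: it pulls the SNI host name out of a TLS ClientHello record without decrypting anything. This is a simplified parser, not Envoy's TLS inspector. It assumes the whole ClientHello arrives in a single, well-formed record, and it raises on truncated input. The `build_client_hello` helper is hypothetical and produces only enough of a ClientHello (not a complete, valid handshake) to exercise the parser.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name carried by a TLS ClientHello record, or None."""
    if len(record) < 6 or record[0] != 0x16:   # 0x16: handshake record
        return None
    if record[5] != 0x01:                      # 0x01: ClientHello message
        return None
    pos = 5 + 4          # skip record header (5 bytes) and handshake header (4 bytes)
    pos += 2 + 32        # client_version and random
    pos += 1 + record[pos]                     # session_id
    (cipher_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + cipher_len                      # cipher_suites
    pos += 1 + record[pos]                     # compression_methods
    if pos + 2 > len(record):
        return None                            # no extensions block
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0x0000:                 # server_name extension
            # Layout: list length (2), name_type (1), name length (2), name.
            name_type = record[pos + 2]
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            if name_type == 0:                 # 0: host_name
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


def build_client_hello(host: str) -> bytes:
    """Hypothetical helper: a minimal ClientHello whose only extension is SNI."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name   # host_name entry
    sni = struct.pack("!H", len(entry)) + entry            # server_name_list
    extensions = struct.pack("!HH", 0x0000, len(sni)) + sni
    body = (
        b"\x03\x03"                  # client_version (TLS 1.2 on the wire)
        + bytes(32)                  # random: zeros, for illustration only
        + b"\x00"                    # empty session_id
        + b"\x00\x02\x13\x01"        # one cipher suite (TLS_AES_128_GCM_SHA256)
        + b"\x01\x00"                # one compression method (null)
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(extract_sni(build_client_hello("mongo1.composedb.com")))  # mongo1.composedb.com
```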
The third requirement states that Istio operators must be able to define policies for egress traffic for the entire
cluster.
The policies state which external services may be accessed by any pod in the cluster. The external services can be
identified either by a fully qualified domain name of the
service, e.g. www.ibm.com, or by a wildcarded domain, e.g. *.ibm.com. Only the specified external services may be
accessed; all other egress traffic is blocked.
This requirement originates from the need to prevent
attackers from accessing malicious sites, for example for downloading updates/instructions for their malware. You also
want to limit the number of external sites that the attackers can access and attack.
You want to allow access only to the external services that the applications in the cluster need to
access and to block access to all the other services; this way you reduce the
attack surface. While the external services
can have their own security mechanisms, you want to exercise defense in depth and to have multiple security layers: a security layer in your cluster in addition to
the security layers in the external systems.
This requirement means that the external services must be identifiable by domain names. We call this property
of an egress control system being DNS-aware.
If the system is not DNS-aware, the external services must be specified by IP addresses.
Using IP addresses is not convenient and often is not feasible, since the IP addresses of a service can change. Sometimes
not all the IP addresses of a service are even known, for example in the case of
CDNs.
The fourth requirement states that the source of the egress traffic must be added to the policies, effectively extending
the third requirement.
Policies can specify which source can access which external service and the source must be identified just as in the
second requirement, for example, by a label of the source pod or by service account of the pod.
It means that policy enforcement must also be Kubernetes-aware.
If policy enforcement is not Kubernetes-aware, the policies must identify the source of traffic by
the IP of the pod, which is not convenient, especially since the pods can come and go so their IPs are not static.
The fifth requirement states that even if the cluster is compromised and the attackers control some of the pods, they
must not be able to cheat the monitoring or to violate policies of the egress control system. We say that such a
system provides secure control of egress traffic.
The sixth requirement states that the traffic control should be provided without changing the application containers, in
particular without changing the code of the applications and without changing the environment of the containers.
We call such a control of egress traffic transparent.
In the next posts I will show that Istio can function as an example of an egress traffic control system that satisfies
all of these requirements, in particular it is transparent, DNS-aware, and Kubernetes-aware.
Summary
I hope that you are convinced that controlling egress traffic is important for the security of your cluster. In
part 2 of this series I describe the Istio way to perform secure
control of egress traffic. In
part 3 of this series I compare it with alternative solutions such as
Kubernetes Network Policies and legacy
egress proxies/firewalls.



Architecting Istio 1.1 for Performance
Tue, 19 Mar 2019 00:00:00 +0000
Hyper-scale, microservice-based cloud environments have been exciting to build but challenging to manage. Along came Kubernetes (container orchestration) in 2014, followed by Istio (container service management) in 2017. Both open-source projects enable developers to scale container-based applications without spending too much time on administration tasks.
Now, new enhancements in Istio 1.1 deliver scale-up with improved application performance and service management efficiency.
Simulations using our sample commercial airline reservation application show the following improvements, compared to Istio 1.0.
We’ve seen substantial application performance gains:

up to 30% reduction in application average latency
up to 40% faster service startup times in a large mesh

As well as impressive improvements in service management efficiency:

up to 90% reduction in Pilot CPU usage in a large mesh
up to 50% reduction in Pilot memory usage in a large mesh

With Istio 1.1, organizations can be more confident in their ability to scale applications with consistency and control, even in hyper-scale cloud environments.
Congratulations to the Istio experts around the world who contributed to this release. We could not be more pleased with these results.
Istio 1.1 performance enhancements
As members of the Istio Performance and Scalability workgroup, we have done extensive performance evaluations. We introduced many performance design features for Istio 1.1, in collaboration with other Istio contributors.
Some of the most visible performance enhancements in 1.1 include:

Significant reduction in default collection of Envoy-generated statistics
Added load-shedding functionality to Mixer workloads
Improved the protocol between Envoy and Mixer
Namespace isolation, to reduce operational overhead
Configurable concurrent worker threads, which can improve overall throughput
Configurable filters that limit telemetry data
Removal of synchronization bottlenecks

Continuous code quality and performance verification
Regression Patrol drives continuous improvement in Istio performance and quality. Behind the scenes, the Regression Patrol helps Istio developers to identify and fix code issues. Daily builds are checked using a customer-centric benchmark, BluePerf. The results are published to the Istio community web portal. Various application configurations are evaluated to help provide insights on Istio component performance.
Another tool that is used to evaluate the performance of Istio’s builds is Fortio, which provides a synthetic end-to-end load testing benchmark.
Summary
Istio 1.1 was designed for performance and scalability. The Istio Performance and Scalability workgroup measured significant performance improvements over 1.0.
Istio 1.1 introduces new features and optimizations to help harden the service mesh for enterprise microservice workloads. The Istio 1.1 Performance and Tuning Guide documents performance simulations, provides sizing and capacity planning guidance, and includes best practices for tuning custom use cases.
Useful links

Istio Service Mesh Performance (34:30), by Surya Duggirala, Laurent Demailly and Fawad Khaliq at KubeCon Europe 2018
Istio Performance and Scalability discussion forum

Disclaimer
The performance data contained herein was obtained in a controlled, isolated environment. Actual results that may be obtained in other operating environments may vary significantly. There is no guarantee that the same or similar results will be obtained elsewhere.



Version Routing in a Multicluster Service Mesh
Thu, 07 Feb 2019 00:00:00 +0000
If you’ve spent any time looking at Istio, you’ve probably noticed that it includes a lot of features that
can be demonstrated with simple tasks and examples
running on a single Kubernetes cluster.
Because most, if not all, real-world cloud and microservices-based applications are not that simple
and will need to have the services distributed and running in more than one location, you may be
wondering if all these things will be just as simple in your real production environment.
Fortunately, Istio provides several ways to configure a service mesh so that applications
can, more-or-less transparently, be part of a mesh where the services are running
in more than one cluster, i.e., in a
multicluster deployment.
The simplest way to set up a multicluster mesh, because it has no special networking requirements,
is using a replicated
control plane model.
In this configuration, each Kubernetes cluster contributing to the mesh has its own control plane,
but each control plane is synchronized and running under a single administrative control.
In this article we’ll look at how one of the features of Istio,
traffic management, works in a multicluster mesh with
a dedicated control plane topology.
We’ll show how to configure Istio route rules to call remote services in a multicluster service mesh
by deploying the Bookinfo sample with version v1 of the reviews service
running in one cluster, and versions v2 and v3 running in a second cluster.
Set up clusters
To start, you’ll need two Kubernetes clusters, both running a slightly customized configuration of Istio.


Set up a multicluster environment with two Istio clusters by following the
replicated control planes instructions.


The kubectl command is used to access both clusters with the --context flag.
Use the following command to list your contexts:
$ kubectl config get-contexts
CURRENT   NAME       CLUSTER    AUTHINFO       NAMESPACE
*         cluster1   cluster1   user@foo.com   default
          cluster2   cluster2   user@foo.com   default


Export the following environment variables with the context names of your configuration:
$ export CTX_CLUSTER1=
$ export CTX_CLUSTER2=


Deploy version v1 of the bookinfo application in cluster1
Run the productpage and details services and version v1 of the reviews service in cluster1:
$ kubectl label --context=$CTX_CLUSTER1 namespace default istio-injection=enabled
$ kubectl apply --context=$CTX_CLUSTER1 -f - <

Deploy bookinfo v2 and v3 services in cluster2
Run the ratings service and version v2 and v3 of the reviews service in cluster2:
$ kubectl label --context=$CTX_CLUSTER2 namespace default istio-injection=enabled
$ kubectl apply --context=$CTX_CLUSTER2 -f - <

Access the bookinfo application
Just like any application, we’ll use an Istio gateway to access the bookinfo application.


Create the bookinfo gateway in cluster1:
$ kubectl apply --context=$CTX_CLUSTER1 -f samples/bookinfo/networking/bookinfo-gateway.yaml


Follow the Bookinfo sample instructions
to determine the ingress IP and port and then point your browser to http://$GATEWAY_URL/productpage.


You should see the productpage with reviews, but without ratings, because only v1 of the reviews service
is running on cluster1 and we have not yet configured access to cluster2.
Create a service entry and destination rule on cluster1 for the remote reviews service
As described in the setup instructions,
remote services are accessed with a .global DNS name. In our case, it’s reviews.default.global,
so we need to create a service entry and destination rule for that host.
The service entry will use the cluster2 gateway as the endpoint address to access the service.
You can use the gateway’s DNS name, if it has one, or its public IP, like this:
$ export CLUSTER2_GW_ADDR=$(kubectl get --context=$CTX_CLUSTER2 svc --selector=app=istio-ingressgateway \
    -n istio-system -o jsonpath="{.items[0].status.loadBalancer.ingress[0].ip}")
Now create the service entry and destination rule using the following command:
$ kubectl apply --context=$CTX_CLUSTER1 -f - <

The address 240.0.0.3 of the service entry can be any arbitrary unallocated IP.
Using an IP from the class E address range 240.0.0.0/4 is a good choice.
Check out the
gateway-connected multicluster example
for more details.
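If you create several such service entries, each one needs a distinct unallocated address. A small Python sketch using the standard `ipaddress` module can check that an address falls in the class E range and pick the next unused one. The `used` set is hypothetical bookkeeping you would maintain yourself, since in this manual setup you choose the addresses.

```python
import ipaddress
from typing import Iterable

CLASS_E = ipaddress.ip_network("240.0.0.0/4")


def is_class_e(addr: str) -> bool:
    """True if addr lies in the reserved class E block 240.0.0.0/4."""
    return ipaddress.ip_address(addr) in CLASS_E


def next_free_class_e(used: Iterable[str]) -> str:
    """Return the lowest class E host address not already in `used`."""
    taken = {ipaddress.ip_address(a) for a in used}
    for candidate in CLASS_E.hosts():  # lazily yields 240.0.0.1, 240.0.0.2, ...
        if candidate not in taken:
            return str(candidate)
    raise RuntimeError("class E range exhausted")


print(is_class_e("240.0.0.3"))                        # True
print(next_free_class_e({"240.0.0.1", "240.0.0.2"}))  # 240.0.0.3
```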
Note that the labels of the subsets in the destination rule map to the service entry
endpoint label (cluster: cluster2) corresponding to the cluster2 gateway.
Once the request reaches the destination cluster, a local destination rule will be used
to identify the actual pod labels (version: v2 or version: v3) corresponding to the
requested subset.
Create a destination rule on both clusters for the local reviews service
Technically, we only need to define the subsets of the local service that are being used
in each cluster (i.e., v1 in cluster1, v2 and v3 in cluster2), but for simplicity we’ll
just define all three subsets in both clusters, since there’s nothing wrong with defining subsets
for versions that are not actually deployed.
$ kubectl apply --context=$CTX_CLUSTER1 -f - <

$ kubectl apply --context=$CTX_CLUSTER2 -f - <

Create a virtual service to route reviews service traffic
At this point, all calls to the reviews service will go to the local reviews pods (v1). If you look at the
source code, you will see that the productpage implementation simply makes
requests to http://reviews:9080 (which expands to host reviews.default.svc.cluster.local), the
local version of the service.
The corresponding remote service is named reviews.default.global, so route rules are needed to
redirect requests to the global host.

Note that if all of the versions of the reviews service were remote, so that no local reviews
service is defined, DNS would resolve reviews directly to reviews.default.global. In that case
we could call the remote reviews service without any route rules.


Apply the following virtual service to direct traffic for user jason to reviews versions v2 and v3 (50/50)
which are running on cluster2. Traffic for any other user will go to reviews version v1.
$ kubectl apply --context=$CTX_CLUSTER1 -f - <


This 50/50 rule isn’t a particularly realistic example. It’s just a convenient way to demonstrate
accessing multiple subsets of a remote service.


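The routing behavior described above can be modeled in a few lines of Python. This sketches the routing decision only, not how Envoy applies a virtual service. It assumes the rule matches the `end-user` header that the Bookinfo productpage sets for a logged-in user, which is how the Bookinfo samples identify jason. It also stands in for the 50/50 weights with a uniform random choice.

```python
import random
from typing import Mapping, Tuple

LOCAL_HOST = "reviews.default.svc.cluster.local"  # v1, in cluster1
GLOBAL_HOST = "reviews.default.global"             # v2 and v3, via the cluster2 gateway


def route_reviews(headers: Mapping[str, str], rng: random.Random) -> Tuple[str, str]:
    """Return (destination host, subset) for one request to the reviews service."""
    if headers.get("end-user") == "jason":
        # 50/50 split between the two remote subsets.
        return GLOBAL_HOST, rng.choice(["v2", "v3"])
    # Everyone else stays on the local v1 subset.
    return LOCAL_HOST, "v1"


rng = random.Random(7)  # seeded so repeated runs pick the same sequence
print(route_reviews({}, rng))  # ('reviews.default.svc.cluster.local', 'v1')
print(sorted({route_reviews({"end-user": "jason"}, rng)[1] for _ in range(100)}))
```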
Return to your browser and log in as user jason. If you refresh the page several times, you should see
the display alternating between black and red ratings stars (v2 and v3). If you log out, you will
only see reviews without ratings (v1).
Summary
In this article, we’ve seen how to use Istio route rules to distribute the versions of a service
across clusters in a multicluster service mesh with a replicated control plane model.
In this example, we manually configured the .global service entry and destination rules needed to provide
connectivity to one remote service, reviews. In general, however, if we wanted to enable any service
to run either locally or remotely, we would need to create .global resources for every service.
Fortunately, this process could be automated and likely will be in a future Istio release.



Sail the Blog!
Tue, 05 Feb 2019 00:00:00 +0000
Welcome to the Istio blog!
To make it easier to publish your content on our website, we
updated the content types guide.
The goal of the updated guide is to make sharing and finding content easier.
We want to make sharing timely information on Istio easy and the Istio blog
is a good place to start.
We welcome your posts to the blog if you think your content falls in one of the
following categories:

Your post details your experience using and configuring Istio. Ideally, your
post shares a novel experience or perspective.
Your post highlights Istio features.
Your post details how to accomplish a task or fulfill a specific use case
using Istio.

Posting your blog is only one PR away
and, if you wish, you can request a review.
We look forward to reading about your Istio experience on the blog soon!



Egress Gateway Performance Investigation
Thu, 31 Jan 2019 00:00:00 +0000
The main objective of this investigation was to determine the impact on performance and resource utilization when an egress gateway is added in the service mesh to access an external service (MongoDB, in this case). The steps to configure an egress gateway for an external MongoDB are described in the blog Consuming External MongoDB Services.
The application used for this investigation was the Java version of Acmeair, which simulates an airline reservation system. This application is used in the Performance Regression Patrol of Istio daily builds, but on that setup the microservices have been accessing the external MongoDB directly via their sidecars, without an egress gateway.
The diagram below illustrates how regression patrol currently runs with Acmeair and Istio:

Figure: Acmeair benchmark in the Istio performance regression patrol environment

Another difference is that, in the regression patrol setup, the application communicates with the external DB over the plain MongoDB protocol. The first change made for this study was to establish TLS communication between MongoDB and its clients running within the application, as this is a more realistic scenario.
Several cases for accessing the external database from the mesh were tested; they are described next.
Egress traffic cases
Case 1: Bypassing the sidecar
In this case, the sidecar does not intercept the communication between the application and the external DB. This is accomplished by passing the CIDR of the MongoDB host to the init container's -x argument, which excludes traffic to and from this IP address from redirection through the sidecar. For example:
    - -x
    - "169.47.232.211/32"

Figure: Traffic to external MongoDB bypassing the sidecar
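To illustrate what excluding a CIDR from interception means, the following Python sketch checks whether a destination address falls inside an excluded range, using the standard ipaddress module. It models only the decision, not the iptables rules that istio-init actually writes; bypasses_sidecar is a hypothetical name.

```python
import ipaddress

# Ranges passed to the init container's -x argument (here, the MongoDB host).
EXCLUDED_CIDRS = [ipaddress.ip_network("169.47.232.211/32")]


def bypasses_sidecar(dest_ip, excluded=EXCLUDED_CIDRS):
    """True if traffic to dest_ip skips the sidecar and goes out directly."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in excluded)


if __name__ == "__main__":
    print(bypasses_sidecar("169.47.232.211"))  # True: straight to MongoDB
    print(bypasses_sidecar("10.0.0.5"))        # False: intercepted by the sidecar
```

A /32 mask matches exactly one address; a wider range such as a /24 would exclude a whole subnet from interception.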

Case 2: Through the sidecar, with service entry
This is the default configuration when the sidecar is injected into the application pod. All messages are intercepted by the sidecar and routed to the destination according to the configured rules, including the communication with external services. MongoDB was defined as a ServiceEntry.

Figure: Sidecar intercepting traffic to external MongoDB

Case 3: Egress gateway
The egress gateway and the corresponding destination rule and virtual service resources are defined for accessing MongoDB. All traffic to and from the external DB goes through the egress gateway (Envoy).

Figure: Introduction of the egress gateway to access MongoDB

Case 4: Mutual TLS between sidecars and the egress gateway
In this case, there is an extra layer of security between the sidecars and the gateway, so some impact on performance is expected.

Figure: Enabling mutual TLS between sidecars and the egress gateway

Case 5: Egress gateway with SNI proxy
This scenario is used to evaluate the case where another proxy is required to access wildcarded domains, which may be necessary due to current limitations of Envoy. An nginx proxy was added as a sidecar container in the egress gateway pod.

Figure: Egress gateway with additional SNI proxy

Environment

Istio version: 1.0.2
K8s version: 1.10.5_1517
Acmeair app: 4 services (1 replica of each), inter-service transactions, external MongoDB, average payload: 620 bytes.

Results
JMeter was used to generate the workload, which consisted of a sequence of 5-minute runs, each using a growing number of clients making HTTP requests. The numbers of clients used were 1, 5, 10, 20, 30, 40, 50 and 60.
Throughput
The chart below shows the throughput obtained for the different cases:

Figure: Throughput obtained for the different cases

As you can see, there is no major impact from having sidecars and the egress gateway between the application and the external MongoDB, but enabling mutual TLS and then adding the SNI proxy caused a degradation in throughput of about 10% and 24%, respectively.
Response time
The response times for the different requests were collected while traffic was being driven with 20 clients. The chart below shows the average, median, and 90th, 95th and 99th percentile values for each case:

Figure: Response times obtained for the different configurations
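For reference, the statistics in this chart can be computed from raw per-request latencies as in the Python sketch below. The sample values are placeholders rather than data from this investigation, and the inclusive percentile interpolation is an assumption; JMeter's own reports may compute percentiles slightly differently.

```python
import statistics


def summarize(latencies_ms):
    """Average, median and 90th/95th/99th percentiles of a list of latencies."""
    # With n=100, quantiles() returns the 99 cut points for percentiles 1..99,
    # so the k-th percentile is at index k - 1.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "average": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p90": cuts[89],
        "p95": cuts[94],
        "p99": cuts[98],
    }


if __name__ == "__main__":
    # Placeholder latencies (ms), not measurements from this investigation.
    samples = [12.0, 15.5, 14.2, 30.1, 13.3, 18.7, 95.0, 16.4, 14.9, 22.8]
    for name, value in summarize(samples).items():
        print(f"{name}: {value:.1f} ms")
```

Tail percentiles are reported alongside the average because a few slow requests, such as the 95 ms outlier above, barely move the mean but dominate the 99th percentile.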

Likewise, there is not much difference in the response times for the first three cases, but mutual TLS and the extra proxy add noticeable latency.
CPU utilization
The CPU usage was collected for all Istio components as well as for the sidecars during the runs. For a fair comparison, CPU used by Istio was normalized by the throughput obtained for a given run. The results are shown in the following graph:

Figure: CPU usage normalized by TPS
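The normalization is a single division: CPU consumed (in cores) divided by transactions per second gives CPU-seconds per transaction. The Python sketch below uses hypothetical numbers to show why this makes runs with different throughput comparable.

```python
def cpu_per_transaction(cpu_cores, tps):
    """CPU cores used divided by TPS, i.e. CPU-seconds spent per transaction."""
    if tps <= 0:
        raise ValueError("throughput must be positive")
    return cpu_cores / tps


if __name__ == "__main__":
    # Two hypothetical runs: the second uses twice the CPU but also serves
    # twice the requests, so its cost per transaction is the same.
    print(cpu_per_transaction(1.0, 500.0))   # 0.002
    print(cpu_per_transaction(2.0, 1000.0))  # 0.002
```

Without this step, a configuration that simply handled more traffic would look more expensive even if each request cost the same.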

In terms of CPU consumption per transaction, Istio used significantly more CPU only in the egress gateway + SNI proxy case.
Conclusion
In this investigation, we tried different options to access an external TLS-enabled MongoDB to compare their performance. The introduction of the egress gateway had no significant impact on performance and added no meaningful CPU consumption. We observed some degradation only when enabling mutual TLS between the sidecars and the egress gateway, or when using an additional SNI proxy for wildcarded domains.



Demystifying Istio's Sidecar Injection Model
Thu, 31 Jan 2019 00:00:00 +0000
A simple overview of an Istio service mesh architecture always starts by describing the control plane and the data plane.
From Istio’s documentation:

An Istio service mesh is logically split into a data plane and a control plane.
The data plane is composed of a set of intelligent proxies (Envoy) deployed as sidecars. These proxies mediate and control all network communication between microservices along with Mixer, a general-purpose policy and telemetry hub.
The control plane manages and configures the proxies to route traffic. Additionally, the control plane configures Mixers to enforce policies and collect telemetry.

Figure: Istio Architecture

It is important to understand that the sidecar injection into the application pods happens automatically, though manual injection is also possible. Traffic is directed from the application services to and from these sidecars without developers needing to worry about it. Once the applications are connected to the Istio service mesh, developers can start using and reaping the benefits of all that the service mesh has to offer. However, how does the data plane plumbing happen and what is really required to make it work seamlessly? In this post, we will deep-dive into the specifics of the sidecar injection models to gain a very clear understanding of how sidecar injection works.
Sidecar injection
In simple terms, sidecar injection is adding the configuration of additional containers to the pod template. The added containers needed for the Istio service mesh are:
istio-init
This init container is used to set up the iptables rules so that inbound/outbound traffic will go through the sidecar proxy. An init container differs from an app container in the following ways:

It runs before an app container is started, and it always runs to completion.
If there are several init containers, each must complete successfully before the next one is started.

So, you can see how this type of container is perfect for a setup or initialization job that does not need to be part of the actual application container. In this case, istio-init does just that and sets up the iptables rules.
istio-proxy
This is the actual sidecar proxy (based on Envoy).
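Because injection amounts to adding container definitions to the pod template, it can be sketched as a plain transformation of a pod spec. The Python below is a simplified, hypothetical model: the image names come from the configmap snippet shown later in this post, while env, volumes, and the rest of the real template are left out, and inject_sidecar is not an Istio function.

```python
import copy


def inject_sidecar(pod_spec):
    """Return a copy of pod_spec with istio-init and istio-proxy added."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    # The init container runs to completion first and programs iptables.
    spec.setdefault("initContainers", []).append({
        "name": "istio-init",
        "image": "docker.io/istio/proxy_init:1.0.2",
        "securityContext": {"capabilities": {"add": ["NET_ADMIN"]}},
    })
    # The sidecar runs for the lifetime of the pod, next to the app container.
    spec.setdefault("containers", []).append({
        "name": "istio-proxy",
        "image": "docker.io/istio/proxyv2:1.0.2",
        "args": ["proxy", "sidecar"],
    })
    return spec


if __name__ == "__main__":
    app = {"containers": [{"name": "demo-red", "image": "chugtum/blue-green-image:v3"}]}
    injected = inject_sidecar(app)
    print([c["name"] for c in injected["initContainers"]])  # ['istio-init']
    print([c["name"] for c in injected["containers"]])      # ['demo-red', 'istio-proxy']
```

Manual and automatic injection differ only in when this transformation happens: before kubectl apply, or inside the API server during pod creation.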
Manual injection
In the manual injection method, you can use istioctl to modify the pod template and add the configuration of the two containers previously mentioned. For both manual and automatic injection, Istio takes the configuration from the istio-sidecar-injector configuration map (configmap) and the mesh’s istio configmap.
Let’s look at the configuration of the istio-sidecar-injector configmap, to get an idea of what actually is going on.
$ kubectl -n istio-system get configmap istio-sidecar-injector -o=jsonpath='{.data.config}'
SNIPPET from the output:

policy: enabled
template: |-
  initContainers:
  - name: istio-init
    image: docker.io/istio/proxy_init:1.0.2
    args:
    - "-p"
    - [[ .MeshConfig.ProxyListenPort ]]
    - "-u"
    - 1337
    .....
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
    restartPolicy: Always

  containers:
  - name: istio-proxy
    image: [[ if (isset .ObjectMeta.Annotations "sidecar.istio.io/proxyImage") -]]
    "[[ index .ObjectMeta.Annotations "sidecar.istio.io/proxyImage" ]]"
    [[ else -]]
    docker.io/istio/proxyv2:1.0.2
    [[ end -]]
    args:
    - proxy
    - sidecar
    .....
    env:
    .....
    - name: ISTIO_META_INTERCEPTION_MODE
      value: [[ or (index .ObjectMeta.Annotations "sidecar.istio.io/interceptionMode") .ProxyConfig.InterceptionMode.String ]]
    imagePullPolicy: IfNotPresent
    securityContext:
      readOnlyRootFilesystem: true
      [[ if eq (or (index .ObjectMeta.Annotations "sidecar.istio.io/interceptionMode") .ProxyConfig.InterceptionMode.String) "TPROXY" -]]
      capabilities:
        add:
        - NET_ADMIN
    restartPolicy: Always
    .....
As you can see, the configmap contains the configuration for both the istio-init init container and the istio-proxy proxy container. The configuration includes the name of the container image and arguments like interception mode, capabilities, etc.
From a security point of view, it is important to note that istio-init requires the NET_ADMIN capability to modify iptables within the pod’s network namespace, and so does istio-proxy if configured in TPROXY mode. As this is restricted to a pod’s network namespace, there should be no problem. However, I have noticed that recent OpenShift versions may have some issues with it, and a workaround is needed. One such option is mentioned at the end of this post.
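The template logic above can be restated in Python to make the precedence explicit. This is a sketch that assumes REDIRECT as the mesh-wide default interception mode; Istio itself evaluates the Go template (with [[ ]] delimiters), and the function names here are hypothetical.

```python
ANNOTATION = "sidecar.istio.io/interceptionMode"


def interception_mode(annotations, mesh_default="REDIRECT"):
    # Like the template's `or`: use the annotation if it is set and non-empty,
    # otherwise fall back to the mesh-wide default.
    return annotations.get(ANNOTATION) or mesh_default


def added_capabilities(annotations, mesh_default="REDIRECT"):
    """Capabilities each injected container gets added, per the template above."""
    mode = interception_mode(annotations, mesh_default)
    proxy_caps = ["NET_ADMIN"] if mode == "TPROXY" else []
    # istio-init always needs NET_ADMIN to rewrite the pod's iptables rules.
    return {"istio-init": ["NET_ADMIN"], "istio-proxy": proxy_caps}


if __name__ == "__main__":
    print(added_capabilities({}))
    # {'istio-init': ['NET_ADMIN'], 'istio-proxy': []}
    print(added_capabilities({ANNOTATION: "TPROXY"}))
    # {'istio-init': ['NET_ADMIN'], 'istio-proxy': ['NET_ADMIN']}
```

In other words, a single pod annotation is enough to widen the sidecar's privileges, which is worth keeping in mind when reviewing pod specs.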
To modify the current pod template for sidecar injection, you can:
$ istioctl kube-inject -f demo-red.yaml | kubectl apply -f -
OR
To use modified configmaps or local configmaps:


Create inject-config.yaml and mesh-config.yaml from the configmaps
$ kubectl -n istio-system get configmap istio-sidecar-injector -o=jsonpath='{.data.config}' > inject-config.yaml
$ kubectl -n istio-system get configmap istio -o=jsonpath='{.data.mesh}' > mesh-config.yaml


Modify the existing pod template, in my case, demo-red.yaml:
$ istioctl kube-inject --injectConfigFile inject-config.yaml --meshConfigFile mesh-config.yaml --filename demo-red.yaml --output demo-red-injected.yaml


Apply the demo-red-injected.yaml
$ kubectl apply -f demo-red-injected.yaml


As seen above, we create a new template using the sidecar injector and mesh configurations, and then apply that new template using kubectl. If we look at the injected YAML file, it has the configuration of the Istio-specific containers, as we discussed above. Once we apply the injected YAML file, we see two containers running. One of them is the actual application container, and the other is the istio-proxy sidecar.
$ kubectl get pods | grep demo-red
demo-red-pod-8b5df99cc-pgnl7   2/2       Running   0          3d
The count is not 3 because istio-init is an init container that exits after doing what it is supposed to do, which is setting up the iptables rules within the pod. To confirm that the init container exited, let’s look at the output of kubectl describe:
$ kubectl describe pod demo-red-pod-8b5df99cc-pgnl7
SNIPPET from the output:

Name:               demo-red-pod-8b5df99cc-pgnl7
Namespace:          default
.....
Labels:             app=demo-red
                    pod-template-hash=8b5df99cc
                    version=version-red
Annotations:        sidecar.istio.io/status={"version":"3c0b8d11844e85232bc77ad85365487638ee3134c91edda28def191c086dc23e","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs...
Status:             Running
IP:                 10.32.0.6
Controlled By:      ReplicaSet/demo-red-pod-8b5df99cc
Init Containers:
  istio-init:
    Container ID:  docker://bef731eae1eb3b6c9d926cacb497bb39a7d9796db49cd14a63014fc1a177d95b
    Image:         docker.io/istio/proxy_init:1.0.2
    Image ID:      docker-pullable://docker.io/istio/proxy_init@sha256:e16a0746f46cd45a9f63c27b9e09daff5432e33a2d80c8cc0956d7d63e2f9185
    .....
    State:          Terminated
      Reason:       Completed
    .....
    Ready:          True
Containers:
  demo-red:
    Container ID:   docker://8cd9957955ff7e534376eb6f28b56462099af6dfb8b9bc37aaf06e516175495e
    Image:          chugtum/blue-green-image:v3
    Image ID:       docker-pullable://docker.io/chugtum/blue-green-image@sha256:274756dbc215a6b2bd089c10de24fcece296f4c940067ac1a9b4aea67cf815db
    State:          Running
      Started:      Sun, 09 Dec 2018 18:12:31 -0800
    Ready:          True
  istio-proxy:
    Container ID:  docker://ca5d690be8cd6557419cc19ec4e76163c14aed2336eaad7ebf17dd46ca188b4a
    Image:         docker.io/istio/proxyv2:1.0.2
    Image ID:      docker-pullable://docker.io/istio/proxyv2@sha256:54e206530ba6ca9b3820254454e01b7592e9f986d27a5640b6c03704b3b68332
    Args:
      proxy
      sidecar
      .....
    State:          Running
      Started:      Sun, 09 Dec 2018 18:12:31 -0800
    Ready:          True
    .....
As seen in the output, the State of the istio-init container is Terminated with the Reason being Completed. The only two containers running are the main application demo-red container and the istio-proxy container.
Automatic injection
Most of the time, you don’t want to manually inject a sidecar with the istioctl command every time you deploy an application; you would rather have Istio automatically inject the sidecar into your pods. This is the recommended approach, and for it to work, all you need to do is label the namespace where you are deploying the app with istio-injection=enabled.
Once labeled, Istio injects the sidecar automatically into any pod you deploy in that namespace. In the following example, the sidecar gets automatically injected into the pods deployed in the istio-dev namespace.
$ kubectl get namespaces --show-labels
NAME           STATUS    AGE       LABELS
default        Active    40d       <none>
istio-dev      Active    19d       istio-injection=enabled
istio-system   Active    24d       <none>
kube-public    Active    40d       <none>
kube-system    Active    40d       <none>
But how does this work? To get to the bottom of this, we need to understand Kubernetes admission controllers.
From the Kubernetes documentation:

An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized. You can define two types of admission webhooks, validating admission Webhook and mutating admission webhook. With validating admission Webhooks, you may reject requests to enforce custom admission policies. With mutating admission Webhooks, you may change requests to enforce custom defaults.


For automatic sidecar injection, Istio relies on a mutating admission webhook. Let’s look at the details of the istio-sidecar-injector mutating webhook configuration.
$ kubectl get mutatingwebhookconfiguration istio-sidecar-injector -o yaml
SNIPPET from the output:

apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"admissionregistration.k8s.io/v1beta1","kind":"MutatingWebhookConfiguration","metadata":{"annotations":{},"labels":{"app":"istio-sidecar-injector","chart":"sidecarInjectorWebhook-1.0.1","heritage":"Tiller","release":"istio-remote"},"name":"istio-sidecar-injector","namespace":""},"webhooks":[{"clientConfig":{"caBundle":"","service":{"name":"istio-sidecar-injector","namespace":"istio-system","path":"/inject"}},"failurePolicy":"Fail","name":"sidecar-injector.istio.io","namespaceSelector":{"matchLabels":{"istio-injection":"enabled"}},"rules":[{"apiGroups":[""],"apiVersions":["v1"],"operations":["CREATE"],"resources":["pods"]}]}]}
  creationTimestamp: 2018-12-10T08:40:15Z
  generation: 2
  labels:
    app: istio-sidecar-injector
    chart: sidecarInjectorWebhook-1.0.1
    heritage: Tiller
    release: istio-remote
  name: istio-sidecar-injector
  .....
webhooks:
- clientConfig:
    service:
      name: istio-sidecar-injector
      namespace: istio-system
      path: /inject
  name: sidecar-injector.istio.io
  namespaceSelector:
    matchLabels:
      istio-injection: enabled
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
Here you can see the webhook’s namespaceSelector, which matches namespaces labeled istio-injection: enabled for sidecar injection. You can also see the operations and resources for which the webhook is called, in this case when pods are created. When the API server receives a request that matches one of the rules, it sends an admission review request to the webhook service specified in the clientConfig section, here the service named istio-sidecar-injector. We should be able to see that this service is running in the istio-system namespace.
$ kubectl get svc --namespace=istio-system | grep sidecar-injector
istio-sidecar-injector   ClusterIP   10.102.70.184   <none>        443/TCP             24d
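The matching the API server performs against this configuration can be sketched in Python. This is a simplified model covering only the fields shown above (namespaceSelector.matchLabels and the rules list); real webhook matching also handles wildcards, matchExpressions and other fields, and webhook_applies is a hypothetical helper.

```python
WEBHOOK = {
    "namespaceSelector": {"matchLabels": {"istio-injection": "enabled"}},
    "rules": [{
        "apiGroups": [""],          # "" is the core API group
        "apiVersions": ["v1"],
        "operations": ["CREATE"],
        "resources": ["pods"],
    }],
}


def webhook_applies(namespace_labels, group, version, operation, resource,
                    webhook=WEBHOOK):
    """True if the API server would send this request to the webhook."""
    # Every matchLabels entry must be present on the namespace with that value.
    wanted = webhook["namespaceSelector"]["matchLabels"]
    if any(namespace_labels.get(key) != value for key, value in wanted.items()):
        return False
    # At least one rule must match group, version, operation and resource.
    return any(
        group in rule["apiGroups"]
        and version in rule["apiVersions"]
        and operation in rule["operations"]
        and resource in rule["resources"]
        for rule in webhook["rules"]
    )


if __name__ == "__main__":
    print(webhook_applies({"istio-injection": "enabled"}, "", "v1", "CREATE", "pods"))  # True
    print(webhook_applies({}, "", "v1", "CREATE", "pods"))                            # False
```

Note that only CREATE is listed, so labeling a namespace does not affect pods that already exist; they pick up the sidecar when they are recreated.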
This configuration ultimately does pretty much the same as manual injection, except that it is done automatically during pod creation, so you won’t see the change in the deployment. You need to use kubectl describe to see the sidecar proxy and the init container.
The automatic sidecar injection not only depends on the namespaceSelector mechanism of the webhook, but also on the default injection policy and the per-pod override annotation.
If you look at the istio-sidecar-injector ConfigMap again, it has the default injection policy defined. In our case, it is enabled by default.
$ kubectl -n istio-system get configmap istio-sidecar-injector -o=jsonpath='{.data.config}'
SNIPPET from the output:

policy: enabled
template: |-
  initContainers:
  - name: istio-init
    image: "gcr.io/istio-release/proxy_init:1.0.2"
    args:
    - "-p"
    - [[ .MeshConfig.ProxyListenPort ]]
You can also use the annotation sidecar.istio.io/inject in the pod template to override the default policy. The following example disables the automatic injection of the sidecar for the pods in a Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ignored
spec:
  selector:
    matchLabels:
      app: ignored
  template:
    metadata:
      labels:
        app: ignored
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      containers:
      - name: ignored
        image: tutum/curl
        command: ["/bin/sleep","infinity"]
As this example shows, whether a sidecar is injected automatically depends on several variables, controlled at the namespace, ConfigMap, or pod level:

webhook namespaceSelector (istio-injection: enabled)
default policy (configured in the ConfigMap istio-sidecar-injector)
per-pod override annotation (sidecar.istio.io/inject)

The injection status table shows a clear picture of the final injection status based on the value of the above variables.
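The logic behind that table can be sketched as a small Python function that prints every combination of the three variables. It is based on the Istio 1.0 behavior described in this post and models only the annotation values "true" and "false"; sidecar_injected is a hypothetical name, not an Istio API.

```python
from itertools import product
from typing import Optional


def sidecar_injected(namespace_label: Optional[str], policy: str,
                     pod_annotation: Optional[str]) -> bool:
    """Decide injection from the three variables listed above."""
    # 1. The webhook is only called for namespaces labeled istio-injection=enabled.
    if namespace_label != "enabled":
        return False
    # 2. A per-pod sidecar.istio.io/inject annotation overrides the default.
    if pod_annotation is not None:
        return pod_annotation == "true"
    # 3. Otherwise the ConfigMap's default policy decides.
    return policy == "enabled"


if __name__ == "__main__":
    print("namespace  policy    annotation  injected")
    for ns, policy, ann in product(["enabled", None], ["enabled", "disabled"],
                                   ["true", "false", None]):
        print(f"{str(ns):10} {policy:9} {str(ann):11} {sidecar_injected(ns, policy, ann)}")
```

The namespace label acts as a gate: without it, neither the policy nor the annotation matters, because the webhook is never called.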
Traffic flow from application container to sidecar proxy
Now that we are clear about how a sidecar container and an init container are injected into an application manifest, how does the sidecar proxy grab the inbound and outbound traffic to and from the container? We briefly mentioned that this is done by setting up iptables rules within the pod’s network namespace, which in turn is done by the istio-init container. Now it is time to verify what actually gets updated within the namespace.
Let’s get into the network namespace of the application pod we deployed in the previous section and look at the configured iptables. I am going to show an example using nsenter. Alternatively, you can enter the container in privileged mode to see the same information. For folks without access to the nodes, using exec to get into the sidecar and running iptables is more practical.
$ docker inspect b8de099d3510 --format '{{ .State.Pid }}'
4125
$ nsenter -t 4125 -n iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N ISTIO_INBOUND
-N ISTIO_IN_REDIRECT
-N ISTIO_OUTPUT
-N ISTIO_REDIRECT
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp -m tcp --dport 80 -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15001
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -j ISTIO_REDIRECT
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
The output above clearly shows that all incoming traffic to port 80, which is the port our demo-red application is listening on, is now redirected to port 15001, which is the port the istio-proxy (an Envoy proxy) is listening on. The same holds true for outgoing traffic.
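To trace how a packet moves through these rules, here is a simplified Python model of the nat chains listed above. It hard-codes the values from this output (application port 80, proxy port 15001, and UID/GID 1337, which matches the -u 1337 argument passed to istio-init) and ignores conntrack and other tables, so treat it as an illustration rather than a faithful iptables emulator.

```python
import ipaddress

APP_PORT = 80        # port the demo-red container listens on
PROXY_PORT = 15001   # port istio-proxy (Envoy) listens on
PROXY_UID = 1337     # matches the "-u 1337" argument to istio-init
PROXY_GID = 1337
LOOPBACK = ipaddress.ip_address("127.0.0.1")


def inbound_port(dport):
    """PREROUTING -> ISTIO_INBOUND: only TCP traffic to the app port is redirected."""
    return PROXY_PORT if dport == APP_PORT else dport


def outbound_redirected(dst, out_iface, uid, gid):
    """OUTPUT -> ISTIO_OUTPUT, evaluating the rules top to bottom."""
    dst_ip = ipaddress.ip_address(dst)
    if dst_ip != LOOPBACK and out_iface == "lo":
        return True   # ! -d 127.0.0.1/32 -o lo -j ISTIO_REDIRECT
    if uid == PROXY_UID or gid == PROXY_GID:
        return False  # --uid-owner/--gid-owner 1337 -j RETURN: Envoy's own traffic
    if dst_ip == LOOPBACK:
        return False  # -d 127.0.0.1/32 -j RETURN
    return True       # -j ISTIO_REDIRECT -> REDIRECT --to-ports 15001


if __name__ == "__main__":
    print(inbound_port(80))                                             # 15001
    print(outbound_redirected("10.0.0.7", "eth0", uid=1000, gid=1000))  # True
    print(outbound_redirected("10.0.0.7", "eth0", uid=1337, gid=1337))  # False
```

The UID/GID exemption is what prevents a loop: without it, Envoy's own outbound connections would be redirected back to Envoy.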
This brings us to the end of this post. I hope it helped to demystify how Istio manages to inject the sidecar proxies into an existing deployment and how Istio routes the traffic to the proxy.

Update: As an alternative to istio-init, there is now an option to use the new Istio CNI plugin, which removes the need for the init container and its associated privileges. The istio-cni plugin sets up the pod’s networking to fulfill this requirement, in place of the current approach of injecting istio-init into each pod.





Sidestepping Dependency Ordering with AppSwitch
Mon, 14 Jan 2019 00:00:00 +0000
We are going through an interesting cycle of application decomposition and recomposition.  While the microservice paradigm is driving monolithic applications to be broken into separate individual services, the service mesh approach is helping them to be connected back together into well-structured applications.  As such, microservices are logically separate but not independent.  They are usually closely interdependent, and taking them apart introduces many new concerns, such as the need for mutual authentication between services.  Istio directly addresses most of those issues.
Dependency ordering problem
An issue that arises due to application decomposition and one that Istio doesn’t address is dependency ordering – bringing up individual services of an application in an order that guarantees that the application as a whole comes up quickly and correctly.  In a monolithic application, with all its components built-in, dependency ordering between the components is enforced by internal locking mechanisms.  But with individual services potentially scattered across the cluster in a service mesh, starting a service first requires checking that the services it depends on are up and available.
Dependency ordering is deceptively nuanced, with a host of interrelated problems.  Ordering individual services requires having the dependency graph of the services so that they can be brought up starting from leaf nodes back to the root nodes.  It is not easy to construct such a graph and keep it updated over time as interdependencies evolve with the behavior of the application.  Even if the dependency graph is somehow provided, enforcing the ordering itself is not easy.  Simply starting the services in the specified order obviously won’t do.  A service may have started but not be ready to accept connections yet.  This is the problem with docker-compose’s depends_on option, for example.
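Given such a graph, computing a leaves-first startup order is a topological sort. The Python sketch below uses the standard library's graphlib module (Python 3.9+) on a hypothetical WebSphere ND-like layout; as noted above, an order alone does not guarantee that a service is ready by the time its dependents start.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical layout: each service maps to the set of services it depends on
# (its predecessors in startup order).
DEPENDS_ON = {
    "dmgr": set(),
    "node-1": {"dmgr"},
    "node-2": {"dmgr"},
    "bpm-app": {"node-1", "node-2"},
}


def startup_order(graph):
    """Leaves (services with no dependencies) first, roots last.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(startup_order(DEPENDS_ON))  # dmgr first, bpm-app last
```

The sort itself is easy; the hard parts remain obtaining an accurate graph and knowing when each predecessor is actually ready.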
Apart from introducing sufficiently long sleeps between service startups, a common pattern is to check for readiness of dependencies before starting a service.  In Kubernetes, this could be done with a wait script as part of the init container of the pod.  However, that means that the entire application would be held up until all its dependencies come alive.  Sometimes applications spend several minutes initializing themselves on startup before making their first outbound connection.  Not allowing a service to start at all adds substantial overhead to the overall startup time of the application.  Also, the strategy of waiting in the init container won’t work for the case of multiple interdependent services within the same pod.
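A wait script of the kind described above might look like the following Python sketch, which polls a dependency's TCP port until it accepts connections or a deadline passes. The function name and timing defaults are hypothetical, and an open port is only an approximation of readiness.

```python
import socket
import time


def wait_for_tcp(host, port, timeout_s=60.0, interval_s=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused, unreachable, or not resolvable yet: retry shortly.
            time.sleep(interval_s)
    return False
```

An init container could call, for example, wait_for_tcp("dmgr", 8879) with a hypothetical host and port and exit non-zero when it returns False. This makes the cost described above visible: the whole pod waits, even if the application would not have needed the dependency for minutes.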
Example scenario: IBM WebSphere ND
Let us consider IBM WebSphere ND, a widely deployed application middleware, to grok these problems more closely.  It is a fairly complex framework in itself and consists of a central component called the deployment manager (dmgr) that manages a set of node instances.  It uses UDP to negotiate cluster membership among the nodes and requires that the deployment manager be up and operational before any of the node instances can come up and join the cluster.
Why are we talking about a traditional application in the modern cloud-native context?  It turns out that there are significant gains to be had by enabling such applications to run on the Kubernetes and Istio platforms.  Essentially it’s a part of the modernization journey that allows running traditional apps alongside green-field apps on the same modern platform to facilitate interoperation between the two.  In fact, WebSphere ND is a demanding application.  It expects a consistent network environment with specific network interface attributes, etc.  AppSwitch is equipped to take care of those requirements.  For the purpose of this blog, however, I’ll focus on the dependency ordering requirement and how AppSwitch addresses it.
Simply deploying dmgr and node instances as pods on a Kubernetes cluster does not work.  dmgr and the node instances happen to have a lengthy initialization process that can take several minutes.  If they are all co-scheduled, the application typically ends up in a funny state.  When a node instance comes up and finds that dmgr is missing, it takes an alternate startup path.  Had it instead exited immediately, Kubernetes crash-loop restarts would have taken over and perhaps the application would have come up.  But even in that case, it turns out that a timely startup is not guaranteed.
One dmgr along with its node instances is a basic deployment configuration for WebSphere ND.  Applications built on top of WebSphere ND, such as IBM Business Process Manager, include several other services when running in production environments.  In those configurations, there could be a chain of interdependencies.  Depending on the applications hosted by the node instances, there may be an ordering requirement among them as well.  With long service initialization times and crash-loop restarts, there is little chance for the application to start in any reasonable length of time.
Sidecar dependency in Istio
Istio itself is affected by a version of the dependency ordering problem.  Since connections into and out of a service running under Istio are redirected through its sidecar proxy, an implicit dependency is created between the application service and its sidecar.  Unless the sidecar is fully operational, all requests from and to the service get dropped.
Dependency ordering with AppSwitch
So how do we go about addressing these issues?  One way is to defer them to the applications and say that they are supposed to be “well behaved” and implement appropriate logic to make themselves immune to startup order issues.  However, many applications (especially traditional ones) either time out or deadlock if misordered.  Even for new applications, implementing one-off logic for each service is a substantial additional burden that is best avoided.  Service mesh needs to provide adequate support around these problems.  After all, factoring out common patterns into an underlying framework is really the point of service mesh.
AppSwitch explicitly addresses dependency ordering.  It sits on the control path of the application’s network interactions between clients and services in a cluster and knows precisely when a service becomes a client by making the connect call and when a particular service becomes ready to accept connections by making the listen call.  Its service router component disseminates information about these events across the cluster and arbitrates interactions among clients and servers.  That is how AppSwitch implements functionality such as load balancing and isolation in a simple and efficient manner.  Leveraging the same strategic location on the application’s network control path, it is conceivable that the connect and listen calls made by those services can be lined up at a finer granularity rather than coarsely sequencing entire services as per a dependency graph.  That would effectively solve the multilevel dependency problem and speed up application startup.
But that still requires a dependency graph.  A number of products and tools exist to help with discovering service dependencies.  But they are typically based on passive monitoring of network traffic and cannot provide the information beforehand for any arbitrary application.  Network level obfuscation due to encryption and tunneling also makes them unreliable.  The burden of discovering and specifying the dependencies ultimately falls to the developer or the operator of the application.  As it is, even consistency checking a dependency specification is itself quite complex and any way to avoid requiring a dependency graph would be most desirable.
The point of a dependency graph is to know which clients depend on a particular service so that those clients can then be made to wait for the respective service to become live.  But does it really matter which specific clients?  Ultimately one tautology that always holds is that all clients of a service have an implicit dependency on the service.  That’s what AppSwitch leverages to get around the requirement.  In fact, that sidesteps dependency ordering altogether.  All services of the application can be co-scheduled without regard to any startup order.  Interdependencies among them automatically work themselves out at the granularity of individual requests and responses, resulting in quick and correct application startups.
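The idea of holding a client's connect until the target calls listen can be illustrated with a toy in-process model. This Python sketch is not how AppSwitch is implemented (AppSwitch intercepts the real socket calls across a cluster); ServiceRegistry and its methods are hypothetical names used only to show why no dependency graph is needed.

```python
import threading
from typing import Dict, Optional


class ServiceRegistry:
    """Hypothetical in-process stand-in for a cluster-wide service router."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ready: Dict[str, threading.Event] = {}
        self._addrs: Dict[str, str] = {}

    def _event(self, name: str) -> threading.Event:
        # One Event per service name, created by whichever side arrives first.
        with self._lock:
            return self._ready.setdefault(name, threading.Event())

    def listen(self, name: str, addr: str) -> None:
        """The service is now accepting connections at addr: wake any waiters."""
        with self._lock:
            self._addrs[name] = addr
        self._event(name).set()

    def connect(self, name: str, timeout: Optional[float] = None) -> str:
        """Block until the named service has called listen, then return its address."""
        if not self._event(name).wait(timeout):
            raise TimeoutError(f"{name} did not start listening within {timeout}s")
        with self._lock:
            return self._addrs[name]


if __name__ == "__main__":
    registry = ServiceRegistry()
    # Start the client before the service: its connect simply waits.
    client = threading.Thread(
        target=lambda: print("connected to", registry.connect("dmgr", timeout=5)))
    client.start()
    registry.listen("dmgr", "10.0.0.9:8879")  # hypothetical address
    client.join()
```

Client and service can be started in either order; the only dependency that matters is resolved at the moment of the connect call.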
AppSwitch model and constructs
Now that we have a conceptual understanding of AppSwitch’s high-level approach, let’s look at the constructs involved.  But first a quick summary of the usage model is in order.  Even though it is written for a different context, reviewing my earlier blog on this topic would be useful as well.  For completeness, let me also note that AppSwitch doesn’t bother with non-network dependencies.  For example, it may be possible for two services to interact using IPC mechanisms or through the shared file system.  Processes with deep ties like that are typically part of the same service anyway and don’t require the framework’s intervention for ordering.
At its core, AppSwitch is built on a mechanism that instruments the BSD socket API and related calls, such as fcntl and ioctl, that deal with sockets.  As interesting as the details of its implementation are, they would distract us from the main topic, so I’ll just summarize the key properties that distinguish it from other implementations.  (1) It’s fast.  It uses a combination of seccomp filtering and binary instrumentation to minimize interference with the application’s normal execution.  AppSwitch is particularly suited for service mesh and application networking use cases because it implements those features without ever having to touch the data.  In contrast, network-level approaches incur a per-packet cost.  Take a look at this blog for some of the performance measurements.  (2) It doesn’t require any kernel support, kernel module, or kernel patch, and works on standard distro kernels.  (3) It can run as a regular user (no root).  In fact, the mechanism can even make it possible to run the Docker daemon without root by removing the root requirement for networking containers.  (4) It doesn’t require any changes to applications whatsoever and works for any type of application, from WebSphere ND and SAP to custom C apps to statically linked Go apps.  The only requirement at this point is Linux/x86.
Decoupling services from their references
AppSwitch is built on the fundamental premise that applications should be decoupled from their references.  The identity of applications is traditionally derived from the identity of the host on which they run. However, applications and hosts are very different objects that need to be referenced independently.  Detailed discussion around this topic along with a conceptual foundation of AppSwitch is presented in this research paper.
The central AppSwitch construct that decouples service objects from their identities is the service reference (reference, for short).  AppSwitch implements service references based on the API instrumentation mechanism outlined above.  A service reference consists of an IP:port pair (and optionally a DNS name) and a label-selector that selects the service represented by the reference and the clients to which this reference applies.  A reference supports a few key properties.  (1) It can be named independently of the name of the object it refers to.  That is, a service may be listening on one IP and port, but a reference allows that service to be reached on any other IP and port chosen by the user.  This is what allows traditional applications with static IP configurations, captured from their source environments, to run on Kubernetes: AppSwitch provides them with the IP addresses and ports they need regardless of the target network environment.  (2) It remains unchanged even if the location of the target service changes.  A reference automatically redirects itself as its label-selector resolves to the new instance of the service.  (3) Most important for this discussion, a reference remains valid even while the target service is still coming up.
To facilitate discovering services that can be accessed through service references, AppSwitch provides an auto-curated service registry.  The registry is automatically kept up to date as services come and go across the cluster based on the network API that AppSwitch tracks.  Each entry in the registry consists of the IP and port where the respective service is bound.  Along with that, it includes a set of labels indicating the application to which this service belongs, the IP and port that the application passed through the socket API when creating the service, the IP and port where AppSwitch actually bound the service on the underlying host on behalf of the application etc.  In addition, applications created under AppSwitch carry a set of labels passed by the user that describe the application together with a few default system labels indicating the user that created the application and the host where the application is running etc.  These labels are all available to be expressed in the label-selector carried by a service reference.  A service in the registry can be made accessible to clients by creating a service reference.  A client would then be able to reach the service at the reference’s name (IP:port).  Now let’s look at how AppSwitch guarantees that the reference remains valid even when the target service has not yet come up.
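The relationship between registry entries, labels, and a reference’s label-selector can be modeled in a few lines. This is a minimal, hypothetical Python sketch, not AppSwitch’s implementation or API: the classes `RegistryEntry` and `ServiceReference`, their fields, and the sample addresses are illustrative assumptions. It only shows that a reference is named by its own IP:port while resolving, through labels, to wherever the service is actually bound.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Addr = Tuple[str, int]


@dataclass
class RegistryEntry:
    """Hypothetical registry record for one live service."""
    labels: Dict[str, str]   # user labels plus system labels (user, host, ...)
    app_addr: Addr           # IP:port the application passed to the socket API
    host_addr: Addr          # IP:port actually bound on the underlying host


@dataclass
class ServiceReference:
    """Hypothetical reference: its own name plus a label-selector."""
    name: Addr                                   # IP:port that clients connect to
    selector: Dict[str, str] = field(default_factory=dict)

    def matches(self, entry: RegistryEntry) -> bool:
        # Every key=value pair in the selector must be present on the entry.
        return all(entry.labels.get(k) == v for k, v in self.selector.items())

    def resolve(self, registry: List[RegistryEntry]) -> Optional[Addr]:
        """Host address of the first matching service, or None if none is live."""
        for entry in registry:
            if self.matches(entry):
                return entry.host_addr
        return None


# The database bound 0.0.0.0:5432 inside the app, but is really on another host.
registry = [
    RegistryEntry({"app": "orders", "role": "db", "host": "node-2"},
                  ("0.0.0.0", 5432), ("10.0.1.7", 40001)),
]
db_ref = ServiceReference(("192.168.100.10", 5432), {"app": "orders", "role": "db"})
print(db_ref.resolve(registry))   # ('10.0.1.7', 40001)
```

If the service moves, only its registry entry changes; clients keep using the reference’s name, which is property (2) described earlier.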
Non-blocking requests
AppSwitch leverages the semantics of the BSD socket API to ensure that service references appear valid from the perspective of clients while the corresponding services come up.  When a client makes a blocking connect call to another service that has not yet come up, AppSwitch blocks the call for a certain time, waiting for the target service to become live.  Since the target service is known to be part of the application and is expected to come up shortly, blocking the client rather than returning an error such as ECONNREFUSED prevents the application from failing to start.  If the service doesn’t come up in time, an error is returned to the application so that framework-level mechanisms like Kubernetes crash-loop backoff can kick in.
If the client request is marked as non-blocking, AppSwitch handles that by returning EAGAIN to inform the application to retry rather than give up.  Once again, that is in-line with the semantics of socket API and prevents failures due to startup races.  AppSwitch essentially enables the retry logic already built into applications in support of the BSD socket API to be transparently repurposed for dependency ordering.
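The two behaviors can be illustrated with a small, hypothetical Python model. `ReferenceTable`, `service_up`, and `connect` are illustrative names, not AppSwitch interfaces, and the model works on names rather than real sockets: a blocking connect waits (up to a timeout) for the target to become live, while a non-blocking connect fails with EAGAIN so that the caller’s existing retry logic takes over.

```python
import errno
import threading


class ReferenceTable:
    """Hypothetical model of references that stay valid while targets start."""

    def __init__(self):
        self._live = {}                       # reference name -> backend address
        self._cond = threading.Condition()

    def service_up(self, name, backend):
        """Record that the target service is live and wake blocked clients."""
        with self._cond:
            self._live[name] = backend
            self._cond.notify_all()

    def connect(self, name, blocking=True, timeout=5.0):
        with self._cond:
            if name in self._live:
                return self._live[name]
            if not blocking:
                # Non-blocking socket: tell the caller to retry, not to give up.
                raise BlockingIOError(errno.EAGAIN, "target not live yet, retry")
            # Blocking socket: hold the call until the target is live or time runs out.
            if self._cond.wait_for(lambda: name in self._live, timeout=timeout):
                return self._live[name]
            # Still not up: surface an error so e.g. crash-loop restarts can act.
            raise ConnectionRefusedError(errno.ECONNREFUSED, "target never came up")


table = ReferenceTable()
try:
    table.connect("db:5432", blocking=False)
    first_attempt = "connected"
except BlockingIOError as exc:
    first_attempt = exc.errno                 # errno.EAGAIN

# The database comes up 100 ms later; a blocking client simply waits for it.
threading.Timer(0.1, table.service_up, args=("db:5432", "10.0.1.7:40001")).start()
second_attempt = table.connect("db:5432", timeout=2.0)
print(first_attempt == errno.EAGAIN, second_attempt)   # True 10.0.1.7:40001
```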
Application timeouts
What if the application times out based on its own internal timer?  Truth be told, AppSwitch can also fake the application’s perception of time if desired, but that would be overstepping and is actually unnecessary.  The application decides, and knows best, how long it should wait, and it’s not appropriate for AppSwitch to interfere with that.  Application timeouts are conservatively long, so if the target service still hasn’t come up in time, the problem is unlikely to be a dependency ordering issue.  Something else must be going on, and it should not be masked.
Wildcard service references for sidecar dependency
Service references can be used to address the Istio sidecar dependency issue mentioned earlier.  AppSwitch allows the IP:port specified as part of a service reference to be a wildcard.  That is, the service reference IP address can be given with a netmask indicating the IP address range to be captured.  If the label selector of the service reference points to the sidecar service, then all outgoing connections of any application to which this service reference applies will be transparently redirected to the sidecar.  And of course, the service reference remains valid while the sidecar is still coming up, so the race is removed.
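A wildcard reference amounts to a range check on the destination of each outgoing connect. The Python sketch below is a hypothetical model of that idea, not AppSwitch code; the function name `route`, the optional port set, and the sidecar address `127.0.0.1:15001` are assumptions chosen for illustration.

```python
import ipaddress
from typing import Optional, Set, Tuple

Addr = Tuple[str, int]


def route(dst: Addr, wildcard_cidr: str, sidecar: Addr,
          ports: Optional[Set[int]] = None) -> Addr:
    """Return where a connect() to dst would actually go.

    Destinations inside wildcard_cidr (and, if given, on one of `ports`) are
    redirected to the sidecar; everything else is left untouched.
    """
    ip, port = dst
    in_range = ipaddress.ip_address(ip) in ipaddress.ip_network(wildcard_cidr)
    port_ok = ports is None or port in ports
    return sidecar if in_range and port_ok else dst


SIDECAR = ("127.0.0.1", 15001)   # hypothetical sidecar listener
print(route(("10.20.30.40", 8080), "0.0.0.0/0", SIDECAR))       # ('127.0.0.1', 15001)
print(route(("10.20.30.40", 8080), "192.168.0.0/16", SIDECAR))  # ('10.20.30.40', 8080)
```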
Using service references for sidecar dependency ordering also implicitly redirects the application’s connections to the sidecar without requiring iptables and its attendant privilege issues.  Essentially, it works as if the application were directly making connections to the sidecar rather than to the target destination, leaving the sidecar in charge of what to do.  AppSwitch would interject metadata, such as the original destination, into the data stream of the connection using the proxy protocol, which the sidecar could decode before passing the connection through to the application.  Some of these details were discussed here.  That takes care of outbound connections, but what about incoming connections?  With all services and their sidecars running under AppSwitch, any incoming connections from remote nodes would already have been redirected to their respective remote sidecars.  So nothing special needs to be done for incoming connections.
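The post does not say which version of the proxy protocol AppSwitch uses. Assuming the human-readable version 1 of HAProxy’s PROXY protocol, the Python sketch below shows the header line that would precede the application’s bytes and how a sidecar could recover the original destination from it. `proxy_v1_header` and `parse_proxy_v1` are hypothetical helper names, and the sketch ignores the `PROXY UNKNOWN` form.

```python
import ipaddress
from typing import Tuple


def proxy_v1_header(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Build a PROXY protocol v1 header line terminated by CRLF."""
    src, dst = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must be the same IP family")
    family = "TCP4" if src.version == 4 else "TCP6"
    return f"PROXY {family} {src} {dst} {src_port} {dst_port}\r\n".encode("ascii")


def parse_proxy_v1(data: bytes) -> Tuple[Tuple[str, int], bytes]:
    """Return ((original dst ip, dst port), remaining application bytes)."""
    line, sep, rest = data.partition(b"\r\n")
    fields = line.decode("ascii").split(" ")
    if not sep or len(fields) != 6 or fields[0] != "PROXY" or fields[1] not in ("TCP4", "TCP6"):
        raise ValueError("not a supported PROXY v1 header")
    _, _, _src, dst, _sport, dport = fields
    return (dst, int(dport)), rest


header = proxy_v1_header("10.0.0.5", 43210, "10.96.0.12", 9080)
stream = header + b"GET / HTTP/1.1\r\n"
print(header)                  # b'PROXY TCP4 10.0.0.5 10.96.0.12 43210 9080\r\n'
print(parse_proxy_v1(stream))  # (('10.96.0.12', 9080), b'GET / HTTP/1.1\r\n')
```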
Summary
Dependency ordering is a pesky problem, mostly due to the lack of access to fine-grained application-level events around inter-service interactions.  Addressing this problem would normally have required applications to implement their own internal logic.  But AppSwitch allows those internal application events to be instrumented without requiring application changes.  AppSwitch then leverages the ubiquitous support for the BSD socket API to sidestep the requirement of ordering dependencies.
Acknowledgements
Thanks to Eric Herness and team for their insights and support with IBM WebSphere and BPM products as we modernized them onto the Kubernetes platform and to Mandar Jog, Martin Taillefer and Shriram Rajagopalan for reviewing early drafts of this blog.



Deploy a Custom Ingress Gateway Using Cert-Manager
Thu, 10 Jan 2019 00:00:00 +0000
This post provides instructions to manually create a custom ingress gateway with automatic provisioning of certificates based on cert-manager.
A custom ingress gateway can be used to provide a separate load balancer, for example to isolate traffic.
Before you begin

Set up Istio by following the instructions in the
Installation guide.
Set up cert-manager with its Helm chart.
This example uses demo.mydemo.com, which must resolve in your DNS.

Configuring the custom ingress gateway


Check if cert-manager was installed using Helm with the following command:
$ helm ls
The output should be similar to the example below and show cert-manager with a STATUS of DEPLOYED:
NAME   REVISION UPDATED                  STATUS   CHART                     APP VERSION   NAMESPACE
istio     1     Thu Oct 11 13:34:24 2018 DEPLOYED istio-1.0.X               1.0.X         istio-system
cert      1     Wed Oct 24 14:08:36 2018 DEPLOYED cert-manager-v0.6.0-dev.2 v0.6.0-dev.2  istio-system


To create the cluster’s issuer, apply the following configuration:
Replace the provider settings of the cluster issuer with your own configuration values. The example uses the values under route53.

apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-demo
  namespace: kube-system
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: 
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-demo
    dns01:
      # Here we define a list of DNS-01 providers that can solve DNS challenges
      providers:
      - name: your-dns
        route53:
          accessKeyID: 
          region: eu-central-1
          secretAccessKeySecretRef:
            name: prod-route53-credentials-secret
            key: secret-access-key


If you use the route53 provider, you must provide a secret to perform DNS ACME Validation. To create the secret, apply the following configuration file:
apiVersion: v1
kind: Secret
metadata:
  name: prod-route53-credentials-secret
type: Opaque
data:
  secret-access-key: 


Create your own certificate:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: demo-certificate
  namespace: istio-system
spec:
  acme:
    config:
    - dns01:
        provider: your-dns
      domains:
      - '*.mydemo.com'
  commonName: '*.mydemo.com'
  dnsNames:
  - '*.mydemo.com'
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-demo
  secretName: istio-customingressgateway-certs
Make a note of the value of secretName since a future step requires it.
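The certificate above is issued for the wildcard name *.mydemo.com. Under the usual wildcard rules, the * stands for exactly one left-most DNS label, so it covers demo.mydemo.com but neither the bare mydemo.com nor a deeper name such as a.demo.mydemo.com; those would need their own entries under dnsNames. The Python function below is a simplified sketch of that matching rule (in the spirit of RFC 6125, without partial-label wildcards or internationalized names); `covered_by` is a hypothetical name.

```python
def covered_by(hostname: str, pattern: str) -> bool:
    """Simplified wildcard match: '*' replaces exactly one left-most label."""
    host = hostname.lower().rstrip(".").split(".")
    pat = pattern.lower().rstrip(".").split(".")
    if len(host) != len(pat):
        return False                   # '*' never spans more or fewer labels
    if pat[0] == "*":
        return bool(host[0]) and host[1:] == pat[1:]
    return host == pat


for name in ["demo.mydemo.com", "mydemo.com", "a.demo.mydemo.com"]:
    print(name, covered_by(name, "*.mydemo.com"))
# demo.mydemo.com True
# mydemo.com False
# a.demo.mydemo.com False
```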


To scale automatically, declare a new horizontal pod autoscaler with the following configuration:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-ingressgateway
  namespace: istio-system
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: my-ingressgateway
  targetCPUUtilizationPercentage: 80
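With minReplicas: 1, maxReplicas: 5, and targetCPUUtilizationPercentage: 80, the autoscaler sizes the deployment roughly in proportion to observed CPU utilization. The Python sketch below approximates the rule documented for the Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min/max bounds, with the default 10% tolerance (the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance). It deliberately ignores unready pods, missing metrics, and scale-down stabilization, so treat it as an approximation rather than the controller’s exact behavior; `desired_replicas` is a hypothetical name.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_pct: float,
                     target_cpu_pct: float = 80, min_replicas: int = 1,
                     max_replicas: int = 5, tolerance: float = 0.1) -> int:
    """Approximate HPA replica count for a single CPU-utilization metric."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas               # close enough to the target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(1, 0))     # 1  (idle, but never below minReplicas)
print(desired_replicas(2, 160))   # 4  (twice the target, so twice the pods)
print(desired_replicas(4, 200))   # 5  (would be 10, capped at maxReplicas)
print(desired_replicas(3, 84))    # 3  (within the 10% tolerance)
```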


Apply your deployment using the declaration provided in the YAML definition.
The annotations used, for example aws-load-balancer-type, only apply to AWS.

Create your service:
Each nodePort used must be an available port.

apiVersion: v1
kind: Service
metadata:
  name: my-ingressgateway
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  labels:
    app: my-ingressgateway
    istio: my-ingressgateway
spec:
  type: LoadBalancer
  selector:
    app: my-ingressgateway
    istio: my-ingressgateway
  ports:
    -
      name: http2
      nodePort: 32380
      port: 80
      targetPort: 80
    -
      name: https
      nodePort: 32390
      port: 443
    -
      name: tcp
      nodePort: 32400
      port: 31400
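By default, Kubernetes only allocates nodePort values in the range 30000-32767 (configurable with the kube-apiserver flag --service-node-port-range), and a given nodePort can be allocated to only one Service in the cluster. The Python sketch below checks the nodePort values above against those rules. `check_node_ports` is a hypothetical helper, and the `in_use` set is a placeholder you would fill from your own cluster, for example from the nodePort fields of existing Services.

```python
from typing import Iterable, List, Set

# kube-apiserver default for --service-node-port-range is 30000-32767 (inclusive).
DEFAULT_NODE_PORT_RANGE = range(30000, 32768)


def check_node_ports(requested: Iterable[int], in_use: Set[int],
                     allowed: range = DEFAULT_NODE_PORT_RANGE) -> List[str]:
    """Return human-readable problems with the requested nodePorts (empty if OK)."""
    problems: List[str] = []
    seen: Set[int] = set()
    for port in requested:
        if port not in allowed:
            problems.append(f"{port}: outside {allowed.start}-{allowed.stop - 1}")
        if port in in_use:
            problems.append(f"{port}: already allocated")
        if port in seen:
            problems.append(f"{port}: requested twice")
        seen.add(port)
    return problems


# Hypothetical cluster state: some other Service already holds nodePort 32400.
print(check_node_ports([32380, 32390, 32400], in_use={32400}))
# ['32400: already allocated']
```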


Create your Istio custom gateway configuration object:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: istio-custom-gateway
  namespace: default
spec:
  selector:
    istio: my-ingressgateway
  servers:
  - hosts:
    - '*.mydemo.com'
    port:
      name: http
      number: 80
      protocol: HTTP
    tls:
      httpsRedirect: true
  - hosts:
    - '*.mydemo.com'
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt


Link your istio-custom-gateway with your VirtualService:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-virtualservice
spec:
  hosts:
  - "demo.mydemo.com"
  gateways:
  - istio-custom-gateway
  http:
  - route:
    - destination:
        host: my-demoapp


The server returns the correct certificate and curl verifies it successfully (SSL certificate verify ok. is printed):
$ curl -v https://demo.mydemo.com
Server certificate:
  SSL certificate verify ok.


Congratulations! You can now use your custom istio-custom-gateway gateway configuration object.



Announcing discuss.istio.io
Thu, 10 Jan 2019 00:00:00 +0000
We in the Istio community have been working to find the right medium for users to engage with other members of the community – to ask questions,
to get help from other users, and to engage with developers working on the project.
We’ve tried several different avenues, but each has had some downsides. RocketChat was our most recent endeavor, but the lack of certain
features (for example, threading) meant it wasn’t ideal for any longer discussions around a single issue. It also led to a dilemma for
some users – when should I email istio-users@googlegroups.com and when should I use RocketChat?
We think we’ve found the right balance of features in a single platform, and we’re happy to announce
discuss.istio.io. It’s a full-featured forum where we will have discussions about Istio from here on out.
It will allow you to ask a question and get threaded replies! As a real bonus, you can use your GitHub identity.
If you prefer emails, you can configure it to send emails just like Google groups did.
We will be marking our Google groups “read only” so that the content remains, but we ask you to send further questions over to
discuss.istio.io. If you have any outstanding questions or discussions in the groups, please move the conversation over.
Happy meshing!



Incremental Istio Part 1, Traffic Management
Wed, 21 Nov 2018 00:00:00 +0000
Traffic management is one of the critical benefits provided by Istio. At the heart of Istio’s traffic management is the ability to decouple traffic flow and infrastructure scaling. This lets you control your traffic in ways that aren’t possible without a service mesh like Istio.
For example, let’s say you want to execute a canary deployment. With Istio, you can specify that v1 of a service receives 90% of incoming traffic, while v2 of that service only receives 10%. With standard Kubernetes deployments, the only way to achieve this is to manually control the number of available Pods for each version, for example 9 Pods running v1 and 1 Pod running v2. This type of manual control is hard to implement, and over time may have trouble scaling. For more information, check out Canary Deployments using Istio.
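The pod-count arithmetic behind this example can be made explicit. Assuming plain Service load balancing spreads requests roughly evenly across ready Pods, the split you get equals the ratio of Pod counts, so the smallest deployment that yields a given split is the percentages divided by their greatest common divisor. The helper name `min_pods_for_split` is hypothetical.

```python
from functools import reduce
from math import gcd
from typing import List


def min_pods_for_split(percentages: List[int]) -> List[int]:
    """Smallest Pod counts per version whose ratio equals the requested split.

    Assumes Service load balancing spreads requests roughly evenly across all
    ready Pods, so each version's share of traffic equals its share of Pods.
    """
    if sum(percentages) != 100 or any(p < 0 for p in percentages):
        raise ValueError("percentages must be non-negative and add up to 100")
    divisor = reduce(gcd, percentages)
    return [p // divisor for p in percentages]


print(min_pods_for_split([90, 10]))   # [9, 1]
print(min_pods_for_split([50, 50]))   # [1, 1]
print(min_pods_for_split([99, 1]))    # [99, 1]
```

A 99/1 split needs 100 Pods in total, while an Istio route expresses the same split as two weights, independent of how many Pods each version runs.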
The same issue exists when deploying updates to existing services. While you can update deployments with Kubernetes, it requires replacing v1 Pods with v2 Pods. Using Istio, you can deploy v2 of your service and use built-in traffic management mechanisms to shift traffic to your updated services at a network level, then remove the v1 Pods.
In addition to canary deployments and general traffic shifting, Istio also gives you the ability to implement dynamic request routing (based on HTTP headers), failure recovery, retries, circuit breakers, and fault injection. For more information, check out the Traffic Management documentation.
This post walks through a technique that highlights a particularly useful way that you can implement Istio incrementally – in this case, only the traffic management features – without having to individually update each of your Pods.
Setup: why implement Istio traffic management features?
Of course, the first question is: Why would you want to do this?
If you’re part of one of the many organizations out there that have a large cluster with lots of teams deploying, the answer is pretty clear. Let’s say Team A is getting started with Istio and wants to start some canary deployments on Service A, but Team B hasn’t started using Istio, so they don’t have sidecars deployed.
With Istio, Team A can still implement their canaries by having Service B call Service A through Istio’s ingress gateway.
Background: traffic routing in an Istio mesh
But how can you use Istio’s traffic management capabilities without updating each of your applications’ Pods to include the Istio sidecar? Before answering that question, let’s take a quick high-level look at how traffic enters an Istio mesh and how it’s routed.
Pods that are part of the Istio mesh contain a sidecar proxy that is responsible for mediating all inbound and outbound traffic to the Pod. Within an Istio mesh, Pilot is responsible for converting high-level routing rules into configurations and propagating them to the sidecar proxies. That means when services communicate with one another, their routing decisions are determined from the client side.
Let’s say you have two services that are part of the Istio mesh, Service A and Service B. When A wants to communicate with B, the sidecar proxy of Pod A is responsible for directing traffic to Service B. For example, if you wanted to split traffic 50/50 across Service B v1 and v2, the traffic would flow as follows:
Figure: 50/50 Traffic Split

If Services A and B are not part of the Istio mesh, there is no sidecar proxy that knows how to route traffic to different versions of Service B. In that case you need to use another approach to get traffic from Service A to Service B, following the 50/50 rules you’ve set up.
Fortunately, a standard Istio deployment already includes a Gateway that specifically deals with ingress traffic outside of the Istio mesh. This Gateway is used to allow ingress traffic from outside the cluster via an external load balancer, or to allow ingress traffic from within the Kubernetes cluster but outside the service mesh. It can be configured to proxy incoming ingress traffic to the appropriate Pods, even if they don’t have a sidecar proxy. While this approach allows you to leverage Istio’s traffic management features, it does mean that traffic going through the ingress gateway will incur an extra hop.
Figure: 50/50 Traffic Split using Ingress Gateway
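The difference between the two paths can be simulated. The Python sketch below is a simplified model, not Envoy’s or kube-proxy’s actual algorithm: a weighted route picks a version in proportion to its weight, while plain Service load balancing picks any ready Pod with equal probability. It assumes one Pod per reviews version, as in the Bookinfo sample, and a fixed random seed so runs are repeatable; the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def pick_weighted(weights: Dict[str, int], rng: random.Random) -> str:
    """Weighted route: choose a version in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def pick_endpoint(endpoints: List[str], rng: random.Random) -> str:
    """Plain Service load balancing: any ready Pod with equal probability."""
    return rng.choice(endpoints)


rng = random.Random(42)          # fixed seed so the run is repeatable
N = 10_000
via_gateway = Counter(pick_weighted({"v1": 50, "v2": 50}, rng) for _ in range(N))
via_service = Counter(pick_endpoint(["v1", "v2", "v3"], rng) for _ in range(N))
print(sorted(via_gateway.items()))   # about 5000 each for v1 and v2, never v3
print(sorted(via_service.items()))   # about 3333 each for v1, v2 and v3
```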

In action: traffic routing with Istio
A simple way to see this type of approach in action is to first set up your Kubernetes environment using the Platform Setup instructions, and then install the minimal Istio profile using Helm, including only the traffic management components (ingress gateway, egress gateway, Pilot). The following example uses Google Kubernetes Engine.
First, set up and configure GKE:
$ gcloud container clusters create istio-inc --zone us-central1-f
$ gcloud container clusters get-credentials istio-inc
$ kubectl create clusterrolebinding cluster-admin-binding \
   --clusterrole=cluster-admin \
   --user=$(gcloud config get-value core/account)
Next, install Helm and generate a minimal Istio install – only traffic management components:
$ helm template install/kubernetes/helm/istio \
  --name istio \
  --namespace istio-system \
  --set security.enabled=false \
  --set galley.enabled=false \
  --set sidecarInjectorWebhook.enabled=false \
  --set mixer.enabled=false \
  --set prometheus.enabled=false \
  --set pilot.sidecar=false > istio-minimal.yaml
Then create the istio-system namespace and deploy Istio:
$ kubectl create namespace istio-system
$ kubectl apply -f istio-minimal.yaml
Next, deploy the Bookinfo sample without the Istio sidecar containers:
$ kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
Now, configure a new Gateway that allows access to the reviews service from outside the Istio mesh, a new VirtualService that splits traffic evenly between v1 and v2 of the reviews service, and a set of new DestinationRule resources that match destination subsets to service versions:
$ cat <

Finally, deploy a pod that you can use for testing with curl (and without the Istio sidecar container):
$ kubectl apply -f samples/sleep/sleep.yaml
Testing your deployment
Now, you can test different behaviors using the curl commands via the sleep Pod.
The first example is to issue requests to the reviews service using standard Kubernetes service DNS behavior (note: jq is used in the examples below to filter the output from curl):
$ export SLEEP_POD=$(kubectl get pod -l app=sleep \
  -o jsonpath={.items..metadata.name})
$ for i in `seq 3`; do \
  kubectl exec $SLEEP_POD -- curl -s http://reviews:9080/reviews/0 | \
  jq '.reviews|.[]|.rating?'; \
  done
{
  "stars": 5,
  "color": "black"
}
{
  "stars": 4,
  "color": "black"
}
null
null
{
  "stars": 5,
  "color": "red"
}
{
  "stars": 4,
  "color": "red"
}
Notice that the responses come from all three versions of the reviews service (the null values are from reviews v1, which doesn’t return ratings) rather than being split evenly across v1 and v2. This is expected, because the curl command uses Kubernetes service load balancing across all three versions of the reviews service. To get the 50/50 split between v1 and v2, send the requests through the ingress gateway instead:
$ for i in `seq 4`; do \
  kubectl exec $SLEEP_POD -- curl -s http://istio-ingressgateway.istio-system/reviews/0 | \
  jq '.reviews|.[]|.rating?'; \
  done
{
  "stars": 5,
  "color": "black"
}
{
  "stars": 4,
  "color": "black"
}
null
null
{
  "stars": 5,
  "color": "black"
}
{
  "stars": 4,
  "color": "black"
}
null
null
Mission accomplished! This post showed how to deploy a minimal installation of Istio that only contains the traffic management components (Pilot, ingress Gateway), and then use those components to direct traffic to specific versions of the reviews service. And it wasn’t necessary to deploy the Istio sidecar proxy to gain these capabilities, so there was little to no interruption of existing workloads or applications.
Using the built-in ingress gateway (along with some VirtualService and DestinationRule resources) this post showed how you can easily leverage Istio’s traffic management for cluster-external ingress traffic and cluster-internal service-to-service traffic. This technique is a great example of an incremental approach to adopting Istio, and can be especially useful in real-world cases where Pods are owned by different teams or deployed to different namespaces.



Consuming External MongoDB Services
Fri, 16 Nov 2018 00:00:00 +0000
In the Consuming External TCP Services blog post, I described how external services
can be consumed by in-mesh Istio applications via TCP. In this post, I demonstrate consuming external MongoDB services.
You use the Istio Bookinfo sample application, the version in which the book
ratings data is persisted in a MongoDB database. You deploy this database outside the cluster and configure the
ratings microservice to use it. You will learn several options for controlling traffic to external MongoDB services, along with their
pros and cons.
Bookinfo with external ratings database
First, you set up a MongoDB database instance to hold book ratings data outside of your Kubernetes cluster. Then you
modify the Bookinfo sample application to use your database.
Setting up the ratings database
For this task you set up an instance of MongoDB. You can use any MongoDB instance; I used
Compose for MongoDB.


Set an environment variable for the password of your admin user. To prevent the password from being preserved in
the Bash history, remove the command from the history immediately after running the command, using
history -d.
$ export MONGO_ADMIN_PASSWORD=


Set an environment variable for the password of the new user you will create, namely bookinfo.
Remove the command from the history using
history -d.
$ export BOOKINFO_PASSWORD=


Set environment variables for your MongoDB service, MONGODB_HOST and MONGODB_PORT.


Create the bookinfo user:
$ cat <



Create a collection to hold ratings. The following command sets both ratings equal to 1 to provide a visual
clue when your database is used by the Bookinfo ratings service (the default Bookinfo ratings are 4 and 5).
$ cat <



Check that the bookinfo user can get the ratings:
$ cat <

The output should be similar to:
MongoDB server version: 3.4.10
switched to db test
{ "_id" : ObjectId("5b7c29efd7596e65b6ed2572"), "rating" : 1 }
{ "_id" : ObjectId("5b7c29efd7596e65b6ed2573"), "rating" : 1 }
bye


Initial setting of Bookinfo application
To demonstrate the scenario of using an external database, you start with a Kubernetes cluster with Istio installed. Then you deploy the
Istio Bookinfo sample application, apply the default destination rules, and
change Istio to the blocking-egress-by-default policy.
This application uses the ratings microservice to fetch book ratings, a number between 1 and 5. The ratings are
displayed as stars for each review. There are several versions of the ratings microservice. You will deploy the
version that uses MongoDB as the ratings database in the next subsection.
The example commands in this blog post work with Istio 1.0.
As a reminder, here is the end-to-end architecture of the application from the
Bookinfo sample application.
Figure: The original Bookinfo application

Use the external database in Bookinfo application


Deploy the spec of the ratings microservice that uses a MongoDB database (ratings v2):
$ kubectl apply -f samples/bookinfo/platform/kube/bookinfo-ratings-v2.yaml
serviceaccount "bookinfo-ratings-v2" created
deployment "ratings-v2" created


Update the MONGO_DB_URL environment variable to the value of your MongoDB:
$ kubectl set env deployment/ratings-v2 "MONGO_DB_URL=mongodb://bookinfo:$BOOKINFO_PASSWORD@$MONGODB_HOST:$MONGODB_PORT/test?authSource=test&ssl=true"
deployment.extensions/ratings-v2 env updated
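The kubectl set env command above interpolates $BOOKINFO_PASSWORD into the URL as-is. If the password contains reserved characters such as @, :, or /, MongoDB’s connection-string format requires them to be percent-encoded; otherwise the URL is parsed incorrectly. The Python sketch below builds the same URL with the credentials encoded; `mongo_db_url` is a hypothetical helper, and the password, host, and port in the example are placeholders.

```python
from urllib.parse import quote_plus


def mongo_db_url(user: str, password: str, host: str, port: int,
                 db: str = "test", ssl: bool = True) -> str:
    """Build a MONGO_DB_URL value like the one above, with encoded credentials."""
    # Percent-encode reserved characters (e.g. '@', ':', '/') in the credentials;
    # an unencoded '@' in the password would be taken as the end of the userinfo.
    userinfo = f"{quote_plus(user)}:{quote_plus(password)}"
    options = f"authSource={db}&ssl={'true' if ssl else 'false'}"
    return f"mongodb://{userinfo}@{host}:{port}/{db}?{options}"


print(mongo_db_url("bookinfo", "p@ss:w/rd", "mongo.example.com", 27017))
# mongodb://bookinfo:p%40ss%3Aw%2Frd@mongo.example.com:27017/test?authSource=test&ssl=true
```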


Route all the traffic destined to the reviews service to its v3 version. You do this to ensure that the
reviews service always calls the ratings service. In addition, route all the traffic destined to the ratings
service to ratings v2 that uses your database.
Specify the routing for both services above by adding two
virtual services. These virtual services are
specified in samples/bookinfo/networking/virtual-service-ratings-mongodb.yaml of an Istio release archive.
Important: make sure you
applied the default destination rules before running the
following command.
$ kubectl apply -f samples/bookinfo/networking/virtual-service-ratings-db.yaml


The updated architecture appears below. Note that the blue arrows inside the mesh mark the traffic configured according
to the virtual services we added. According to the virtual services, the traffic is sent to reviews v3 and
ratings v2.
Figure: The Bookinfo application with ratings v2 and an external MongoDB database

Note that the MongoDB database is outside the Istio service mesh, or more precisely outside the Kubernetes cluster. The
boundary of the service mesh is marked by a dashed line.
Access the webpage
Access the webpage of the application, after
determining the ingress IP and port.
Since you have not configured egress traffic control yet, access to the MongoDB service is blocked by Istio.
This is why, instead of the rating stars, the message “Ratings service is currently unavailable” is
displayed below each review:
Figure: The Ratings service error messages

In the following sections you will configure egress access to the external MongoDB service, using different options for
egress control in Istio.
Egress control for TCP
Since MongoDB Wire Protocol runs on top of TCP, you
can control the egress traffic to your MongoDB as traffic to any other external TCP service. To
control TCP traffic, a block of IPs in the CIDR notation that includes the IP
address of your MongoDB host must be specified. The caveat here is that sometimes the IP of the MongoDB host is not
stable or known in advance.
In the cases when the IP of the MongoDB host is not stable, the egress traffic can either be
controlled as TLS traffic, or the traffic can be routed
directly, bypassing the Istio sidecar
proxies.
Get the IP address of your MongoDB database instance. As an option, you can use the
host command:
$ export MONGODB_IP=$(host $MONGODB_HOST | grep " has address " | cut -d" " -f4)
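The same lookup can be done in Python, which also makes the caveat about unstable IPs visible: a hostname can resolve to several addresses, and each one would need its own /32 block (or a wider CIDR covering all of them). The function name `ipv4_cidrs` is hypothetical, and its result reflects DNS at the moment you run it, so it can change later.

```python
import ipaddress
import socket
from typing import List


def ipv4_cidrs(hostname: str) -> List[str]:
    """Resolve hostname and return each IPv4 address as a /32 CIDR block."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # sockaddr for AF_INET is (address, port); deduplicate and sort numerically.
    addrs = sorted({info[4][0] for info in infos}, key=ipaddress.ip_address)
    return [str(ipaddress.ip_network(f"{addr}/32")) for addr in addrs]


print(ipv4_cidrs("10.1.2.3"))   # ['10.1.2.3/32'] (a literal IP needs no DNS lookup)
```

For your database you would pass the value of MONGODB_HOST; more than one entry in the result means a single /32 block will not cover every address the name can resolve to.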
Control TCP egress traffic without a gateway
In case you do not need to direct the traffic through an
egress gateway, for example if you do not have a
requirement that all the traffic leaving your mesh must exit through the gateway, follow the
instructions in this section. Alternatively, if you do want to direct your traffic through an egress gateway, proceed to
Direct TCP egress traffic through an egress gateway.


Define a TCP mesh-external service entry:
$ kubectl apply -f - <

Note that the protocol TCP is specified instead of MONGO because the traffic may be encrypted if
the MongoDB protocol runs on top of TLS.
If the traffic is encrypted, the encrypted MongoDB protocol cannot be parsed by the Istio proxy.
If you know that the plain MongoDB protocol is used, without encryption, you can specify the protocol as MONGO and
let the Istio proxy produce
MongoDB related statistics.
Also note that when the protocol TCP is specified, the configuration is not specific for MongoDB, but is the same
for any other database with the protocol on top of TCP.
Note that the host of your MongoDB is not used in TCP routing, so you can use any host, for example my-mongo.tcp.svc. Notice the STATIC resolution and the endpoint with the IP of your MongoDB service. Once you define such an endpoint, you can access MongoDB services that do not have a domain name.


Refresh the web page of the application. Now the application should display the ratings without error:
Figure: Book Ratings Displayed Correctly
Note that you see a one-star rating for both displayed reviews, as expected. You set the ratings to be one star to
provide yourself with a visual clue that your external database is indeed being used.


If you want to direct the traffic through an egress gateway, proceed to the next section. Otherwise, perform
cleanup.


Direct TCP egress traffic through an egress gateway
In this section you handle the case when you need to direct the traffic through an
egress gateway. The sidecar proxy routes TCP
connections from the MongoDB client to the egress gateway, by matching the IP of the MongoDB host (a CIDR block of
length 32). The egress gateway forwards the traffic to the MongoDB host, by its hostname.


Deploy the Istio egress gateway.


If you did not perform the steps in the previous section, perform them now.


You may want to enable mutual TLS authentication between the sidecar proxies of
your MongoDB clients and the egress gateway to let the egress gateway monitor the identity of the source pods and to
enable Mixer policy enforcement based on that identity. By enabling mutual TLS you also encrypt the traffic.
If you want to enable mutual TLS, proceed to the Mutual TLS between the sidecar proxies and the egress gateway section.
Otherwise, proceed to the following section.


Configure TCP traffic from sidecars to the egress gateway


Define the EGRESS_GATEWAY_MONGODB_PORT environment variable to hold a port for directing traffic through
the egress gateway, for example 7777. You must select a port that is not used by any other service in the mesh.
$ export EGRESS_GATEWAY_MONGODB_PORT=7777


Add the selected port to the istio-egressgateway service. Use the same values you used when installing
Istio; in particular, you have to specify all the ports of the istio-egressgateway service that you previously
configured.
$ helm template install/kubernetes/helm/istio/ --name istio-egressgateway --namespace istio-system -x charts/gateways/templates/deployment.yaml -x charts/gateways/templates/service.yaml --set gateways.istio-ingressgateway.enabled=false --set gateways.istio-egressgateway.enabled=true --set gateways.istio-egressgateway.ports[0].port=80 --set gateways.istio-egressgateway.ports[0].name=http --set gateways.istio-egressgateway.ports[1].port=443 --set gateways.istio-egressgateway.ports[1].name=https --set gateways.istio-egressgateway.ports[2].port=$EGRESS_GATEWAY_MONGODB_PORT --set gateways.istio-egressgateway.ports[2].name=mongo | kubectl apply -f -


Check that the istio-egressgateway service indeed has the selected port:
$ kubectl get svc istio-egressgateway -n istio-system
NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
istio-egressgateway   ClusterIP   172.21.202.204   <none>        80/TCP,443/TCP,7777/TCP   34d


Disable mutual TLS authentication for the istio-egressgateway service:
$ kubectl apply -f - <



Create an egress Gateway for your MongoDB service, and destination rules and a virtual service to direct the
traffic through the egress gateway and from the egress gateway to the external service.
$ kubectl apply -f - <



Verify that egress traffic is directed through the egress gateway.


Mutual TLS between the sidecar proxies and the egress gateway


Delete the previous configuration:
$ kubectl delete gateway istio-egressgateway --ignore-not-found=true
$ kubectl delete virtualservice direct-mongo-through-egress-gateway --ignore-not-found=true
$ kubectl delete destinationrule egressgateway-for-mongo mongo --ignore-not-found=true
$ kubectl delete policy istio-egressgateway -n istio-system --ignore-not-found=true


Enforce mutual TLS authentication for the istio-egressgateway service:
$ kubectl apply -f - <



Create an egress Gateway for your MongoDB service, and destination rules and a virtual service
to direct the traffic through the egress gateway and from the egress gateway to the external service.
$ kubectl apply -f - <



Proceed to the next section.


Verify that egress traffic is directed through the egress gateway


Refresh the web page of the application again and verify that the ratings are still displayed correctly.


Enable Envoy’s access logging


Check the log of the egress gateway’s Envoy proxy and look for a line that corresponds to your
requests to the MongoDB service. If Istio is deployed in the istio-system namespace, the command to print the
log is:
$ kubectl logs -l istio=egressgateway -n istio-system
[2019-04-14T06:12:07.636Z] "- - -" 0 - "-" 1591 4393 94 - "-" "-" "-" "-" ":" outbound|||my-mongo.tcp.svc 172.30.146.119:59924 172.30.146.119:443 172.30.230.1:59206 -


Cleanup of TCP egress traffic control
$ kubectl delete serviceentry mongo
$ kubectl delete gateway istio-egressgateway --ignore-not-found=true
$ kubectl delete virtualservice direct-mongo-through-egress-gateway --ignore-not-found=true
$ kubectl delete destinationrule egressgateway-for-mongo mongo --ignore-not-found=true
$ kubectl delete policy istio-egressgateway -n istio-system --ignore-not-found=true
Egress control for TLS
In real life, most communication with external services must be encrypted, and
the MongoDB protocol runs on top of TLS.
Also, TLS clients usually send
Server Name Indication (SNI) as part of their handshake. If your
MongoDB server runs TLS and your MongoDB client sends SNI as part of the handshake, you can control your MongoDB egress
traffic like any other TLS-with-SNI traffic. With TLS and SNI, you do not need to specify the IP addresses of your MongoDB
servers. You specify their host names instead, which is more convenient since you do not have to rely on the stability of
the IP addresses. You can also specify wildcards as a prefix of the host names, for example to allow access to any
server in the *.com domain.
To check if your MongoDB server supports TLS, run:
$ openssl s_client -connect $MONGODB_HOST:$MONGODB_PORT -servername $MONGODB_HOST
If the command above prints a certificate returned by the server, the server supports TLS. If not, you have to control
your MongoDB egress traffic on the TCP level, as described in the previous sections.
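If you prefer to script this check, the following Python sketch performs roughly the same probe with the standard ssl module: it opens a TCP connection, sends SNI for the given host name, and reports whether the server returned a certificate. Like openssl s_client, it deliberately skips certificate verification, since the question is only whether the server speaks TLS. The function name is illustrative; reading MONGODB_HOST and MONGODB_PORT from the environment mirrors the command above.

```python
import os
import socket
import ssl


def server_supports_tls(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake with SNI to host:port yields a server certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Like openssl s_client, do not reject untrusted or mismatched certificates:
    # the only question is whether the server speaks TLS at all.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # server_hostname makes the client send SNI, like -servername.
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return bool(tls.getpeercert(binary_form=True))
    except OSError:  # refused, timed out, or handshake failed (ssl.SSLError is an OSError)
        return False


if __name__ == "__main__":
    host, port = os.environ.get("MONGODB_HOST"), os.environ.get("MONGODB_PORT")
    if host and port:
        print("TLS" if server_supports_tls(host, int(port)) else "no TLS: control traffic at the TCP level")
```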
Control TLS egress traffic without a gateway
In case you do not need an egress gateway, follow the
instructions in this section. If you want to direct your traffic through an egress gateway, proceed to
Direct TLS Egress traffic through an egress gateway.


Create a ServiceEntry for the MongoDB service:
$ kubectl apply -f - <



Refresh the web page of the application. The application should display the ratings without error.


Cleanup of the egress configuration for TLS
$ kubectl delete serviceentry mongo
Direct TLS Egress traffic through an egress gateway
In this section you handle the case when you need to direct the traffic through an
egress gateway. The sidecar proxy routes TLS
connections from the MongoDB client to the egress gateway by matching the SNI of the MongoDB host.
The egress gateway forwards the traffic to the MongoDB host. Note that the sidecar proxy rewrites the destination port
to 443. The egress gateway accepts the MongoDB traffic on port 443, matches the MongoDB host by SNI, and rewrites
the port back to the port of the MongoDB server.


Deploy the Istio egress gateway.


Create a ServiceEntry for the MongoDB service:
$ kubectl apply -f - <



Refresh the web page of the application and verify that the ratings are displayed correctly.


Create an egress Gateway for your MongoDB service, and destination rules and virtual services
to direct the traffic through the egress gateway and from the egress gateway to the external service.
If you want to enable mutual TLS Authentication between the sidecar proxies of
your application pods and the egress gateway, use the following command. (You may want to enable mutual TLS to let
the egress gateway monitor the identity of the source pods and to enable Mixer policy enforcement based on that
identity.)


    
    
$ kubectl apply -f - <

$ kubectl apply -f - <






Verify that the traffic is directed through the egress gateway


Cleanup directing TLS egress traffic through an egress gateway
$ kubectl delete serviceentry mongo
$ kubectl delete gateway istio-egressgateway
$ kubectl delete virtualservice direct-mongo-through-egress-gateway
$ kubectl delete destinationrule egressgateway-for-mongo
Enable MongoDB TLS egress traffic to arbitrary wildcarded domains
Sometimes you want to configure egress traffic to multiple hostnames from the same domain, for example traffic to all
the MongoDB services in your company’s domain. You do not want to create multiple configuration items, one for
each and every MongoDB service in your company. To configure access to all the external services from the same domain with
a single configuration, you use wildcarded hosts.
In this section you configure egress traffic for a wildcarded domain. I used a MongoDB instance in the composedb.com
domain, so configuring egress traffic for *.com worked for me (I could have used *.composedb.com as well).
You can pick a wildcarded domain according to your MongoDB host.
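The matching rule behind a wildcarded host can be sketched in a few lines of Python. This sketch assumes, consistent with the *.com example above, that the leading * may stand for one or more DNS labels, so mongodb1.composedb.com matches both *.com and *.composedb.com. It is an approximation for illustration, not Istio's or Envoy's matching code, and host_matches is a hypothetical name.

```python
def host_matches(host: str, pattern: str) -> bool:
    """Return True if host equals pattern or matches a '*.'-prefixed wildcard."""
    # Host names are case-insensitive; ignore a trailing root dot.
    host = host.lower().rstrip(".")
    pattern = pattern.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]  # "*.com" -> ".com"
        # Assumption: '*' covers one or more labels, but never nothing at all.
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == pattern


if __name__ == "__main__":
    for pattern in ("*.com", "*.composedb.com", "mongodb2.composedb.com"):
        print(pattern, host_matches("mongodb1.composedb.com", pattern))
```

This is the property the rest of this section relies on: a new MongoDB host under the same domain matches the existing wildcard without any new configuration.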
To configure egress gateway traffic for a wildcarded domain, you will first need to deploy a custom egress
gateway with
an additional SNI proxy.
This is needed due to current limitations of Envoy, the proxy used by the standard Istio egress gateway.
Prepare a new egress gateway with an SNI proxy
In this subsection you deploy an egress gateway with an SNI proxy, in addition to the standard Istio Envoy proxy. You
can use any SNI proxy that is capable of routing traffic according to arbitrary, not-preconfigured SNI values; we used
Nginx to achieve this functionality.
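An SNI proxy can route on arbitrary host names because the name travels in clear text in the TLS ClientHello, before any encryption starts; Nginx's ssl_preread module reads it from there without terminating TLS. The Python sketch below shows the idea: it walks the ClientHello layout from the TLS specification and returns the host_name from the server_name extension. It assumes the ClientHello arrives in a single TLS record and is not production-grade parsing; sample_client_hello is a hypothetical helper that only produces a real ClientHello in memory for the example.

```python
import ssl
import struct
from typing import Optional


def extract_sni(data: bytes) -> Optional[str]:
    """Return the host_name in a ClientHello's server_name extension, or None."""
    try:
        # TLS record header: content type (0x16 = handshake), version (2), length (2).
        if data[0] != 0x16:
            return None
        pos = 5
        # Handshake header: message type (0x01 = ClientHello), length (3).
        if data[pos] != 0x01:
            return None
        pos += 4
        pos += 2 + 32                                      # legacy_version, random
        pos += 1 + data[pos]                               # legacy_session_id
        pos += 2 + struct.unpack_from("!H", data, pos)[0]  # cipher_suites
        pos += 1 + data[pos]                               # compression_methods
        ext_end = pos + 2 + struct.unpack_from("!H", data, pos)[0]
        pos += 2
        while pos + 4 <= ext_end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0x0000:                         # server_name extension
                p, list_end = pos + 2, pos + ext_len       # skip server_name_list length
                while p + 3 <= list_end:
                    name_type = data[p]
                    (name_len,) = struct.unpack_from("!H", data, p + 1)
                    p += 3
                    if p + name_len > list_end:
                        return None                        # name runs past the extension
                    if name_type == 0:                     # host_name entry
                        return data[p:p + name_len].decode("ascii")
                    p += name_len
                return None
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                                        # truncated or malformed input


def sample_client_hello(server_name: str) -> bytes:
    """Produce the first flight of a real TLS client handshake, entirely in memory."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_name)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # the handshake now waits for a ServerHello that never comes
    return outgoing.read()


if __name__ == "__main__":
    print(extract_sni(sample_client_hello("mongodb1.composedb.com")))
```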


Create a configuration file for the Nginx SNI proxy. You may want to edit the file to specify additional Nginx
settings, if required.
$ cat <<EOF > ./sni-proxy.conf
user www-data;

events {
}

stream {
  log_format log_stream '\$remote_addr [\$time_local] \$protocol [\$ssl_preread_server_name]'
  '\$status \$bytes_sent \$bytes_received \$session_time';

  access_log /var/log/nginx/access.log log_stream;
  error_log  /var/log/nginx/error.log;

  # tcp forward proxy by SNI
  server {
    resolver 8.8.8.8 ipv6=off;
    listen       127.0.0.1:$MONGODB_PORT;
    proxy_pass   \$ssl_preread_server_name:$MONGODB_PORT;
    ssl_preread  on;
  }
}
EOF


Create a Kubernetes ConfigMap
to hold the configuration of the Nginx SNI proxy:
$ kubectl create configmap egress-sni-proxy-configmap -n istio-system --from-file=nginx.conf=./sni-proxy.conf


The following command will generate istio-egressgateway-with-sni-proxy.yaml to edit and deploy.
$ cat < ./istio-egressgateway-with-sni-proxy.yaml
gateways:
  enabled: true
  istio-ingressgateway:
    enabled: false
  istio-egressgateway:
    enabled: false
  istio-egressgateway-with-sni-proxy:
    enabled: true
    labels:
      app: istio-egressgateway-with-sni-proxy
      istio: egressgateway-with-sni-proxy
    replicaCount: 1
    autoscaleMin: 1
    autoscaleMax: 5
    cpu:
      targetAverageUtilization: 80
    serviceAnnotations: {}
    type: ClusterIP
    ports:
      - port: 443
        name: https
    secretVolumes:
      - name: egressgateway-certs
        secretName: istio-egressgateway-certs
        mountPath: /etc/istio/egressgateway-certs
      - name: egressgateway-ca-certs
        secretName: istio-egressgateway-ca-certs
        mountPath: /etc/istio/egressgateway-ca-certs
    configVolumes:
      - name: sni-proxy-config
        configMapName: egress-sni-proxy-configmap
    additionalContainers:
    - name: sni-proxy
      image: nginx
      volumeMounts:
      - name: sni-proxy-config
        mountPath: /etc/nginx
        readOnly: true
EOF


Deploy the new egress gateway:
$ kubectl apply -f ./istio-egressgateway-with-sni-proxy.yaml
serviceaccount "istio-egressgateway-with-sni-proxy-service-account" created
role "istio-egressgateway-with-sni-proxy-istio-system" created
rolebinding "istio-egressgateway-with-sni-proxy-istio-system" created
service "istio-egressgateway-with-sni-proxy" created
deployment "istio-egressgateway-with-sni-proxy" created
horizontalpodautoscaler "istio-egressgateway-with-sni-proxy" created


Verify that the new egress gateway is running. Note that the pod has two containers (one is the Envoy proxy and the
second one is the SNI proxy).
$ kubectl get pod -l istio=egressgateway-with-sni-proxy -n istio-system
NAME                                                  READY     STATUS    RESTARTS   AGE
istio-egressgateway-with-sni-proxy-79f6744569-pf9t2   2/2       Running   0          17s


Create a service entry with a static address equal to 127.0.0.1 (localhost), and disable mutual TLS on the traffic directed to the new
service entry:
$ kubectl apply -f - <



Configure access to *.com using the new egress gateway


Define a ServiceEntry for *.com:
$ cat <



Create an egress Gateway for *.com, port 443, protocol TLS, a destination rule to set the
SNI for the gateway, and Envoy filters to prevent tampering
with SNI by a malicious application (the filters verify that the SNI issued by the application is the SNI reported
to Mixer).
$ kubectl apply -f - <



Route the traffic destined for *.com to the egress gateway and from the egress gateway to the SNI proxy.
$ kubectl apply -f - <



Refresh the web page of the application again and verify that the ratings are still displayed correctly.


Enable Envoy’s access logging


Check the log of the egress gateway’s Envoy proxy. If Istio is deployed in the istio-system namespace, the command
to print the log is:
$ kubectl logs -l istio=egressgateway-with-sni-proxy -c istio-proxy -n istio-system
You should see lines similar to the following:
[2019-01-02T17:22:04.602Z] "- - -" 0 - 768 1863 88 - "-" "-" "-" "-" "127.0.0.1:28543" outbound|28543||sni-proxy.local 127.0.0.1:49976 172.30.146.115:443 172.30.146.118:58510 
[2019-01-02T17:22:04.713Z] "- - -" 0 - 1534 2590 85 - "-" "-" "-" "-" "127.0.0.1:28543" outbound|28543||sni-proxy.local 127.0.0.1:49988 172.30.146.115:443 172.30.146.118:58522 


Check the logs of the SNI proxy. If Istio is deployed in the istio-system namespace, the command to print the
log is:
$ kubectl logs -l istio=egressgateway-with-sni-proxy -n istio-system -c sni-proxy
127.0.0.1 [23/Aug/2018:03:28:18 +0000] TCP []200 1863 482 0.089
127.0.0.1 [23/Aug/2018:03:28:18 +0000] TCP []200 2590 1248 0.095
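When checking the Envoy logs in a script rather than by eye, the interesting field is the upstream cluster, which in the egress gateway log lines above has the form direction|port|subset|host (for example outbound|28543||sni-proxy.local). The following Python sketch assumes only that layout, nothing else about the access log format, and pulls that field out of a log line; UpstreamCluster and upstream_cluster are hypothetical names.

```python
from typing import NamedTuple, Optional


class UpstreamCluster(NamedTuple):
    direction: str
    port: Optional[int]
    subset: str
    host: str


def upstream_cluster(log_line: str) -> Optional[UpstreamCluster]:
    """Return the first direction|port|subset|host token of an access log line."""
    for token in log_line.split():
        parts = token.split("|")
        if len(parts) == 4 and parts[0] in ("outbound", "inbound"):
            direction, port, subset, host = parts
            # An empty port field is kept as None rather than guessed.
            return UpstreamCluster(direction, int(port) if port.isdigit() else None, subset, host)
    return None
```

For example, a script that reads the output of the kubectl logs command for the istio-proxy container and checks that upstream_cluster(line).host is sni-proxy.local for each matching line confirms that the traffic went through the SNI proxy.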


Understanding what happened
In this section you configured egress traffic to your MongoDB host using a wildcarded domain. While for a single MongoDB
host there is no gain in using wildcarded domains (an exact hostname can be specified), it could be beneficial for
cases when the applications in the cluster access multiple MongoDB hosts that match some wildcarded domain. For example,
if the applications access mongodb1.composedb.com, mongodb2.composedb.com and mongodb3.composedb.com, the egress
traffic can be configured by a single configuration for the wildcarded domain *.composedb.com.
I will leave it as an exercise for the reader to verify that no additional Istio configuration is required when you
configure an app to use another instance of MongoDB with a hostname that matches the wildcarded domain used in this
section.
Cleanup of configuration for MongoDB TLS egress traffic to arbitrary wildcarded domains


Delete the configuration items for *.com:
$ kubectl delete serviceentry mongo
$ kubectl delete gateway istio-egressgateway-with-sni-proxy
$ kubectl delete virtualservice direct-mongo-through-egress-gateway
$ kubectl delete destinationrule mtls-for-egress-gateway
$ kubectl delete envoyfilter forward-downstream-sni egress-gateway-sni-verifier


Delete the configuration items for the egressgateway-with-sni-proxy deployment:
$ kubectl delete serviceentry sni-proxy
$ kubectl delete destinationrule disable-mtls-for-sni-proxy
$ kubectl delete -f ./istio-egressgateway-with-sni-proxy.yaml
$ kubectl delete configmap egress-sni-proxy-configmap -n istio-system


Remove the configuration files you created:
$ rm ./istio-egressgateway-with-sni-proxy.yaml
$ rm ./sni-proxy.conf


Cleanup


Drop the bookinfo user:
$ cat <



Drop the ratings collection:
$ cat <



Unset the environment variables you used:
$ unset MONGO_ADMIN_PASSWORD BOOKINFO_PASSWORD MONGODB_HOST MONGODB_PORT MONGODB_IP


Remove the virtual services:
$ kubectl delete -f samples/bookinfo/networking/virtual-service-ratings-db.yaml
Deleted config: virtual-service/default/reviews
Deleted config: virtual-service/default/ratings


Undeploy ratings v2-mongodb:
$ kubectl delete -f samples/bookinfo/platform/kube/bookinfo-ratings-v2.yaml
deployment "ratings-v2" deleted


Conclusion
In this blog post I demonstrated various options for MongoDB egress traffic control. You can control the MongoDB egress
traffic on a TCP or TLS level where applicable. In both TCP and TLS cases, you can direct the traffic from the sidecar
proxies directly to the external MongoDB host, or direct the traffic through an egress gateway, according to your
organization’s security requirements. In the latter case, you can also decide to apply or disable mutual TLS
authentication between the sidecar proxies and the egress gateway. If you want to control MongoDB egress traffic on the
TLS level by specifying wildcarded domains like *.com and you need to direct the traffic through the egress gateway,
you must deploy a custom egress gateway with an SNI proxy.
Note that the configuration and considerations described in this blog post for MongoDB are largely the same for other
non-HTTP protocols on top of TCP/TLS.



All Day Istio Twitch Stream
Fri, 03 Aug 2018 00:00:00 +0000
To celebrate the 1.0 release and to promote the software to a wider audience, the Istio community is hosting an all day live stream on Twitch on August 17th.
What is Twitch?
Twitch is a popular live streaming platform for video gaming that has recently seen a lot of coding content. The IBM Advocates have been doing live coding and presentations there, and it’s been fun. While mostly used for gaming content, there is a growing community sharing and watching programming content on the site.
What does this have to do with Istio?
The stream is going to be a full day of Istio content. Hopefully we’ll have a good mix of deep technical content, beginner content and line-of-business content for our audience. We’ll have developers, users, and evangelists on throughout the day to share their demos and stories. Expect live coding, Q&A, and some surprises. We have stellar guests lined up from IBM, Google, Datadog, Pivotal, and more!
Recordings
Recordings are available here.
Schedule
All times are PDT.

  
      
Time           Speaker                               Affiliation
10:00 - 10:30  Spencer Krum + Lisa-Marie Namphy      IBM / Portworx
10:30 - 11:00  Lin Sun / Spencer Krum / Sven Mawson  IBM / Google
11:00 - 11:10  Lin Sun / Spencer Krum                IBM
11:10 - 11:30  Jason Yee / Ilan Rabinovich           Datadog
11:30 - 11:50  April Nassl                           Google
11:50 - 12:10  Spike Curtis                          Tigera
12:10 - 12:30  Shannon Coen                          Pivotal
12:30 - 1:00   Matt Klein                            Lyft
1:00 - 1:20    Zach Jory                             F5/Aspen Mesh
1:20 - 1:40    Dan Ciruli                            Google
1:40 - 2:00    Isaiah Snell-Feikema / Greg Hanson    IBM
2:00 - 2:20    Zack Butcher                          Tetrate
2:20 - 2:40    Ray Hudaihed                          American Airlines
2:40 - 3:00    Christian Posta                       Red Hat
3:00 - 3:20    Google/IBM China                      Google / IBM
3:20 - 3:40    Colby Dyess                           Tuffin
3:40 - 4:00    Rohit Agarwalla                       Cisco
      
  




Istio a Game Changer for HP's FitStation Platform
Tue, 31 Jul 2018 00:00:00 +0000
The FitStation team at HP strongly believes in the future of Kubernetes, BPF and service mesh as the next standards in cloud infrastructure. We are also very happy to see Istio reaching its official 1.0 release, thanks to the joint collaboration between Google, IBM and Lyft that began in May 2017.
Throughout the development of FitStation’s large scale and progressive cloud platform, Istio, Cilium and Kubernetes technologies have delivered a multitude of opportunities to make our systems more robust and scalable. Istio was a game changer in creating reliable and dynamic network communication.
FitStation powered by HP is a technology platform that captures 3D biometric data to design personalized footwear to perfectly fit individual foot size and shape as well as gait profile. It uses 3D scanning, pressure sensing, 3D printing and variable density injection molding to create unique footwear. Footwear brands such as Brooks, Steitz Secura or Superfeet are connecting to FitStation to build their next generation of high performance sports, professional and medical shoes.
FitStation is built on the promise of ultimate security and privacy for users’ biometric data. Istio is the cornerstone that makes that possible for data in flight within our cloud. By managing these aspects at the infrastructure level, we could focus on solving business problems instead of spending time on individual implementations of secure service communication. Using Istio allowed us to dramatically reduce the complexity of maintaining a multitude of libraries and services to provide secure service communication.
As a bonus benefit of Istio 1.0, we gained network visibility, metrics and tracing out of the box. This radically improved decision-making and response quality for our development and DevOps teams. The team gained in-depth insight into the network communication across the entire platform, for both new and legacy applications. The integration of Cilium with Envoy delivered a remarkable performance benefit on Istio service mesh communication, combined with a fine-grained, kernel-driven L7 network security layer. This was due to the powers of BPF brought to Istio by Cilium. We believe this will drive the future of Linux kernel security.
It has been very exciting to follow Istio’s growth. We have seen clear improvements in performance and stability across the different development versions. The improvements between versions 0.7 and 0.8 made our teams feel comfortable with version 1.0, and we can state that Istio is now ready for real production use.
We are looking forward to the promising roadmaps of Istio, Envoy, Cilium and CNCF.



Delayering Istio with AppSwitch
Mon, 30 Jul 2018 00:00:00 +0000

    
        
            
        
All problems in computer science can be solved with another layer, except of course the problem of too many layers. – David Wheeler

        
    


The sidecar proxy approach enables a lot of awesomeness.  Squarely in the datapath between microservices, the sidecar can precisely tell what the application is trying to do.  It can monitor and instrument protocol traffic, not in the bowels of the networking layers but at the application level, to enable deep visibility, access controls and traffic management.
If we look closely however, there are many intermediate layers that the data has to pass through before the high-value analysis of application-traffic can be performed.  Most of those layers are part of the base plumbing infrastructure that are there just to push the data along.  In doing so, they add latency to communication and complexity to the overall system.
Over the years, there has been much collective effort in implementing aggressive fine-grained optimizations within the layers of the network datapath.  Each iteration may shave another few microseconds.  But then the true necessity of those layers itself has not been questioned.
Don’t optimize layers, remove them
I believe that optimizing something is a poor fallback to removing the need for it altogether. That was the goal of my initial work (broken link: https://apporbit.com/a-brief-history-of-containers-from-reality-to-hype/) on OS-level virtualization that led to Linux containers, which effectively removed virtual machines by running applications directly on the host operating system without requiring an intermediate guest. For a long time the industry was fighting the wrong battle, distracted by optimizing VMs rather than removing the additional layer altogether.
I see the same pattern repeat itself with the connectivity of microservices, and networking in general. The network has been going through the changes that physical servers went through a decade earlier. A new set of layers and constructs is being introduced. They are being baked deep into the protocol stack and even silicon without adequately considering low-touch alternatives. Perhaps there is a way to remove those additional layers altogether.
I have been thinking about these problems for some time and believe that an approach similar in concept to containers can be applied to the network stack to fundamentally simplify how application endpoints are connected across the complexity of many intermediate layers. I have reapplied the same principles from the original work on containers to create AppSwitch. Similar to the way containers provide an interface that applications can directly consume, AppSwitch plugs directly into the well-defined and ubiquitous network API that applications currently use and directly connects application clients to appropriate servers, skipping all intermediate layers. In the end, that’s what networking is all about.
Before going into the details of how AppSwitch promises to remove unnecessary layers from the Istio stack, let me give a very brief introduction to its architecture.  Further details are available at the documentation page.
AppSwitch
Not unlike the container runtime, AppSwitch consists of a client and a daemon that speak over HTTP via a REST API.  Both the client and the daemon are built as one self-contained binary, ax.  The client transparently plugs into the application and tracks its system calls related to network connectivity and notifies the daemon about their occurrences.  As an example, let’s say an application makes the connect(2) system call to the service IP of a Kubernetes service.  The AppSwitch client intercepts the connect call, nullifies it and notifies the daemon about its occurrence along with some context that includes the system call arguments.  The daemon would then handle the system call, potentially by directly connecting to the Pod IP of the upstream server on behalf of the application.
It is important to note that no data is forwarded between the AppSwitch client and daemon. They are designed to exchange file descriptors (FDs) over a Unix domain socket to avoid having to copy data. Note also that the client is not a separate process. Rather, it runs directly in the context of the application itself. There is no data copy between the application and the AppSwitch client either.
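The file-descriptor hand-off described above relies on a standard Unix mechanism: a descriptor sent as SCM_RIGHTS ancillary data over a Unix domain socket arrives in the receiver as a new descriptor for the same open socket. The Python sketch below (Unix only, Python 3.9 or later for socket.send_fds and socket.recv_fds) shows only that mechanism. The function names are hypothetical, both roles run in one process for brevity, and it is not AppSwitch's code or protocol.

```python
import socket


def daemon_hand_off(control: socket.socket, conn: socket.socket) -> None:
    """Daemon side: pass the descriptor of an already-connected socket to the client."""
    # The FD travels as SCM_RIGHTS ancillary data; the one-byte payload is only a
    # carrier, so no application data crosses the control socket.
    socket.send_fds(control, [b"\x00"], [conn.fileno()])


def client_take_over(control: socket.socket) -> socket.socket:
    """Client side: receive the descriptor and use it as an ordinary socket."""
    _payload, fds, _flags, _addr = socket.recv_fds(control, 1, 1)
    # Family and type are detected from the received descriptor itself.
    return socket.socket(fileno=fds[0])


if __name__ == "__main__":
    control_daemon, control_client = socket.socketpair()  # client <-> daemon channel
    conn, upstream = socket.socketpair()  # stands in for a connection the daemon opened
    daemon_hand_off(control_daemon, conn)
    conn.close()  # the daemon keeps no copy; the client now owns the connection
    app_sock = client_take_over(control_client)
    app_sock.sendall(b"ping")
    print(upstream.recv(4))
```

Once the client holds the descriptor, reads and writes go straight to the kernel socket, which is why nothing needs to be copied through the daemon.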
Delayering the stack
Now that we have an idea about what AppSwitch does, let’s look at the layers that it optimizes away from a standard service mesh.
Network devirtualization
Kubernetes offers simple and well-defined network constructs to the microservice applications it runs. In order to support them, however, it imposes specific requirements on the underlying network. Meeting those requirements is often not easy. The go-to solution of adding another layer is typically adopted to satisfy the requirements. In most cases the additional layer consists of a network overlay that sits between Kubernetes and the underlying network. Traffic produced by the applications is encapsulated at the source and decapsulated at the target, which not only costs network resources but also takes up compute cores.
Because AppSwitch arbitrates what the application sees through its touchpoints with the platform, it projects a consistent virtual view of the underlying network to the application similar to an overlay but without introducing an additional layer of processing along the datapath.  Just to draw a parallel to containers, the inside of a container looks and feels like a VM.  However the underlying implementation does not intervene along the high-incidence control paths of low-level interrupts etc.
AppSwitch can be injected into a standard Kubernetes manifest (similar to Istio injection) such that the application’s network is directly handled by AppSwitch bypassing any network overlay underneath.  More details to follow in just a bit.
Artifacts of container networking
Extending network connectivity from host into the container has been a major challenge.  New layers of network plumbing were invented explicitly for that purpose.  As such, an application running in a container is simply a process on the host.  However due to a fundamental misalignment between the network abstraction expected by the application and the abstraction exposed by container network namespace, the process cannot directly access the host network.  Applications think of networking in terms of sockets or sessions whereas network namespaces expose a device abstraction.  Once placed in a network namespace, the process suddenly loses all connectivity.  The notion of veth-pair and corresponding tooling were invented just to close that gap.  The data would now have to go from a host interface into a virtual switch and then through a veth-pair to the virtual network interface of the container network namespace.
AppSwitch can effectively remove both the virtual switch and veth-pair layers on both ends of the connection.  Since the connections are established by the daemon running on the host using the network that’s already available on the host, there is no need for additional plumbing to bridge host network into the container.  The socket FDs created on the host are passed to the application running within the pod’s network namespace.  By the time the application receives the FD, all control path work (security checks, connection establishment) is already done and the FD is ready for actual IO.
Skip TCP/IP for colocated endpoints
TCP/IP is the universal protocol medium over which pretty much all communication occurs.  But if application endpoints happen to be on the same host, is TCP/IP really required?  After all, it does do quite a bit of work and it is quite complex.  Unix sockets are explicitly designed for intrahost communication and AppSwitch can transparently switch the communication to occur over a Unix socket for colocated endpoints.
For each listening socket of an application, AppSwitch maintains two listening sockets, one each for TCP and Unix. When a client tries to connect to a server that happens to be colocated, the AppSwitch daemon would choose to connect to the Unix listening socket of the server. The resulting Unix sockets on each end are passed into the respective applications. Once a fully connected FD is returned, the application would simply treat it as a bit pipe. The protocol doesn’t really matter. The application may occasionally make protocol-specific calls such as getsockname(2), and AppSwitch would handle them in kind. It would present consistent responses so that the application continues to run.
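The colocated-endpoint switch can be sketched from the connecting side. In the Python sketch below, a hypothetical colocated map plays the role of the daemon's knowledge of which TCP addresses belong to servers on the same host, and points to the Unix listening socket kept alongside each TCP listener. This illustrates the idea under those assumptions and is not AppSwitch's implementation; it is Unix only and needs Python 3.8 or later for socket.create_server.

```python
import os
import socket
import tempfile
from typing import Dict, Tuple

Addr = Tuple[str, int]


def connect_endpoint(colocated: Dict[Addr, str], addr: Addr) -> socket.socket:
    """Connect to addr, preferring the server's Unix listening socket when it is local."""
    unix_path = colocated.get(addr)  # hypothetical daemon-side knowledge
    if unix_path is not None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(unix_path)  # colocated: no TCP/IP processing at all
        return sock
    return socket.create_connection(addr)  # remote server: ordinary TCP


def demo() -> Tuple[int, int]:
    """One server listening on TCP and Unix at once; connect to it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "server.sock")
        tcp_listener = socket.create_server(("127.0.0.1", 0))
        unix_listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        unix_listener.bind(path)
        unix_listener.listen()
        addr = tcp_listener.getsockname()[:2]
        local = connect_endpoint({addr: path}, addr)
        remote = connect_endpoint({}, addr)
        families = (local.family, remote.family)
        for s in (local, remote, tcp_listener, unix_listener):
            s.close()
        return families


if __name__ == "__main__":
    print(demo())
```

Either way the caller gets an ordinary connected stream socket, so the application can keep treating it as a bit pipe.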
Data pushing proxy
As we continue to look for layers to remove, let us also reconsider the requirement of the proxy layer itself.  There are times when the role of the proxy may degenerate into a plain data pusher:

There may not be a need for any protocol decoding
The protocol may not be recognized by the proxy
The communication may be encrypted and the proxy cannot access relevant headers
The application (Redis, memcached, etc.) may be too latency-sensitive to afford the cost of an intermediate proxy

In all these cases, the proxy is no different from any low-level plumbing layer. In fact, the latency it introduces can be far higher because the same level of optimization isn’t available to a proxy.
To illustrate this with an example, consider the application shown below. It consists of a Python app and a set of memcached servers behind it. An upstream memcached server is selected based on connection-time routing. Speed is the primary concern here.

    
        
            
        
    
[Figure: Latency-sensitive application scenario]

If we look at the data flow in this setup, the Python app makes a connection to the service IP of memcached.  It is redirected to the client-side sidecar.  The sidecar routes the connection to one of the memcached servers and copies the data between the two sockets – one connected to the app and another connected to memcached.  And the same also occurs on the server side between the server-side sidecar and memcached.  The role of proxy at that point is just boring shoveling of bits between the two sockets.  However, it ends up adding substantial latency to the end-to-end connection.
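The "boring shoveling of bits" that the proxy is reduced to looks roughly like the following Python sketch: two loops that copy whatever arrives on one socket to the other, with no inspection of the bytes. It is a deliberately minimal illustration (hypothetical names, one thread per direction), not how Envoy is implemented, but it shows the extra receive and send per hop that the data pays for at each proxy.

```python
import socket
import threading


def pump(src: socket.socket, dst: socket.socket, bufsize: int = 65536) -> int:
    """Copy bytes from src to dst until src reaches EOF; return the byte count."""
    total = 0
    while True:
        chunk = src.recv(bufsize)  # one copy out of the kernel ...
        if not chunk:  # peer finished sending
            dst.shutdown(socket.SHUT_WR)  # pass the EOF along
            return total
        dst.sendall(chunk)  # ... and another copy back in
        total += len(chunk)


def relay(app_side: socket.socket, upstream_side: socket.socket) -> None:
    """Shovel bytes both ways between two sockets without looking at them."""
    backward = threading.Thread(target=pump, args=(upstream_side, app_side))
    backward.start()
    pump(app_side, upstream_side)
    backward.join()
```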
Now imagine that the app could somehow be made to connect directly to memcached. Then both intermediate proxies could be skipped, and data would flow directly between the app and memcached without any intermediate hops. AppSwitch can arrange for that by transparently tweaking the target address that the Python app passes to the connect(2) system call.
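A user-space approximation of this idea is to rewrite the destination just before connecting. The sketch below assumes a hypothetical redirect table that maps a service address to a chosen backend. The service IP `10.96.0.10` and the backend `10.0.1.7` are made up. AppSwitch does the equivalent by interposing on connect(2) inside unmodified applications, which this wrapper does not attempt.

```python
import socket

def rewrite_target(addr, redirects):
    """Map an (ip, port) service address to its backend; unknown ones pass through."""
    return redirects.get(addr, addr)

def redirected_connect(addr, redirects, timeout=5.0):
    """Connect straight to the selected backend instead of a proxy in between."""
    return socket.create_connection(rewrite_target(addr, redirects), timeout=timeout)

# Hypothetical table: memcached service IP -> one concrete memcached backend.
REDIRECTS = {("10.96.0.10", 11211): ("10.0.1.7", 11211)}
```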
Proxyless protocol decoding
Things are going to get a bit strange here.  We have seen that the proxy can be bypassed for cases that don’t involve looking into application traffic.  But is there anything we can do even for those other cases?  It turns out, yes.
In a typical communication between microservices, much of the interesting information is exchanged in the initial headers. The headers are followed by the body or payload, which typically represents the bulk of the communication, and once again the proxy degenerates into a data pusher for this part. AppSwitch provides a nifty mechanism to skip the proxy in these cases.
Even though AppSwitch is not a proxy, it does arbitrate connections between application endpoints, and it has access to the corresponding socket FDs. Normally, AppSwitch simply passes those FDs to the application. But it can also peek into the initial message received on the connection by using the MSG_PEEK flag of the recvfrom(2) system call on the socket. This allows AppSwitch to examine application traffic without actually removing it from the socket buffers. When AppSwitch returns the FD to the application and steps out of the datapath, the application does an actual read on the connection. AppSwitch uses this technique to perform deeper analysis of application-level traffic and to implement sophisticated network functions, as discussed in the next section, all without getting into the datapath.
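The MSG_PEEK behavior is easy to check from Python on a Unix socket pair. The peeked bytes stay in the receive buffer, so the application’s later read returns them again. This sketch shows only the system-call semantics. The function names are illustrative and are not AppSwitch code.

```python
import socket

def peek_first_line(conn, limit=1024):
    """Return the first line of buffered data without removing it from the socket."""
    head = conn.recv(limit, socket.MSG_PEEK)
    return head.split(b"\r\n", 1)[0]

def demo():
    client, server_side = socket.socketpair()
    client.sendall(b"GET /books/1 HTTP/1.1\r\nHost: bookstore\r\n\r\n")
    first_line = peek_first_line(server_side)  # what the daemon examines
    app_read = server_side.recv(4096)          # what the application later reads
    client.close()
    server_side.close()
    return first_line, app_read
```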
Zero-cost load balancer, firewall and network analyzer
Typical implementations of network functions such as load balancers and firewalls require an intermediate layer that taps into the data/packet stream. Kubernetes’ load balancer implementation (kube-proxy), for example, introduces a probe into the packet stream through iptables, and Istio implements the same at the proxy layer. But if all that is required is to redirect or drop connections based on policy, it is not really necessary to stay in the datapath for the entire course of the connection. AppSwitch can take care of that much more efficiently by simply manipulating the control path at the API level. Given its intimate proximity to the application, AppSwitch also has easy access to various application-level metrics, such as the dynamics of stack and heap usage, the precise moment a service comes alive, and the attributes of active connections. All of these could potentially form a rich signal for monitoring and analytics.
To go a step further, AppSwitch can also perform L7 load balancing and firewall functions based on the protocol data that it obtains from the socket buffers.  It can synthesize the protocol data and various other signals with the policy information acquired from Pilot to implement a highly efficient form of routing and access control enforcement.  It can essentially “influence” the application to connect to the right backend server without requiring any changes to the application or its configuration.  It is as if the application itself is infused with policy and traffic-management intelligence.  Except in this case, the application can’t escape the influence.
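As a rough illustration of connection-time L7 routing, the function below parses a peeked request line and Host header, then picks a backend or returns None to signal that the connection should be dropped. The policy shape (`routes` keyed by host and path prefix, a `deny_paths` list) is hypothetical and far simpler than the route configuration Pilot actually distributes.

```python
def choose_backend(peeked, routes, deny_paths=()):
    """Pick a backend from peeked HTTP/1.x bytes; None means drop the connection."""
    head = peeked.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:
        return None  # not an HTTP/1.x request line; this sketch refuses to route it
    _method, path, _version = parts
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    if any(path.startswith(prefix) for prefix in deny_paths):
        return None  # firewall-style decision, made before any data is consumed
    host = headers.get("host", "")
    # Among routes for this host, the longest matching path prefix wins.
    candidates = [(prefix, backend) for (route_host, prefix), backend in routes.items()
                  if route_host == host and path.startswith(prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(c[0]))[1]
```

Because the input comes from a peek, the application still reads the full request afterwards, and the chosen backend only influences where the connection goes.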
Some more black magic is possible that would actually allow modifying the application data stream without getting into the datapath, but I am going to save that for a later post. The current implementation of AppSwitch uses a proxy if the use case requires application protocol traffic to be modified. For those cases, AppSwitch provides a highly efficient mechanism to attract traffic to the proxy, as discussed in the next section.
Traffic redirection
Before the sidecar proxy can look into application protocol traffic, it first needs to receive the connections. Redirection of connections coming into and going out of the application is currently done by a layer of packet filtering that rewrites packets so that they go to the respective sidecars. Creating the potentially large number of rules required to represent the redirection policy is tedious, and applying and updating those rules as the target subnets captured by the sidecar change is expensive.
While some of the performance concerns are being addressed by the Linux community, there is another concern related to privilege: iptables rules need to be updated whenever the policy changes.  Given the current architecture, all privileged operations are performed in an init container that runs just once at the very beginning before privileges are dropped for the actual application.  Since updating iptables rules requires root privileges, there is no way to do that without restarting the application.
AppSwitch provides a way to redirect application connections without root privilege. After all, an unprivileged application can already connect to any host (modulo firewall rules, etc.), so the owner of the application should be allowed to change the host address that the application passes to connect(2) without requiring additional privilege.
Socket delegation
Let’s see how AppSwitch could help redirect connections without using iptables. Imagine that the application somehow voluntarily passed the socket FDs it uses for its communication to the sidecar. There would then be no need for iptables at all. AppSwitch provides a feature called socket delegation that does exactly that. It allows the sidecar to transparently gain access to copies of the socket FDs that the application uses for its communication, without any changes to the application itself.
Here is the sequence of steps that would achieve this in the context of the Python application example.

The application initiates a connection request to the service IP of memcached service.
The connection request from client is forwarded to the daemon.
The daemon creates a pair of pre-connected Unix sockets (using socketpair(2) system call).
It passes one end of the socket pair into the application such that the application would use that socket FD for read/write.  It also ensures that the application consistently sees it as a legitimate TCP socket as it expects by interposing all calls that query connection properties.
The other end is passed to the sidecar over a different Unix socket, where the daemon exposes its API. Information such as the original destination that the application was connecting to is also conveyed over the same interface.

Socket delegation based connection redirection

Once the application and the sidecar are connected, the rest happens as usual. The sidecar initiates a connection to the upstream server and proxies data between the socket received from the daemon and the socket connected to the upstream server. The main difference is that the sidecar gets the connection from the daemon over the Unix socket, not through the accept(2) system call as it would in the normal case. In addition to listening for connections from applications through the normal accept(2) channel, the sidecar proxy connects to the AppSwitch daemon’s REST endpoint and receives sockets that way.
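The client-side steps above can be reproduced with Python’s `socket.socketpair()` together with `socket.send_fds()` and `socket.recv_fds()` (Python 3.9+, Unix only). These pass file descriptors as SCM_RIGHTS ancillary data over a Unix socket. This is a minimal sketch under stated assumptions. The JSON metadata format and the function names are hypothetical, and a plain socket pair stands in for the daemon’s API socket.

```python
import json
import socket

def daemon_delegate(api_sock, orig_dst):
    """Daemon side: give one end of a socket pair to the app, the other to the sidecar."""
    app_end, sidecar_end = socket.socketpair()  # pre-connected AF_UNIX pair
    meta = json.dumps({"orig_dst": orig_dst}).encode()  # hypothetical metadata format
    # The FD travels as SCM_RIGHTS ancillary data alongside the metadata.
    socket.send_fds(api_sock, [meta], [sidecar_end.fileno()])
    sidecar_end.close()  # the sidecar now holds its own duplicate of this FD
    return app_end       # AppSwitch would hand this to the app as its "TCP" socket

def sidecar_receive(api_sock):
    """Sidecar side: receive metadata plus a live socket FD instead of using accept(2)."""
    data, fds, _flags, _addr = socket.recv_fds(api_sock, 4096, 1)
    conn = socket.socket(fileno=fds[0])  # family and type are detected from the FD
    return json.loads(data), conn

def demo():
    daemon_api, sidecar_api = socket.socketpair()  # stands in for the daemon's API socket
    app = daemon_delegate(daemon_api, "10.96.0.10:11211")
    meta, conn = sidecar_receive(sidecar_api)
    app.sendall(b"get key\r\n")
    received = conn.recv(64)
    for s in (app, conn, daemon_api, sidecar_api):
        s.close()
    return meta, received
```

From here, the sidecar would dial the upstream server named in the metadata and proxy between the two sockets.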
For completeness, here is the sequence of steps that would occur on the server side:

The application receives a connection
AppSwitch daemon accepts the connection on behalf of the application
It creates a pair of pre-connected Unix sockets using socketpair(2) system call
One end of the socket pair is returned to the application through the accept(2) system call
The other end of the socket pair, along with the socket originally accepted by the daemon on behalf of the application, is sent to the sidecar
The sidecar extracts the two socket FDs: a Unix socket FD connected to the application and a TCP socket FD connected to the remote client
The sidecar reads the metadata supplied by the daemon about the remote client and performs its usual operations
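The server-side sequence can be sketched in the same style. The difference is that two FDs travel to the sidecar in a single message: the Unix end facing the application, then the TCP socket the daemon accepted. This Python 3.9+ sketch uses hypothetical function names and a made-up JSON metadata format, not AppSwitch’s actual wire protocol.

```python
import json
import socket

def daemon_accept(listener, api_sock):
    """Daemon side: accept on the app's behalf, then split the connection."""
    tcp_conn, peer = listener.accept()
    app_end, sidecar_end = socket.socketpair()
    meta = json.dumps({"peer": "%s:%d" % peer[:2]}).encode()  # hypothetical format
    # One message carries both FDs: the Unix end for the app side, then the TCP socket.
    socket.send_fds(api_sock, [meta], [sidecar_end.fileno(), tcp_conn.fileno()])
    sidecar_end.close()
    tcp_conn.close()   # the sidecar holds its own duplicates now
    return app_end     # AppSwitch would return this from the app's accept(2)

def sidecar_take(api_sock):
    """Sidecar side: unpack the metadata and the two FDs, in the order they were sent."""
    data, fds, _flags, _addr = socket.recv_fds(api_sock, 4096, 2)
    to_app = socket.socket(fileno=fds[0])
    to_client = socket.socket(fileno=fds[1])
    return json.loads(data), to_app, to_client

def demo():
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    daemon_api, sidecar_api = socket.socketpair()
    app = daemon_accept(listener, daemon_api)
    meta, to_app, to_client = sidecar_take(sidecar_api)
    client.sendall(b"hello")
    to_app.sendall(to_client.recv(64))  # one step of the sidecar's usual proxying
    result = (meta["peer"] == "%s:%d" % client.getsockname(), app.recv(64))
    for s in (listener, client, daemon_api, sidecar_api, app, to_app, to_client):
        s.close()
    return result
```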

“Sidecar-aware” applications
The socket delegation feature can be very useful for applications that are explicitly aware of the sidecar and wish to take advantage of its features. Such applications can voluntarily delegate their network interactions by passing their sockets to the sidecar using the same feature. In a way, AppSwitch transparently turns every application into a sidecar-aware application.
How does it all come together?
Just to step back, Istio offloads common connectivity concerns from applications to a sidecar proxy that performs those functions on behalf of the application.  And AppSwitch simplifies and optimizes the service mesh by sidestepping intermediate layers and invoking the proxy only for cases where it is truly necessary.
In the rest of this section, I outline how AppSwitch may be integrated with Istio, based on a very cursory initial implementation. This is not intended to be anything like a design doc: not every possible way of integration is explored and not every detail is worked out. The intent is to discuss high-level aspects of the implementation and present a rough idea of how the two systems may come together. The key is that AppSwitch would act as a cushion between Istio and a real proxy. It would serve as the “fast-path” for cases that can be handled more efficiently without invoking the sidecar proxy. For the cases where the proxy is used, it would shorten the datapath by cutting through unnecessary layers. Look at this blog for a more detailed walkthrough of the integration.
AppSwitch client injection
Similar to the Istio sidecar injector, a simple tool called ax-injector injects the AppSwitch client into a standard Kubernetes manifest. The injected client transparently monitors the application and notifies the AppSwitch daemon of the control-path network API events that the application produces.
If the AppSwitch CNI plugin is used, injection is not required and standard Kubernetes manifests work as-is. In that case, the CNI plugin performs the necessary injection when it gets the initialization callback. Using the injector does have some advantages, however: (1) it works in tightly controlled environments like GKE, (2) it can easily be extended to support other frameworks such as Mesos, and (3) the same cluster can run standard applications alongside “AppSwitch-enabled” applications.
AppSwitch DaemonSet
The AppSwitch daemon can be configured to run either as a DaemonSet or as an extension of the application that is injected directly into the application manifest. In either case, it handles network events coming in from the applications that it supports.
Agent for policy acquisition
This is the component that conveys the policy and configuration dictated by Istio to AppSwitch. It implements the xDS API to receive configuration from Pilot and calls the appropriate AppSwitch APIs to program the daemon. For example, it allows the load-balancing strategy specified through istioctl to be translated into the equivalent AppSwitch capability.
Platform adapter for AppSwitch “Auto-Curated” service registry
Given that AppSwitch is in the control path of applications’ network APIs, it has ready access to the topology of services across the cluster. AppSwitch exposes that information in the form of a service registry that is automatically and (almost) synchronously updated as applications and their services come and go. A new platform adapter for AppSwitch, alongside those for Kubernetes, Eureka, etc., would provide the details of upstream services to Istio. This is not strictly necessary, but it does make it easier for the AppSwitch agent described above to correlate the service endpoints it receives from Pilot.
Proxy integration and chaining
Connections that do require deep scanning and mutation of application traffic are handed off to an external proxy through the socket delegation mechanism discussed earlier, using an extended version of the PROXY protocol. In addition to the simple parameters supported by the PROXY protocol, a variety of other metadata (including the initial protocol headers obtained from the socket buffers) and live socket FDs (representing application connections) are forwarded to the proxy.
The proxy can look at the metadata and decide how to proceed. It can accept the connection and do the proxying itself. It can direct AppSwitch to allow the connection and use the fast-path. Or it can simply drop the connection.
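The extensions AppSwitch adds to the PROXY protocol are not spelled out here, so the sketch below covers only the standard version 1 text header they build on. That header has the form `PROXY TCP4 <src ip> <dst ip> <src port> <dst port>\r\n` and is sent before any application data. The live FDs would presumably travel separately, as SCM_RIGHTS ancillary data on the Unix socket. The function names are illustrative, and only the TCP4/TCP6 forms are handled (not `PROXY UNKNOWN`).

```python
import ipaddress

def build_proxy_v1(src, dst):
    """Encode a PROXY protocol v1 header for a TCP connection from (ip, port) tuples."""
    proto = "TCP4" if ipaddress.ip_address(src[0]).version == 4 else "TCP6"
    return f"PROXY {proto} {src[0]} {dst[0]} {src[1]} {dst[1]}\r\n".encode("ascii")

def parse_proxy_v1(data):
    """Split a v1 header off the front of data; return (src, dst, remaining bytes)."""
    line, sep, rest = data.partition(b"\r\n")
    if not sep or not line.startswith(b"PROXY "):
        raise ValueError("no PROXY v1 header")
    fields = line.decode("ascii").split(" ")
    if len(fields) != 6 or fields[1] not in ("TCP4", "TCP6"):
        raise ValueError("unsupported PROXY v1 header")
    _, _, src_ip, dst_ip, src_port, dst_port = fields
    return (src_ip, int(src_port)), (dst_ip, int(dst_port)), rest
```

Anything after the header line is ordinary application data, which in this design would include the initial protocol headers the proxy uses to make its decision.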
One of the interesting aspects of the mechanism is that, when the proxy accepts a socket from AppSwitch, it can in turn delegate the socket to another proxy.  In fact that is how AppSwitch currently works.  It uses a simple built-in proxy to examine the metadata and decide whether to handle the connection internally or to hand it off to an external proxy (Envoy).  The same mechanism can be potentially extended to allow for a chain of plugins, each looking for a specific signature, with the last one in the chain doing the real proxy work.
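The chaining idea can be modeled as a list of plugins. Each plugin checks the peeked initial bytes for a signature. It either returns one of the three outcomes described above (proxy, fast-path, drop) or passes the connection on, with the full proxy as the implicit last link. Everything here is hypothetical. The `Verdict` names, the plugins and the policy they encode (fast-pathing TLS and Redis traffic) are illustrative, not AppSwitch’s built-in behavior.

```python
from enum import Enum
from typing import Callable, Optional, Sequence

class Verdict(Enum):
    PROXY = "proxy"        # hand the connection to the full proxy (e.g. Envoy)
    FASTPATH = "fastpath"  # let AppSwitch connect the endpoints directly
    DROP = "drop"          # refuse the connection

# A plugin inspects the peeked initial bytes and returns a Verdict, or None to pass.
Plugin = Callable[[bytes], Optional[Verdict]]

def tls_plugin(initial: bytes) -> Optional[Verdict]:
    # A TLS handshake record starts with content type 0x16 and major version 3.
    # Encryption hides the headers an L7 proxy would need, so skip the proxy.
    return Verdict.FASTPATH if initial[:2] == b"\x16\x03" else None

def redis_plugin(initial: bytes) -> Optional[Verdict]:
    # Redis clients send RESP arrays, which begin with '*'.
    return Verdict.FASTPATH if initial.startswith(b"*") else None

def deny_prefixes(*prefixes: bytes) -> Plugin:
    """Build a plugin that drops connections whose first bytes match a prefix."""
    def plugin(initial: bytes) -> Optional[Verdict]:
        return Verdict.DROP if initial.startswith(prefixes) else None
    return plugin

def run_chain(initial: bytes, plugins: Sequence[Plugin]) -> Verdict:
    """Ask each plugin in turn; the first decision wins."""
    for plugin in plugins:
        verdict = plugin(initial)
        if verdict is not None:
            return verdict
    return Verdict.PROXY  # last link in the chain: the real proxy does the work
```

Ordering matters. Putting a deny plugin first means policy is enforced before any fast-path shortcut is considered.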
It’s not just about performance
Removing intermediate layers along the datapath is not just about improving performance.  Performance is a great side effect, but it is a side effect.  There are a number of important advantages to an API level approach.
Automatic application onboarding and policy authoring
Before microservices and service mesh, traffic management was done by load balancers and access controls were enforced by firewalls. Applications were identified by IP addresses and DNS names, which were relatively static. In fact, that’s still the status quo in most environments. Such environments stand to benefit immensely from service mesh, but a practical and scalable bridge to the new world needs to be provided. The difficulty of the transformation is due not so much to a lack of features and functionality as to the investment required to rethink and reimplement the entire application infrastructure. Currently, most of the policy and configuration exists in the form of load balancer and firewall rules. Somehow that existing context needs to be leveraged to provide a scalable path to adopting the service mesh model.
AppSwitch can substantially ease the onboarding process. It can present the application at the target with the same network environment it had at its source. Without such assistance, migration is typically a non-starter for traditional applications, which have complex configuration files with static IP addresses or specific DNS names hard-coded in them. AppSwitch could help capture those applications along with their existing configuration and connect them over a service mesh without requiring any changes.
Broader application and protocol support
HTTP clearly dominates the modern application landscapes but once we talk about traditional applications and environments, we’d encounter all kinds of protocols and transports.  Particularly, support for UDP becomes unavoidable.  Traditional application servers such as IBM WebSphere rely extensively on UDP.  Most multimedia applications use UDP media streams.  Of course DNS is probably the most widely used UDP “application”.  AppSwitch supports UDP at the API level much the same way as TCP and when it detects a UDP connection, it can transparently handle it in its “fast-path” rather than delegating it to the proxy.
Client IP preservation and end-to-end principle
The same mechanism that preserves the source network environment can also preserve client IP addresses as seen by the servers. With a sidecar proxy in place, connection requests come from the proxy rather than the client. As a result, the peer address (IP:port) of the connection as seen by the server is that of the proxy rather than the client. AppSwitch ensures that the server sees and logs the correct client address, and that any decisions made based on the client address remain valid. More generally, AppSwitch preserves the end-to-end principle, which is otherwise broken by intermediate layers that obscure the true underlying context.
Enhanced application signal with access to encrypted headers
Encrypted traffic completely undermines the ability of the service mesh to analyze application traffic. API-level interposition could potentially offer a way around it. The current implementation of AppSwitch gains access to the application’s network API at the system call level. However, it is possible in principle to influence the application at an API boundary higher in the stack, where application data is not yet encrypted or is already decrypted. Ultimately, the data is always produced in the clear by the application and then encrypted at some point before it goes out. Since AppSwitch runs directly within the memory context of the application, it is possible to tap into the data higher in the stack, where it is still held in the clear. The only requirement for this to work is that the API used for encryption be well-defined and amenable to interposition. In particular, it requires access to the symbol table of the application binaries. Just to be clear, AppSwitch doesn’t implement this today.
So what’s the net?
AppSwitch removes a number of layers and processing from the standard service mesh stack.  What does all that translate to in terms of performance?
We ran some initial experiments to characterize the extent of the opportunity for optimization, based on the initial integration of AppSwitch discussed earlier. The experiments were run on GKE using fortio-0.11.0, istio-0.8.0 and appswitch-0.4.0-2. For the proxyless test, the AppSwitch daemon was run as a DaemonSet on the Kubernetes cluster, and the Fortio pod spec was modified to inject the AppSwitch client. These were the only two changes made to the setup. The test was configured to measure the latency of gRPC requests across 100 concurrent connections.

Latency with and without AppSwitch

Initial results indicate a difference of over 18x in p50 latency with and without AppSwitch (3.99ms vs 72.96ms). The difference was around 8x when Mixer and access logs were disabled. Clearly the difference was due to sidestepping all those intermediate layers along the datapath. The Unix socket optimization wasn’t triggered in the AppSwitch case because the client and server pods were scheduled on separate hosts. End-to-end latency in the AppSwitch case would have been even lower if the client and server had been colocated. Essentially, the client and server running in their respective pods of the Kubernetes cluster are directly connected over a TCP socket going over the GKE network, with no tunneling, bridges or proxies.
Net Net
I started out with David Wheeler’s seemingly reasonable quote that says adding another layer is not a solution to the problem of too many layers. And I argued through most of the blog that the current network stack already has too many layers and that they should be removed. But isn’t AppSwitch itself a layer?
Yes, AppSwitch is clearly another layer.  However it is one that can remove multiple other layers.  In doing so, it seamlessly glues the new service mesh layer with existing layers of traditional network environments.  It offsets the cost of sidecar proxy and as Istio graduates to 1.0, it provides a bridge for existing applications and their network environments to transition to the new world of service mesh.
Perhaps Wheeler’s quote should read:

All problems in computer science can be solved with another layer, even the problem of too many layers!


Acknowledgements
Thanks to Mandar Jog (Google) for several discussions about the value of AppSwitch for Istio and to the following individuals (in alphabetical order) for their review of early drafts of this blog.

Frank Budinsky (IBM)
Lin Sun (IBM)
Shriram Rajagopalan (VMware)




Micro-Segmentation with Istio Authorization
Fri, 20 Jul 2018 00:00:00 +0000
Micro-segmentation is a security technique that creates secure zones in cloud deployments and allows organizations to
isolate workloads from one another and secure them individually.
Istio’s authorization feature, also known as Istio Role Based Access Control,
provides micro-segmentation for services in an Istio mesh. It features:

Authorization at different levels of granularity, including namespace level, service level, and method level.
Service-to-service and end-user-to-service authorization.
High performance, as it is enforced natively on Envoy.
Role-based semantics, which makes it easy to use.
High flexibility as it allows users to define conditions using
combinations of attributes.

In this blog post, you’ll learn about the main authorization features and how to use them in different situations.
Characteristics
RPC level authorization
Authorization is performed at the level of individual RPCs. Specifically, it controls “who can access my bookstore service”,
or “who can access method getBook in my bookstore service”. It is not designed to control access to application-specific
resource instances, like access to “storage bucket X” or access to “3rd book on 2nd shelf”. Today this kind of
application-specific access control logic needs to be handled by the application itself.
Role-based access control with conditions
Istio authorization is a role-based access control (RBAC) system,
in contrast to an attribute-based access control (ABAC)
system. Compared to ABAC, RBAC has the following advantages:


Roles allow grouping of attributes. Roles are groups of permissions, which specify the actions you are allowed
to perform on a system. Users are grouped based on their roles within an organization. You can define the roles and reuse
them for different cases.


It is easier to understand and reason about who has access. The RBAC concepts map naturally to business concepts.
For example, a DB admin may have all access to DB backend services, while a web client may only be able to view the
frontend service.


It reduces unintentional errors. RBAC policies make otherwise complex security changes easier. You won’t have
duplicate configurations in multiple places and later forget to update some of them when you need to make changes.


On the other hand, Istio’s authorization system is not a traditional RBAC system. It also allows users to define conditions using
combinations of attributes. This gives Istio
the flexibility to express complex access control policies. In fact, the “RBAC + conditions” model
that Istio authorization adopts has all the benefits of an RBAC system, and supports the level of flexibility that
an ABAC system normally provides. You’ll see some examples below.
High performance
Because of its simple semantics, Istio authorization is enforced natively on Envoy. At runtime, the
authorization decision is made entirely locally inside an Envoy filter, without depending on any external module.
This allows Istio authorization to achieve high performance and availability.
Work with/without primary identities
Like any other RBAC system, Istio authorization is identity aware. In Istio authorization policy, there is a primary
identity called user, which represents the principal of the client.
In addition to the primary identity, you can also specify any conditions that define the identities. For example,
you can specify the client identity as “user Alice calling from Bookstore frontend service”, in which case,
you have a combined identity of the calling service (Bookstore frontend) and the end user (Alice).
To improve security, you should enable authentication features,
and use authenticated identities in authorization policies. However, strongly authenticated identity is not required
for using authorization. Istio authorization works with or without identities. If you are working with a legacy system,
you may not have mutual TLS or JWT authentication set up for your mesh. In this case, the only way to identify the client is, for example,
through IP. You can still use Istio authorization to control which IP addresses or IP ranges are allowed to access your service.
Examples
The authorization task shows you how to
use Istio’s authorization feature to control namespace level and service level access using the
Bookinfo application. In this section, you’ll see more examples on how to achieve
micro-segmentation with Istio authorization.
Namespace level segmentation via RBAC + conditions
Suppose you have services in the frontend and backend namespaces. You would like to allow all your services
in the frontend namespace to access all services that are marked external in the backend namespace.
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRole
metadata:
  name: external-api-caller
  namespace: backend
spec:
  rules:
  - services: ["*"]
    methods: ["*"]
    constraints:
    - key: "destination.labels[visibility]"
      values: ["external"]
---
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
  name: external-api-caller
  namespace: backend
spec:
  subjects:
  - properties:
      source.namespace: "frontend"
  roleRef:
    kind: ServiceRole
    name: "external-api-caller"
The ServiceRole and ServiceRoleBinding above express “who is allowed to do what under which conditions”
(RBAC + conditions). Specifically:

“who” are the services in the frontend namespace.
“what” is to call services in the backend namespace.
“conditions” is the visibility label of the destination service having the value external.
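To make the “who / what / conditions” reading concrete, here is a small Python model of how a request could be checked against a ServiceRole and a ServiceRoleBinding. It is a conceptual sketch, not Istio’s implementation, since enforcement happens natively in Envoy. Several simplifications are assumptions of the sketch. Requests are modeled as a flat dictionary with plain keys (`service`, `method`, `path`, `user`), namespace scoping of the role is ignored, and `fnmatchcase` is a looser stand-in for Istio’s exact, prefix, suffix and `*` matching.

```python
from fnmatch import fnmatchcase

def _matches(patterns, value):
    # Loose stand-in for exact / prefix ("abc*") / suffix ("*abc") / "*" matching.
    return any(fnmatchcase(value, p) for p in patterns)

def rule_allows(rule, req):
    """ServiceRole rule: "what" (services, methods, paths) plus "conditions"."""
    if not _matches(rule["services"], req.get("service", "")):
        return False
    if "methods" in rule and not _matches(rule["methods"], req.get("method", "")):
        return False
    if "paths" in rule and not _matches(rule["paths"], req.get("path", "")):
        return False
    return all(req.get(c["key"]) in c["values"] for c in rule.get("constraints", []))

def subject_matches(subject, req):
    """ServiceRoleBinding subject: "who", as a user and/or required properties."""
    if "user" in subject and req.get("user") != subject["user"]:
        return False
    return all(req.get(k) == v for k, v in subject.get("properties", {}).items())

def is_allowed(role, binding, req):
    """Allow when some subject matches the caller and some rule matches the request."""
    return (any(subject_matches(s, req) for s in binding["subjects"])
            and any(rule_allows(r, req) for r in role["rules"]))

# The namespace-level example above, transcribed into plain dictionaries.
EXTERNAL_API_CALLER = {"rules": [{
    "services": ["*"], "methods": ["*"],
    "constraints": [{"key": "destination.labels[visibility]", "values": ["external"]}],
}]}
EXTERNAL_API_CALLER_BINDING = {"subjects": [{"properties": {"source.namespace": "frontend"}}]}
```

A frontend caller reaching a service labeled `visibility: external` is allowed. Changing either the source namespace or the label flips the decision.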

Service/method level isolation with/without primary identities
Here is another example that demonstrates finer-grained access control at the service/method level. The first step
is to define a book-reader service role that allows READ access to the /books/* resource in the bookstore service.
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRole
metadata:
  name: book-reader
  namespace: default
spec:
  rules:
  - services: ["bookstore.default.svc.cluster.local"]
    paths: ["/books/*"]
    methods: ["GET"]
Using authenticated client identities
Suppose you want to grant this book-reader role to your bookstore-frontend service. If you have enabled
mutual TLS authentication for your mesh, you can use a
service account to identify your bookstore-frontend service. Granting the book-reader role to the bookstore-frontend
service can be done by creating a ServiceRoleBinding as shown below:
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
  name: book-reader
  namespace: default
spec:
  subjects:
  - user: "cluster.local/ns/default/sa/bookstore-frontend"
  roleRef:
    kind: ServiceRole
    name: "book-reader"
You may want to restrict this further by adding a condition that “only users who belong to the qualified-reviewer group are
allowed to read books”. The qualified-reviewer group is the end user identity that is authenticated by
JWT authentication. In this case, the combination of the client service identity
(bookstore-frontend) and the end user identity (qualified-reviewer) is used in the authorization policy.
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
  name: book-reader
  namespace: default
spec:
  subjects:
  - user: "cluster.local/ns/default/sa/bookstore-frontend"
    properties:
      request.auth.claims[group]: "qualified-reviewer"
  roleRef:
    kind: ServiceRole
    name: "book-reader"
Client does not have identity
Using authenticated identities in authorization policies is strongly recommended for security. However, if you have a
legacy system that does not support authentication, you may not have authenticated identities for your services.
You can still use Istio authorization to protect your services even without authenticated identities. The example below
shows that you can specify allowed source IP range in your authorization policy.
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
  name: book-reader
  namespace: default
spec:
  subjects:
  - properties:
      source.ip: 10.20.0.0/9
  roleRef:
    kind: ServiceRole
    name: "book-reader"
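The `10.20.0.0/9` range above is worth a second look. A /9 prefix fixes only the first 9 bits, so if the host bits are masked off, the allowed range is 10.0.0.0 to 10.127.255.255, far wider than the “10.20” prefix suggests. The sketch below shows that arithmetic with Python’s `ipaddress` module. How Envoy itself treats a CIDR with host bits set is not covered here, and `source_ip_allowed` is an illustrative name.

```python
import ipaddress

def source_ip_allowed(source_ip, cidr):
    """Check a client address against an allowed range, as in the source.ip property."""
    # strict=False masks off host bits: "10.20.0.0/9" becomes 10.0.0.0/9.
    # With the default strict=True, ipaddress raises ValueError for this input.
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(source_ip) in network
```

If only 10.20.x.x clients are meant to be allowed, `10.20.0.0/16` expresses that directly.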
Summary
Istio’s authorization feature provides authorization at namespace-level, service-level, and method-level granularity.
It adopts the “RBAC + conditions” model, which makes it as easy to use and understand as an RBAC system, while providing the level of
flexibility that an ABAC system normally provides. Istio authorization achieves high performance as it is enforced
natively on Envoy. While it provides the best security by working together with
Istio authentication features, Istio authorization can also be used to
provide access control for legacy systems that do not have authentication.



Exporting Logs to BigQuery, GCS, Pub/Sub through Stackdriver
Mon, 09 Jul 2018 00:00:00 +0000
This post shows how to direct Istio logs to Stackdriver
and export those logs to various configured sinks such as
BigQuery, Google Cloud Storage
or Cloud Pub/Sub. By the end of this post, you will be able to perform
analytics on Istio data from your favorite places such as BigQuery, GCS or Cloud Pub/Sub.
The Bookinfo sample application is used as the example
application throughout this task.
Before you begin
Install Istio in your cluster and deploy an application.
Configuring Istio to export logs
Istio exports logs using the logentry template.
This specifies all the variables that are available for analysis. It
contains information such as source service, destination service and auth
metrics (coming soon), among others. Following is a diagram of the pipeline:

Exporting logs from Istio to Stackdriver for analysis

Istio supports exporting logs to Stackdriver, which can in turn be configured to export
logs to your favorite sink, such as BigQuery, Pub/Sub or GCS. Follow the steps
below to first set up your sink for exporting logs, and then set up Stackdriver
in Istio.
Setting up various log sinks
Common setup for all sinks:

Enable Stackdriver Monitoring API for the project.
Make sure the principalEmail that will set up the sink has write access to the project and Logging Admin role permissions.
Make sure the GOOGLE_APPLICATION_CREDENTIALS environment variable is set. Please follow instructions here to set it up.

BigQuery

Create a BigQuery dataset as a destination for the logs export.
Record the ID of the dataset. It will be needed to configure the Stackdriver handler.
It would be of the form bigquery.googleapis.com/projects/[PROJECT_ID]/datasets/[DATASET_ID]
Give the sink’s writer identity, cloud-logs@system.gserviceaccount.com, the BigQuery Data Editor role in IAM.
If using Google Kubernetes Engine, make sure the bigquery scope is enabled on the cluster.

Google Cloud Storage (GCS)

Create a GCS bucket to which you would like logs to be exported.
Record the ID of the bucket. It will be needed to configure Stackdriver.
It would be of the form storage.googleapis.com/[BUCKET_ID]
Give the sink’s writer identity, cloud-logs@system.gserviceaccount.com, the Storage Object Creator role in IAM.

Google Cloud Pub/Sub

Create a topic in Google Cloud Pub/Sub to which you would like logs to be exported.
Record the ID of the topic. It will be needed to configure Stackdriver.
It would be of the form pubsub.googleapis.com/projects/[PROJECT_ID]/topics/[TOPIC_ID]
Give the sink’s writer identity, cloud-logs@system.gserviceaccount.com, the Pub/Sub Publisher role in IAM.
If using Google Kubernetes Engine, make sure the pubsub scope is enabled on the cluster.
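Each sink type described above uses its own destination format. As a small illustration, the hypothetical helper below builds all three. The project, dataset, bucket and topic IDs in the examples are made up. The resulting string is presumably what goes into the `destination` field of the handler’s `sinkInfo`.

```python
def sink_destination(kind, resource_id, project_id=""):
    """Build a Stackdriver sink destination string for one of the three sink types."""
    if kind not in ("bigquery", "gcs", "pubsub"):
        raise ValueError(f"unknown sink kind: {kind}")
    if kind == "gcs":
        return f"storage.googleapis.com/{resource_id}"
    if not project_id:
        raise ValueError(f"{kind} destinations need a project ID")
    if kind == "bigquery":
        return f"bigquery.googleapis.com/projects/{project_id}/datasets/{resource_id}"
    return f"pubsub.googleapis.com/projects/{project_id}/topics/{resource_id}"
```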

Setting up Stackdriver
A Stackdriver handler must be created to export data to Stackdriver. The configuration for
a Stackdriver handler is described here.


Save the following yaml file as stackdriver.yaml. Replace the empty values of project_id and of the sinkInfo id, destination and filter fields with your specific values.
apiVersion: "config.istio.io/v1alpha2"
kind: stackdriver
metadata:
  name: handler
  namespace: istio-system
spec:
  # We'll use the default value from the adapter, once per minute, so we don't need to supply a value.
  # pushInterval: 1m
  # Must be supplied for the Stackdriver adapter to work
  project_id: ""
  # One of the following must be set; the preferred method is `appCredentials`, which corresponds to
  # Google Application Default Credentials.
  # If none is provided we default to app credentials.
  # appCredentials:
  # apiKey:
  # serviceAccountPath:
  # Describes how to map Istio logs into Stackdriver.
  logInfo:
    accesslog.logentry.istio-system:
      payloadTemplate: '{{or (.sourceIp) "-"}} - {{or (.sourceUser) "-"}} [{{or (.timestamp.Format "02/Jan/2006:15:04:05 -0700") "-"}}] "{{or (.method) "-"}} {{or (.url) "-"}} {{or (.protocol) "-"}}" {{or (.responseCode) "-"}} {{or (.responseSize) "-"}}'
      httpMapping:
        url: url
        status: responseCode
        requestSize: requestSize
        responseSize: responseSize
        latency: latency
        localIp: sourceIp
        remoteIp: destinationIp
        method: method
        userAgent: userAgent
        referer: referer
      labelNames:
      - sourceIp
      - destinationIp
      - sourceService
      - sourceUser
      - sourceNamespace
      - destinationService
      - destinationNamespace
      - apiClaims
      - apiKey
      - protocol
      - method
      - url
      - responseCode
      - responseSize
      - requestSize
      - latency
      - connectionMtls
      - userAgent
      - responseTimestamp
      - receivedBytes
      - sentBytes
      - referer
      sinkInfo:
        id: ''
        destination: ''
        filter: ''
---
apiVersion: "config.istio.io/v1alpha2"
kind: rule
metadata:
  name: stackdriver
  namespace: istio-system
spec:
  match: "true" # If omitted match is true.
  actions:
  - handler: handler.stackdriver
    instances:
    - accesslog.logentry
---
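The payloadTemplate in the handler above is a Go template that renders each access log entry as a common-log-style line, substituting "-" for missing fields. The Python sketch below approximates that rendering for a single entry so you can preview the format. It is not how the Stackdriver adapter works internally; the function names and sample values are hypothetical, and it assumes the Go layout "02/Jan/2006:15:04:05 -0700" corresponds to the strftime directives "%d/%b/%Y:%H:%M:%S %z" in the default C locale.

```python
from datetime import datetime, timezone


def _or_dash(value) -> str:
    # Approximates Go's {{or (.x) "-"}}: empty values (None, "", 0) become "-".
    return str(value) if value else "-"


def render_access_log(entry: dict) -> str:
    """Approximate the payloadTemplate output for one entry (a dict of attributes)."""
    ts = entry.get("timestamp")
    # Go layout "02/Jan/2006:15:04:05 -0700" expressed as strftime directives.
    ts_text = ts.strftime("%d/%b/%Y:%H:%M:%S %z") if ts else "-"
    request = " ".join(_or_dash(entry.get(k)) for k in ("method", "url", "protocol"))
    return (
        f"{_or_dash(entry.get('sourceIp'))} - {_or_dash(entry.get('sourceUser'))} "
        f'[{ts_text}] "{request}" '
        f"{_or_dash(entry.get('responseCode'))} {_or_dash(entry.get('responseSize'))}"
    )


if __name__ == "__main__":
    sample = {  # hypothetical attribute values
        "sourceIp": "10.0.0.1",
        "timestamp": datetime(2018, 6, 22, 7, 43, 24, tzinfo=timezone.utc),
        "method": "GET",
        "url": "/productpage",
        "protocol": "http",
        "responseCode": 200,
        "responseSize": 5183,
    }
    print(render_access_log(sample))
```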


Push the configuration
$ kubectl apply -f stackdriver.yaml
stackdriver "handler" created
rule "stackdriver" created
logentry "stackdriverglobalmr" created
metric "stackdriverrequestcount" created
metric "stackdriverrequestduration" created
metric "stackdriverrequestsize" created
metric "stackdriverresponsesize" created


Send traffic to the sample application.
For the Bookinfo sample, visit http://$GATEWAY_URL/productpage in your web
browser or issue the following command:
$ curl http://$GATEWAY_URL/productpage


Verify that logs are flowing through Stackdriver to the configured sink.

Stackdriver: Navigate to the Stackdriver Logs Viewer for your project and look under “GKE Container” -> “Cluster Name” -> “Namespace Id” for Istio access logs.
BigQuery: Navigate to the BigQuery interface for your project; you should find a table with the prefix accesslog_logentry_istio in your sink dataset.
GCS: Navigate to the Storage Browser for your project; you should find a folder named accesslog.logentry.istio-system in your sink bucket.
Pub/Sub: Navigate to the Pub/Sub Topic List for your project; you should find a topic for accesslog in your sink topic.



Understanding what happened
The stackdriver.yaml file above configured Istio to send access logs to
Stackdriver and added a sink configuration to which these logs can be
exported. In detail:


Added a handler of kind stackdriver
apiVersion: "config.istio.io/v1alpha2"
kind: stackdriver
metadata:
  name: handler
  namespace: 


Added logInfo to the spec
spec:
  logInfo:
    accesslog.logentry.istio-system:
      labelNames:
      - sourceIp
      - destinationIp
      ...
      ...
      sinkInfo:
        id: ''
        destination: ''
        filter: ''
In the configuration above, sinkInfo contains information about the sink to which you want
the logs exported. For more information on how to fill it in for different sinks, refer
here.


Added a rule for Stackdriver
apiVersion: "config.istio.io/v1alpha2"
kind: rule
metadata:
  name: stackdriver
  namespace: istio-system
spec:
  match: "true" # If omitted match is true
  actions:
  - handler: handler.stackdriver
    instances:
    - accesslog.logentry


Cleanup


Remove the new Stackdriver configuration:
$ kubectl delete -f stackdriver.yaml


If you are not planning to explore any follow-on tasks, refer to the
Bookinfo cleanup instructions to shut down
the application.


Availability of logs in export sinks
Export to BigQuery happens within minutes (in our experience, almost instantly), export to GCS can
be delayed by 2 to 12 hours, and export to Pub/Sub is almost immediate.



Monitoring and Access Policies for HTTP Egress Traffic
Fri, 22 Jun 2018 00:00:00 +0000
While Istio’s main focus is management of traffic between microservices inside a service mesh, Istio can also manage
ingress (from outside into the mesh) and egress (from the mesh outwards) traffic. Istio can uniformly enforce access
policies and aggregate telemetry data for mesh-internal, ingress and egress traffic.
In this blog post, we show how to apply monitoring and access policies to HTTP egress traffic with Istio.
Use case
Consider an organization that runs applications that process content from cnn.com. The applications are decomposed
into microservices deployed in an Istio service mesh. The applications access pages of various topics from cnn.com: edition.cnn.com/politics, edition.cnn.com/sport and edition.cnn.com/health. The organization configures Istio to allow access to edition.cnn.com and everything works fine. However, at some
point in time, the organization decides to banish politics. In practice, this means blocking access to
edition.cnn.com/politics and allowing access to
edition.cnn.com/sport and edition.cnn.com/health
only. The organization will grant permissions to individual applications and to particular users to access edition.cnn.com/politics on a case-by-case basis.
To achieve that goal, the organization’s operations people monitor access to the external services and
analyze Istio logs to verify that no unauthorized request was sent to
edition.cnn.com/politics. They also configure Istio to prevent access to
edition.cnn.com/politics automatically.
The organization is resolved to prevent any tampering with the new policy. It decides to put mechanisms in place that
prevent a malicious application from accessing the forbidden topic.
Related tasks and examples

The Control Egress Traffic task demonstrates how external (outside the
Kubernetes cluster) HTTP and HTTPS services can be accessed by applications inside the mesh.
The Configure an Egress Gateway example describes how to configure
Istio to direct egress traffic through a dedicated gateway service called egress gateway.
The Egress Gateway with TLS Origination example
demonstrates how to allow applications to send HTTP requests to external servers that require HTTPS, while directing
traffic through egress gateway.
The Visualizing Metrics with Grafana task
describes the Istio Dashboard for monitoring mesh traffic.
The Basic Access Control task shows how to control access to
in-mesh services.
The Denials and White/Black Listing task shows how to configure
access policies using black or white list checkers.

Unlike the observability and security tasks above, this blog post describes Istio’s monitoring and access policies
applied exclusively to egress traffic.
Before you begin
Follow the steps in the Egress Gateway with TLS Origination example, with mutual TLS authentication enabled, without
the Cleanup step.
After completing that example, you can access edition.cnn.com/politics from an in-mesh container with curl installed. This blog post assumes that the SOURCE_POD environment variable contains the source pod’s name and that the container’s name is sleep.
Configure monitoring and access policies
Since you want to accomplish your tasks in a secure way, you should direct egress traffic through the
egress gateway, as described in the Egress Gateway with TLS Origination
example. Here, a secure way means one that prevents malicious applications from bypassing Istio monitoring and
policy enforcement.
According to our scenario, the organization performed the instructions in the
Before you begin section, enabled HTTP traffic to edition.cnn.com, and configured that traffic
to pass through the egress gateway. The egress gateway performs TLS origination to edition.cnn.com, so the traffic
leaves the mesh encrypted. At this point, the organization is ready to configure Istio to monitor and apply access policies for
the traffic to edition.cnn.com.
Logging
Configure Istio to log access to *.cnn.com. You create a logentry and two
stdio handlers, one for logging forbidden access
(error log level) and another one for logging all access to *.cnn.com (info log level). Then you create rules to
direct your logentry instances to your handlers. One rule directs access to *.cnn.com/politics to the handler for
logging forbidden access, another rule directs log entries to the handler that outputs each access to *.cnn.com as an
info log entry. To understand the Istio logentries, rules, and handlers, see
Istio Adapter Model. A diagram with the involved entities and dependencies between them
appears below:

Figure: Instances, rules and handlers for egress monitoring



Create the logentry, rules and handlers. Note that you specify context.reporter.uid as
kubernetes://istio-egressgateway in the rules to get logs from the egress gateway only.
$ cat <



Send three HTTP requests to cnn.com, to edition.cnn.com/politics, edition.cnn.com/sport and edition.cnn.com/health.
All three should return 200 OK.
$ kubectl exec -it $SOURCE_POD -c sleep -- sh -c 'curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/politics; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/sport; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/health'
200
200
200


Query the Mixer log and see that the information about the requests appears in the log:
$ kubectl -n istio-system logs -l istio-mixer-type=telemetry -c mixer | grep egress-access | grep cnn | tail -4
{"level":"info","time":"2019-01-29T07:43:24.611462Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/politics","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":200,"responseSize":1883355,"sourcePrincipal":"cluster.local/ns/default/sa/sleep"}
{"level":"info","time":"2019-01-29T07:43:24.886316Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/sport","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":200,"responseSize":2094561,"sourcePrincipal":"cluster.local/ns/default/sa/sleep"}
{"level":"info","time":"2019-01-29T07:43:25.369663Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/health","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":200,"responseSize":2157009,"sourcePrincipal":"cluster.local/ns/default/sa/sleep"}
{"level":"error","time":"2019-01-29T07:43:24.611462Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/politics","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":200,"responseSize":1883355,"sourcePrincipal":"cluster.local/ns/default/sa/sleep"}
You see four log entries related to your three requests: three info entries about the access to edition.cnn.com
and one error entry about the access to edition.cnn.com/politics. The service mesh operators can see all the
access instances, and can also search the log for error log entries that represent forbidden accesses. This is the
first security measure the organization can apply before blocking the forbidden accesses automatically, namely
logging all the forbidden access instances as errors. In some settings this can be a sufficient security measure.
Note the attributes:

destination, path, responseCode, responseSize are related to HTTP parameters of the requests
sourcePrincipal: cluster.local/ns/default/sa/sleep - a string that represents the sleep service account in
the default namespace
reporterUID: kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system - a UID of the reporting pod, in
this case istio-egressgateway-747b6764b8-44rrh in the istio-system namespace
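Searching for error entries by eye does not scale beyond a handful of requests. A short script can apply the same filtering as the grep pipeline and pull out the forbidden accesses. This sketch assumes each log line is a single JSON object with the fields shown above; the sample lines are abbreviated copies of that output, and forbidden_accesses is a hypothetical helper name.

```python
import json

# Abbreviated copies of two Mixer log lines shown above.
SAMPLE_LOG = [
    '{"level":"info","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/sport","responseCode":200,"sourcePrincipal":"cluster.local/ns/default/sa/sleep"}',
    '{"level":"error","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/politics","responseCode":200,"sourcePrincipal":"cluster.local/ns/default/sa/sleep"}',
]


def forbidden_accesses(lines):
    """Return (destination, path, sourcePrincipal, responseCode) for each error-level egress-access entry."""
    found = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log record
        entry = json.loads(line)
        if entry.get("level") == "error" and entry.get("instance", "").startswith("egress-access"):
            found.append((entry.get("destination"), entry.get("path"),
                          entry.get("sourcePrincipal"), entry.get("responseCode")))
    return found


if __name__ == "__main__":
    for record in forbidden_accesses(SAMPLE_LOG):
        print(record)
```

In practice you could pass it the lines produced by the kubectl logs command above, for example by reading them from standard input.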



Access control by routing
After enabling logging of access to edition.cnn.com, automatically enforce an access policy, namely allow
accessing /health and /sport URL paths only. Such a simple policy control can be implemented with Istio routing.


Redefine your VirtualService for edition.cnn.com:
$ cat <

Note that you added a match by uri condition that checks that the URL path is
either /health or /sport. Also note that this condition is added to the istio-egressgateway
section of the VirtualService, since the egress gateway is a hardened component in terms of security (see the
egress gateway security considerations at
/docs/tasks/traffic-management/egress/egress-gateway/#additional-security-considerations). You don’t want any tampering
with your policies.
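The gateway-side decision can be modeled in a few lines. The VirtualService itself is not reproduced in this post, so the sketch below treats its uri condition as two exact matches on /health and /sport; that is an assumption, and the "exact", "prefix" and "regex" kinds only mirror the names of Istio's string-match fields. A request that matches no route gets 404 from the gateway, which is what the next test shows; upstream_status is a stand-in for whatever edition.cnn.com returns.

```python
import re
from dataclasses import dataclass


@dataclass
class UriMatch:
    kind: str   # "exact", "prefix" or "regex" (names mirror Istio's string-match fields)
    value: str

    def matches(self, path: str) -> bool:
        if self.kind == "exact":
            return path == self.value
        if self.kind == "prefix":
            return path.startswith(self.value)
        if self.kind == "regex":
            # Assumes the whole path must match the expression.
            return re.fullmatch(self.value, path) is not None
        raise ValueError(f"unknown match kind: {self.kind!r}")


# Assumed policy: exact matches on the two allowed paths.
ALLOWED = [UriMatch("exact", "/health"), UriMatch("exact", "/sport")]


def route_status(path: str, matches, upstream_status: int = 200) -> int:
    """Forward matching paths (upstream_status stands in for the external reply); otherwise 404."""
    return upstream_status if any(m.matches(path) for m in matches) else 404


if __name__ == "__main__":
    for path in ("/politics", "/sport", "/health"):
        print(path, route_status(path, ALLOWED))
```

One design point this exposes: a prefix match on /sport would also admit a path such as /sports-betting, so an exact or fully anchored match is the safer choice for an allow list.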


Send the previous three HTTP requests to cnn.com:
$ kubectl exec -it $SOURCE_POD -c sleep -- sh -c 'curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/politics; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/sport; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/health'
404
200
200
The request to edition.cnn.com/politics returned 404 Not Found, while requests
to edition.cnn.com/sport and
edition.cnn.com/health returned 200 OK, as expected.

You may need to wait several seconds for the update of the VirtualService to propagate to the egress gateway.




Query the Mixer log and see that the information about the requests appears again in the log:
$ kubectl -n istio-system logs -l istio-mixer-type=telemetry -c mixer | grep egress-access | grep cnn | tail -4
{"level":"info","time":"2019-01-29T07:55:59.686082Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/politics","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":404,"responseSize":0,"sourcePrincipal":"cluster.local/ns/default/sa/sleep"}
{"level":"info","time":"2019-01-29T07:55:59.697565Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/sport","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":200,"responseSize":2094561,"sourcePrincipal":"cluster.local/ns/default/sa/sleep"}
{"level":"info","time":"2019-01-29T07:56:00.264498Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/health","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":200,"responseSize":2157009,"sourcePrincipal":"cluster.local/ns/default/sa/sleep"}
{"level":"error","time":"2019-01-29T07:55:59.686082Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/politics","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":404,"responseSize":0,"sourcePrincipal":"cluster.local/ns/default/sa/sleep"}
You still get info and error messages regarding accesses to
edition.cnn.com/politics; however, this time the responseCode is 404, as
expected.


While implementing access control using Istio routing worked for us in this simple case, it would not suffice for more
complex cases. For example, the organization may want to allow access to
edition.cnn.com/politics under certain conditions, which requires more complex policy logic than
just filtering by URL paths. You may want to apply Istio Mixer Adapters,
for example white lists or black lists of allowed or forbidden URL paths, respectively.
Policy Rules allow specifying complex conditions in a rich expression language, which
includes AND and OR logical operators. The rules can be reused for both logging and policy checks. More advanced users
may want to apply Istio Role-Based Access Control.
An additional aspect is integration with remote access policy systems. If the organization in our use case operates some
Identity and Access Management system, you may want to configure
Istio to use access policy information from such a system. You implement this integration by applying
Istio Mixer Adapters.
Undo the routing-based access control you used in this section; in the next section you implement access control by
Mixer policy checks instead.


Replace the VirtualService for edition.cnn.com with your previous version from the Configure an Egress Gateway example:
$ cat <



Send the previous three HTTP requests to cnn.com. This time you should get three 200 OK responses, as
before:
$ kubectl exec -it $SOURCE_POD -c sleep -- sh -c 'curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/politics; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/sport; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/health'
200
200
200

You may need to wait several seconds for the update of the VirtualService to propagate to the egress gateway.


Access control by Mixer policy checks
In this step you use the whitelist variety of the Mixer
Listchecker adapter. You define a listentry with the URL path of the request and a listchecker to check the listentry against a
static list of allowed URL paths, specified by the overrides field. For an external Identity and Access Management system, use the providerurl field instead. The updated
diagram of the instances, rules and handlers appears below. Note that you reuse the same policy rule, handle-cnn-access,
both for logging and for access policy checks.

Figure: Instances, rules and handlers for egress monitoring and access policies



Define path-checker and request-path:
$ cat <



Modify the handle-cnn-access policy rule to send request-path instances to the path-checker:
$ cat <



Perform your usual test by sending HTTP requests to
edition.cnn.com/politics, edition.cnn.com/sport
and edition.cnn.com/health. As expected, the request to
edition.cnn.com/politics returns 403 (Forbidden).
$ kubectl exec -it $SOURCE_POD -c sleep -- sh -c 'curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/politics; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/sport; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/health'
403
200
200
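The whitelist check behind these results amounts to a set-membership test. The sketch below is a simplified model, not the Mixer adapter: it covers only a static overrides list of exact strings, and ListChecker, check_request and the blacklist flag handling are written for illustration. A failed check maps to 403; a passed check maps to the 200 that edition.cnn.com returned in the test above.

```python
from dataclasses import dataclass


@dataclass
class ListChecker:
    """Simplified stand-in for a list checker configured with a static `overrides` list."""
    overrides: frozenset
    blacklist: bool = False  # False gives whitelist semantics, as in this post

    def allows(self, value: str) -> bool:
        listed = value in self.overrides
        # A whitelist allows listed values; a blacklist rejects them.
        return not listed if self.blacklist else listed


def check_request(path: str, checker: ListChecker, upstream_status: int = 200) -> int:
    """403 when the check fails; otherwise the (assumed) upstream status."""
    return upstream_status if checker.allows(path) else 403


path_checker = ListChecker(frozenset({"/health", "/sport"}))

if __name__ == "__main__":
    for path in ("/politics", "/sport", "/health"):
        print(path, check_request(path, path_checker))
```

A whitelist and a blacklist of /politics give the same answers for these three paths, but they differ for any new topic: the whitelist rejects an unlisted path such as /world, while the blacklist lets it through. That fail-closed behavior is why a whitelist suits this use case.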


Access control by Mixer policy checks, part 2
After the organization in our use case managed to configure logging and access control, it decided to extend its access
policy by allowing the applications with a special
Service Account to access any topic of cnn.com, without being monitored. You’ll see how this requirement can be configured in Istio.


Start the sleep sample with the politics service account.
$ sed 's/: sleep/: politics/g' samples/sleep/sleep.yaml | kubectl create -f -
serviceaccount "politics" created
service "politics" created
deployment "politics" created


Define the SOURCE_POD_POLITICS shell variable to hold the name of the source pod with the politics service
account, for sending requests to external services.
$ export SOURCE_POD_POLITICS=$(kubectl get pod -l app=politics -o jsonpath={.items..metadata.name})


Perform your usual test of sending three HTTP requests, this time from SOURCE_POD_POLITICS.
The request to edition.cnn.com/politics returns 403, since you did not configure
the exception for the politics service account.
$ kubectl exec -it $SOURCE_POD_POLITICS -c politics -- sh -c 'curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/politics; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/sport; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/health'
403
200
200


Query the Mixer log and see that the information about the requests from the pod with the politics service account appears in
the log:
$ kubectl -n istio-system logs -l istio-mixer-type=telemetry -c mixer | grep egress-access | grep cnn | tail -4
{"level":"info","time":"2019-01-29T08:04:42.559812Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/politics","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":403,"responseSize":84,"sourcePrincipal":"cluster.local/ns/default/sa/politics"}
{"level":"info","time":"2019-01-29T08:04:42.568424Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/sport","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":200,"responseSize":2094561,"sourcePrincipal":"cluster.local/ns/default/sa/politics"}
{"level":"error","time":"2019-01-29T08:04:42.559812Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/politics","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":403,"responseSize":84,"sourcePrincipal":"cluster.local/ns/default/sa/politics"}
{"level":"info","time":"2019-01-29T08:04:42.615641Z","instance":"egress-access.logentry.istio-system","destination":"edition.cnn.com","path":"/health","reporterUID":"kubernetes://istio-egressgateway-747b6764b8-44rrh.istio-system","responseCode":200,"responseSize":2157009,"sourcePrincipal":"cluster.local/ns/default/sa/politics"}
Note that sourcePrincipal is cluster.local/ns/default/sa/politics which represents the politics service
account in the default namespace.
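The sourcePrincipal value encodes the trust domain, namespace and service account in fixed positions. A small parser makes this explicit; it assumes the cluster.local/ns/<namespace>/sa/<service-account> layout seen in these logs, and parse_principal and its dictionary keys are hypothetical names.

```python
def parse_principal(principal: str) -> dict:
    """Split '<trust-domain>/ns/<namespace>/sa/<service-account>' into its parts."""
    parts = principal.split("/")
    if len(parts) != 5 or parts[1] != "ns" or parts[3] != "sa":
        raise ValueError(f"unexpected principal format: {principal!r}")
    return {"trust_domain": parts[0], "namespace": parts[2], "service_account": parts[4]}


if __name__ == "__main__":
    print(parse_principal("cluster.local/ns/default/sa/politics"))
```

Parsing both principals in this post shows that sleep and politics run in the same default namespace, so an exemption has to key on the service account rather than the namespace.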


Redefine the handle-cnn-access and handle-politics policy rules to make the applications with the politics
service account exempt from monitoring and policy enforcement.
$ cat <



Perform your usual test from SOURCE_POD:
$ kubectl exec -it $SOURCE_POD -c sleep -- sh -c 'curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/politics; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/sport; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/health'
403
200
200
Since SOURCE_POD does not have the politics service account, access to
edition.cnn.com/politics is forbidden, as before.


Perform the previous test from SOURCE_POD_POLITICS:
$ kubectl exec -it $SOURCE_POD_POLITICS -c politics -- sh -c 'curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/politics; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/sport; curl -sL -o /dev/null -w "%{http_code}\n" http://edition.cnn.com/health'
200
200
200
Access to all the topics of edition.cnn.com is allowed.
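The combined effect of the two rules can be summarized as a behavioral model. The rule definitions are not reproduced here, so this sketch rests on stated assumptions: requests carrying the politics principal skip both logging and the path check; all other requests are logged at info level; and disallowed paths are additionally logged at error level and rejected with 403. The function and constant names are hypothetical.

```python
POLITICS_PRINCIPAL = "cluster.local/ns/default/sa/politics"
ALLOWED_PATHS = frozenset({"/health", "/sport"})


def handle(principal: str, path: str, log: list, upstream_status: int = 200) -> int:
    """Model one request under the policy; append (level, principal, path) records to `log`."""
    if principal == POLITICS_PRINCIPAL:
        # Exempt: neither logged nor checked.
        return upstream_status
    log.append(("info", principal, path))
    if path not in ALLOWED_PATHS:
        log.append(("error", principal, path))
        return 403
    return upstream_status


if __name__ == "__main__":
    records = []
    paths = ("/politics", "/sport", "/health")
    print([handle("cluster.local/ns/default/sa/sleep", p, records) for p in paths])
    print([handle(POLITICS_PRINCIPAL, p, records) for p in paths])
    print(records)
```

Putting the exemption check first mirrors the goal of the policy: exempt requests never reach the logging or checking branches.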


Examine the Mixer log and see that no more requests with sourcePrincipal equal to
cluster.local/ns/default/sa/politics appear in the log.
$ kubectl -n istio-system logs -l istio-mixer-type=telemetry -c mixer | grep egress-access | grep cnn | tail -4


Comparison with HTTPS egress traffic control
In this use case the applications use HTTP and the Istio egress gateway performs TLS origination for them. Alternatively,
the applications could originate TLS themselves by issuing HTTPS requests to edition.cnn.com. In this section we
describe both approaches and their pros and cons.
In the HTTP approach, the requests are sent unencrypted on the local host, intercepted by the Istio sidecar proxy and
forwarded to the egress gateway. Since you configure Istio to use mutual TLS between the sidecar proxy and the egress
gateway, the traffic leaves the pod encrypted. The egress gateway decrypts the traffic, inspects the URL path, the
HTTP method and headers, reports telemetry and performs policy checks. If the request is not blocked by some policy
check, the egress gateway performs TLS origination to the external destination (cnn.com in our case), so the request
is encrypted again and sent encrypted to the external destination. The diagram below demonstrates the network flow of
this approach. The HTTP protocol inside the gateway designates the protocol as seen by the gateway after decryption.

Figure: HTTP egress traffic through an egress gateway

The drawback of this approach is that the requests are sent unencrypted inside the pod, which may be against security
policies in some organizations. Also some SDKs have external service URLs hard-coded, including the protocol, so
sending HTTP requests could be impossible. The advantage of this approach is the ability to inspect HTTP methods,
headers and URL paths, and to apply policies based on them.
In the HTTPS approach, the requests are encrypted end-to-end, from the application to the external destination. The
diagram below demonstrates the network flow of this approach. The HTTPS protocol inside the gateway designates the
protocol as seen by the gateway.

Figure: HTTPS egress traffic through an egress gateway

The end-to-end HTTPS is considered a better approach from the security point of view. However, since the traffic is
encrypted, the Istio proxies and the egress gateway can only see the source and destination IPs and the SNI of the destination. Since you configure Istio to use mutual TLS between the sidecar proxy
and the egress gateway, the identity of the source is also known.
The gateway is unable to inspect the URL path, the HTTP method and the headers of the requests, so monitoring and
policies based on HTTP information are not possible.
In our use case, the organization would be able to allow access to edition.cnn.com and to specify which applications
are allowed to access edition.cnn.com.
However, it will not be possible to allow or block access to specific URL paths of edition.cnn.com.
Neither blocking access to edition.cnn.com/politics nor monitoring such access are
possible with the HTTPS approach.
We expect that each organization will weigh the pros and cons of the two approaches and choose the one most
appropriate to its needs.
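The trade-off can be stated as a subset check: a policy is enforceable at the egress gateway only if every attribute it needs is visible there. The attribute sets below encode the comparison in this section as a simplification; the labels are our own, not an Istio API. destination_host stands for the Host header in the HTTP approach and the SNI in the HTTPS approach.

```python
# Attributes the egress gateway can observe in each approach, per the comparison above.
GATEWAY_VISIBLE = {
    "http-with-tls-origination": frozenset({
        "source_ip", "destination_ip", "source_identity", "destination_host",
        "url_path", "http_method", "http_headers",
    }),
    "end-to-end-https": frozenset({
        # destination_host comes from the SNI here; nothing inside the TLS stream is visible.
        "source_ip", "destination_ip", "source_identity", "destination_host",
    }),
}


def can_enforce(approach: str, needed: set) -> bool:
    """True if every attribute a policy needs is visible to the gateway."""
    return set(needed) <= GATEWAY_VISIBLE[approach]


if __name__ == "__main__":
    block_politics = {"url_path"}  # block edition.cnn.com/politics
    allow_apps = {"destination_host", "source_identity"}  # which apps may reach edition.cnn.com
    for approach in GATEWAY_VISIBLE:
        print(approach, can_enforce(approach, block_politics), can_enforce(approach, allow_apps))
```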
Summary
In this blog post we showed how different monitoring and policy mechanisms of Istio can be applied to HTTP egress
traffic. Monitoring can be implemented by configuring a logging adapter. Access
policies can be implemented by configuring VirtualServices or by configuring various policy check adapters. We
demonstrated a simple policy that allowed certain URL paths only. We also showed a more complex policy that extended the
simple policy by making an exemption for the applications with a certain service account. Finally, we compared
HTTP-with-TLS-origination egress traffic with HTTPS egress traffic, in terms of control possibilities by Istio.
Cleanup


Perform the instructions in Cleanup section of the
Configure an Egress Gateway example.


Delete the logging and policy checks configuration:
$ kubectl delete logentry egress-access -n istio-system
$ kubectl delete stdio egress-error-logger -n istio-system
$ kubectl delete stdio egress-access-logger -n istio-system
$ kubectl delete rule handle-politics -n istio-system
$ kubectl delete rule handle-cnn-access -n istio-system
$ kubectl delete -n istio-system listchecker path-checker
$ kubectl delete -n istio-system listentry request-path


Delete the politics source pod:
$ sed 's/: sleep/: politics/g' samples/sleep/sleep.yaml | kubectl delete -f -
serviceaccount "politics" deleted
service "politics" deleted
deployment "politics" deleted





Introducing the Istio v1alpha3 routing API
Wed, 25 Apr 2018 00:00:00 +0000
Up until now, Istio has provided a simple API for traffic management using four configuration resources:
RouteRule, DestinationPolicy, EgressRule, and (Kubernetes) Ingress.
With this API, users have been able to easily manage the flow of traffic in an Istio service mesh.
The API has allowed users to route requests to specific versions of services, inject delays and failures for resilience
testing, add timeouts and circuit breakers, and more, all without changing the application code itself.
While this functionality has proven to be a very compelling part of Istio, user feedback has also shown that this API does
have some shortcomings, specifically when using it to manage very large applications containing thousands of services, and
when working with protocols other than HTTP. Furthermore, the use of Kubernetes Ingress resources to configure external
traffic has proven to be woefully insufficient for our needs.
To address these and other concerns, a new traffic management API, a.k.a. v1alpha3, is being introduced, which will
completely replace the previous API going forward. Although the v1alpha3 model is fundamentally the same, it is not
backward compatible and will require manual conversion from the old API.
To justify this disruption, the v1alpha3 API has gone through a long and painstaking community
review process that has hopefully resulted in a greatly improved API that will stand the test of time. In this article,
we will introduce the new configuration model and attempt to explain some of the motivation and design principles that
influenced it.
Design principles
A few key design principles played a role in the routing model redesign:

Explicitly model infrastructure as well as intent. For example, in addition to configuring an ingress gateway, the
component (controller) implementing it can also be specified.
The authoring model should be “producer oriented” and “host centric” as opposed to compositional. For example, all
rules associated with a particular host are configured together, instead of individually.
Clear separation of routing from post-routing behaviors.

Configuration resources in v1alpha3
A typical mesh will have one or more load balancers (we call them gateways)
that terminate TLS from external networks and allow traffic into the mesh.
Traffic then flows through internal services via sidecar gateways.
It is also common for applications to consume external
services (e.g., Google Maps API). These may be called directly or, in certain deployments, all traffic
exiting the mesh may be forced through dedicated egress gateways. The following diagram depicts
this mental model.

Figure: Gateways in an Istio service mesh

With the above setup in mind, v1alpha3 introduces the following new
configuration resources to control traffic routing into, within, and out of the mesh.

Gateway
VirtualService
DestinationRule
ServiceEntry

VirtualService, DestinationRule, and ServiceEntry replace RouteRule,
DestinationPolicy, and EgressRule respectively. The Gateway is a
platform independent abstraction to model the traffic flowing into
dedicated middleboxes.
The figure below depicts the flow of control across configuration
resources.

Figure: Relationship between different v1alpha3 elements

Gateway
A Gateway
configures a load balancer for HTTP/TCP traffic, regardless of
where it will be running. Any number of gateways can exist within the mesh
and multiple different gateway implementations can co-exist. In fact, a
gateway configuration can be bound to a particular workload by specifying
the set of workload (pod) labels as part of the configuration, allowing
users to reuse off-the-shelf network appliances by writing a simple gateway
controller.
For ingress traffic management, you might ask: Why not reuse Kubernetes Ingress APIs?
The Ingress APIs proved to be incapable of expressing Istio’s routing needs.
By trying to draw a common denominator across different HTTP proxies, the
Ingress is only able to support the most basic HTTP routing and ends up
pushing every other feature of modern proxies into non-portable
annotations.
Istio Gateway overcomes the Ingress shortcomings by separating the
L4-L6 spec from L7. It only configures the L4-L6 functions (e.g., ports to
expose, TLS configuration) that are uniformly implemented by all good L7
proxies. Users can then use standard Istio rules to control HTTP
requests as well as TCP traffic entering a Gateway by binding a
VirtualService to it.
For example, the following simple Gateway configures a load balancer
to allow external https traffic for host bookinfo.com into the mesh:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway
spec:
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - bookinfo.com
    tls:
      mode: SIMPLE
      serverCertificate: /tmp/tls.crt
      privateKey: /tmp/tls.key
To configure the corresponding routes, a VirtualService (described in the following section)
must be defined for the same host and bound to the Gateway using
the gateways field in the configuration:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
spec:
  hosts:
    - bookinfo.com
  gateways:
  - bookinfo-gateway # <---- bind to gateway
  http:
  - match:
    - uri:
        prefix: /reviews
    route:
    ...
The Gateway can be used to model an edge-proxy or a purely internal proxy
as shown in the first figure. Irrespective of the location, all gateways
can be configured and controlled in the same way.
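To make the binding concrete, here is a minimal Python sketch under a simplified model: a VirtualService applies to traffic entering a Gateway only when its gateways field names that Gateway and it shares at least one host with one of the Gateway's servers. The dictionaries mirror the YAML above (the reviews entry is hypothetical), `bound_virtual_services` is an illustrative helper rather than an Istio API, and wildcard hosts and cross-namespace references are ignored.

```python
from typing import Any, Dict, List

# Simplified, hypothetical model of the YAML above: plain dicts, exact host matching only.
Resource = Dict[str, Any]


def gateway_hosts(gateway: Resource) -> set:
    """Collect every host exposed by any server of the Gateway."""
    hosts = set()
    for server in gateway["spec"]["servers"]:
        hosts.update(server["hosts"])
    return hosts


def bound_virtual_services(gateway: Resource, virtual_services: List[Resource]) -> List[str]:
    """Return names of VirtualServices that name the Gateway and share one of its hosts."""
    name = gateway["metadata"]["name"]
    exposed = gateway_hosts(gateway)
    bound = []
    for vs in virtual_services:
        spec = vs["spec"]
        if name in spec.get("gateways", []) and exposed.intersection(spec["hosts"]):
            bound.append(vs["metadata"]["name"])
    return bound


gateway = {
    "metadata": {"name": "bookinfo-gateway"},
    "spec": {"servers": [{"port": {"number": 443, "name": "https", "protocol": "HTTPS"},
                          "hosts": ["bookinfo.com"]}]},
}
virtual_services = [
    {"metadata": {"name": "bookinfo"},
     "spec": {"hosts": ["bookinfo.com"], "gateways": ["bookinfo-gateway"]}},
    # Hypothetical entry with no gateways field: it is not bound to the Gateway.
    {"metadata": {"name": "reviews"}, "spec": {"hosts": ["reviews"]}},
]

print(bound_virtual_services(gateway, virtual_services))  # ['bookinfo']
```

A VirtualService without a gateways field, like the hypothetical reviews entry, applies to traffic inside the mesh rather than at the Gateway.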
VirtualService
Replacing route rules with something called “virtual services” might seem peculiar at first, but in reality it’s
fundamentally a much better name for what is being configured, especially after redesigning the API to address the
scalability issues with the previous model.
In effect, what has changed is that instead of configuring routing using a set of individual configuration resources
(rules) for a particular destination service, each containing a precedence field to control the order of evaluation, we
now configure the (virtual) destination itself, with all of its rules in an ordered list within a corresponding
VirtualService resource.
For example, where previously we had two RouteRule resources for the
Bookinfo application’s reviews service, like this:
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-default
spec:
  destination:
    name: reviews
  precedence: 1
  route:
  - labels:
      version: v1
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-test-v2
spec:
  destination:
    name: reviews
  precedence: 2
  match:
    request:
      headers:
        cookie:
          regex: "^(.*?;)?(user=jason)(;.*)?$"
  route:
  - labels:
      version: v2
In v1alpha3, we provide the same configuration in a single VirtualService resource:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
  - match:
    - headers:
        cookie:
          regex: "^(.*?;)?(user=jason)(;.*)?$"
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
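Inside a VirtualService, the http rules form an ordered list: they are evaluated from top to bottom, the first rule whose match conditions hold determines the route, and a rule with no match acts as a catch-all. The following Python sketch models that evaluation for the reviews rules above. It assumes only the two match types used in this post (a header regex and a URI prefix), ignores weighted splits by taking the first destination, and `select_route` and the dictionary layout are illustrative, not an Istio API.

```python
import re
from typing import Any, Dict, List, Optional

Rule = Dict[str, Any]


def condition_matches(cond: Dict[str, Any], uri: str, headers: Dict[str, str]) -> bool:
    """Check one match block; every field inside a single block must hold (AND)."""
    prefix = cond.get("uri", {}).get("prefix")
    if prefix is not None and not uri.startswith(prefix):
        return False
    for name, spec in cond.get("headers", {}).items():
        value = headers.get(name)
        # The example regex is anchored with ^...$, so the whole header value must match.
        if value is None or re.fullmatch(spec["regex"], value) is None:
            return False
    return True


def select_route(rules: List[Rule], uri: str, headers: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the first destination of the first matching rule; no 'match' means catch-all."""
    for rule in rules:
        blocks = rule.get("match")
        # Separate match blocks are alternatives (OR).
        if blocks is None or any(condition_matches(b, uri, headers) for b in blocks):
            return rule["route"][0]["destination"]
    return None  # nothing matched


reviews_rules = [
    {"match": [{"headers": {"cookie": {"regex": "^(.*?;)?(user=jason)(;.*)?$"}}}],
     "route": [{"destination": {"host": "reviews", "subset": "v2"}}]},
    {"route": [{"destination": {"host": "reviews", "subset": "v1"}}]},
]

print(select_route(reviews_rules, "/reviews/0", {"cookie": "user=jason"}))  # {'host': 'reviews', 'subset': 'v2'}
print(select_route(reviews_rules, "/reviews/0", {}))                       # {'host': 'reviews', 'subset': 'v1'}
```

Because order decides precedence, there is no need for the separate precedence field the old RouteRule resources carried.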
As you can see, both of the rules for the reviews service are consolidated in one place, which at first may or may not
seem preferable. However, if you look closer at this new model, you’ll see there are fundamental differences that make
v1alpha3 vastly more functional.
First of all, notice that the destination service for the VirtualService is specified using a hosts field (a repeated field, in fact) and is then specified again in the destination field of each route specification. This is a
very important difference from the previous model.
A VirtualService describes the mapping from one or more user-addressable destinations to the actual destination workloads inside the mesh. In our example, they are the same; however, the user-addressed hosts can be any DNS
names, with an optional wildcard prefix or CIDR prefix, that will be used to address the service. This can be particularly
useful when turning a monolith into a composite service built out of distinct microservices, without requiring the
service's consumers to adapt to the transition.
For example, the following rule allows users to address both the reviews and ratings services of the Bookinfo application
as if they are parts of a bigger (virtual) service at http://bookinfo.com/:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
spec:
  hosts:
    - bookinfo.com
  http:
  - match:
    - uri:
        prefix: /reviews
    route:
    - destination:
        host: reviews
  - match:
    - uri:
        prefix: /ratings
    route:
    - destination:
        host: ratings
  ...
The hosts of a VirtualService do not actually have to be part of the service registry; they are simply virtual
destinations. This allows users to model traffic for virtual hosts that do not have routable entries inside the mesh.
These hosts can be exposed outside the mesh by binding the VirtualService to a Gateway configuration for the same host
(as described in the previous section).
In addition to this fundamental restructuring, VirtualService includes several other important changes:


Multiple match conditions can be expressed inside the VirtualService configuration, reducing the need for redundant
rules.


Each service version has a name (called a service subset). The set of pods/VMs belonging to a subset is defined in a
DestinationRule, described in the following section.


VirtualService hosts can be specified using wildcard DNS prefixes to create a single rule for all matching services.
For example, in Kubernetes, to apply the same rewrite rule for all services in the foo namespace, the VirtualService
would use *.foo.svc.cluster.local as the host.
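A minimal Python sketch of how a wildcard host such as *.foo.svc.cluster.local could select services, assuming simplified suffix matching in which a leading `*.` matches any non-empty prefix ending in a dot. Istio's real host matching has additional rules (for example around short names and namespaces) that this ignores, and `host_matches` is a hypothetical helper.

```python
def host_matches(pattern: str, host: str) -> bool:
    """Simplified wildcard matching: '*.suffix' matches any longer host ending in '.suffix'."""
    if pattern.startswith("*."):
        suffix = pattern[1:]  # keep the leading dot, e.g. ".foo.svc.cluster.local"
        return host.endswith(suffix) and len(host) > len(suffix)
    return pattern == host


services = [
    "reviews.foo.svc.cluster.local",
    "ratings.foo.svc.cluster.local",
    "reviews.bar.svc.cluster.local",
]
pattern = "*.foo.svc.cluster.local"
print([s for s in services if host_matches(pattern, s)])
# ['reviews.foo.svc.cluster.local', 'ratings.foo.svc.cluster.local']
```

One rule written against the wildcard host therefore covers every service in the foo namespace, including ones added later.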


DestinationRule
A DestinationRule configures the set of policies to be applied while forwarding traffic to a service. DestinationRules are
intended to be authored by service owners, describing the circuit breakers, load balancer settings, TLS settings, etc.
DestinationRule is more or less the same as its predecessor, DestinationPolicy, with the following exceptions:

The host of a DestinationRule can include wildcard prefixes, allowing a single rule to be specified for many actual
services.
A DestinationRule defines addressable subsets (i.e., named versions) of the corresponding destination host. These
subsets are used in VirtualService route specifications when sending traffic to specific versions of the service.
Naming versions this way allows us to cleanly refer to them across different virtual services, simplify the stats that
Istio proxies emit, and encode subsets in SNI headers.

A DestinationRule that configures policies and subsets for the reviews service might look something like this:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN
  - name: v3
    labels:
      version: v3
Notice that, unlike DestinationPolicy, multiple policies (e.g., default and v2-specific) are specified in a single
DestinationRule configuration.
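To show how subsets and per-subset policies fit together, here is a Python sketch under simplified assumptions: a subset selects the pods whose labels include all of the subset's labels, and a subset-level trafficPolicy overrides the rule-level one key by key (a shallow merge). The pod list and the helper names are hypothetical, and Istio's actual policy merging is more detailed than this.

```python
from typing import Any, Dict, List

destination_rule: Dict[str, Any] = {
    "host": "reviews",
    "trafficPolicy": {"loadBalancer": {"simple": "RANDOM"}},
    "subsets": [
        {"name": "v1", "labels": {"version": "v1"}},
        {"name": "v2", "labels": {"version": "v2"},
         "trafficPolicy": {"loadBalancer": {"simple": "ROUND_ROBIN"}}},
        {"name": "v3", "labels": {"version": "v3"}},
    ],
}

# Hypothetical pods backing the reviews service.
pods: List[Dict[str, Any]] = [
    {"name": "reviews-v1-a", "labels": {"app": "reviews", "version": "v1"}},
    {"name": "reviews-v2-a", "labels": {"app": "reviews", "version": "v2"}},
    {"name": "reviews-v2-b", "labels": {"app": "reviews", "version": "v2"}},
]


def find_subset(rule: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Look up a named subset, failing loudly if a route refers to an undefined one."""
    for subset in rule["subsets"]:
        if subset["name"] == name:
            return subset
    raise KeyError(f"subset {name!r} not defined for host {rule['host']!r}")


def subset_pods(rule: Dict[str, Any], name: str, pods: List[Dict[str, Any]]) -> List[str]:
    """Names of pods whose labels contain every label of the subset."""
    selector = find_subset(rule, name)["labels"]
    return [p["name"] for p in pods
            if all(p["labels"].get(k) == v for k, v in selector.items())]


def effective_policy(rule: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Shallow merge: subset-level settings replace rule-level ones with the same key."""
    merged = dict(rule.get("trafficPolicy", {}))
    merged.update(find_subset(rule, name).get("trafficPolicy", {}))
    return merged


print(subset_pods(destination_rule, "v2", pods))  # ['reviews-v2-a', 'reviews-v2-b']
print(effective_policy(destination_rule, "v1"))   # {'loadBalancer': {'simple': 'RANDOM'}}
print(effective_policy(destination_rule, "v2"))   # {'loadBalancer': {'simple': 'ROUND_ROBIN'}}
```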
ServiceEntry
A ServiceEntry is used to add entries to the service registry that Istio maintains internally.
It is most commonly used to model traffic to external dependencies of the mesh,
such as APIs consumed from the web or traffic to services in legacy infrastructure.
Everything you could previously configure using an EgressRule can just as easily be done with a ServiceEntry.
For example, access to a simple external service from inside the mesh can be enabled using a configuration
like this:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: foo-ext
spec:
  hosts:
  - foo.com
  ports:
  - number: 80
    name: http
    protocol: HTTP
That said, ServiceEntry has significantly more functionality than its predecessor.
First of all, a ServiceEntry is not limited to external service configuration;
it can be of two types: mesh-internal or mesh-external.
Mesh-internal entries are like all other internal services but are used to explicitly add services
to the mesh. They can be used to add services as part of expanding the service mesh to include unmanaged infrastructure
(e.g., VMs added to a Kubernetes-based service mesh).
Mesh-external entries represent services external to the mesh.
For them, mutual TLS authentication is disabled and policy enforcement is performed on the client side,
instead of on the server side as is usual for internal service requests.
Because a ServiceEntry configuration simply adds a destination to the internal service registry, it can be
used in conjunction with a VirtualService and/or DestinationRule, just like any other service in the registry.
The following DestinationRule, for example, can be used to initiate mutual TLS connections for an external service:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: foo-ext
spec:
  host: foo.com
  trafficPolicy:
    tls:
      mode: MUTUAL
      clientCertificate: /etc/certs/myclientcert.pem
      privateKey: /etc/certs/client_private_key.pem
      caCertificates: /etc/certs/rootcacerts.pem
In addition to its expanded generality, ServiceEntry provides several other improvements over EgressRule
including the following:

A single ServiceEntry can configure multiple service endpoints, which previously would have required multiple
EgressRules.
The resolution mode for the endpoints is now configurable (NONE, STATIC, or DNS).
Additionally, we are working on addressing another pain point: the need to access secure external services over plain-text
ports (e.g., http://google.com:443). This should be fixed in the coming weeks, allowing you to access
https://google.com from your application. Stay tuned for an Istio patch release (0.8.x) that addresses this limitation.
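As a rough illustration of the configurable resolution modes (NONE, STATIC, or DNS) listed above, the following Python sketch describes where a proxy would get endpoint addresses for a ServiceEntry. It assumes a simplified reading of the modes: NONE forwards to the destination IP the application already resolved, STATIC uses the IP addresses listed in the entry's endpoints, and DNS resolves names (the listed endpoints if any, otherwise the hosts). No DNS lookup is performed; `endpoint_sources` and the example addresses (from the 192.0.2.0/24 documentation range) are hypothetical.

```python
from typing import Any, Dict, List


def endpoint_sources(entry: Dict[str, Any]) -> List[str]:
    """Describe where endpoint addresses come from for each resolution mode (simplified)."""
    mode = entry.get("resolution", "NONE")
    endpoints = [e["address"] for e in entry.get("endpoints", [])]
    if mode == "NONE":
        # Traffic keeps the destination IP the application already resolved.
        return ["<original destination IP>"]
    if mode == "STATIC":
        # Addresses are taken verbatim from the endpoints list.
        return endpoints
    if mode == "DNS":
        # Names to be resolved via DNS: listed endpoints if any, otherwise the hosts.
        return [f"dns:{name}" for name in (endpoints or entry["hosts"])]
    raise ValueError(f"unknown resolution mode: {mode}")


# One ServiceEntry covering several endpoints (previously several EgressRules).
static_entry = {
    "hosts": ["foo.com"],
    "resolution": "STATIC",
    "endpoints": [{"address": "192.0.2.10"}, {"address": "192.0.2.11"}],
}
dns_entry = {"hosts": ["foo.com"], "resolution": "DNS"}

print(endpoint_sources(static_entry))  # ['192.0.2.10', '192.0.2.11']
print(endpoint_sources(dns_entry))     # ['dns:foo.com']
```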

Creating and deleting v1alpha3 route rules
Because all route rules for a given destination are now stored together as an ordered
list in a single VirtualService resource, adding a second or subsequent rule for a particular destination
is no longer done by creating a new (RouteRule) resource, but instead by updating the one-and-only VirtualService
resource for the destination.
Old routing rules (a new resource for each additional rule):
$ kubectl apply -f my-second-rule-for-destination-abc.yaml
v1alpha3 routing rules (the existing VirtualService, updated to contain all rules):
$ kubectl apply -f my-updated-rules-for-destination-abc.yaml
Deleting route rules other than the last one for a particular destination is also done by updating
the existing resource using kubectl apply.
When adding or removing routes that refer to service versions, the subsets will need to be updated in
the service’s corresponding DestinationRule.
As you might have guessed, this is also done using kubectl apply.
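Because every change now edits a single resource, it can help to check the edited pair of resources before running kubectl apply. The following Python sketch assumes the VirtualService and DestinationRule have already been parsed into dictionaries (for example from YAML); it inserts a rule into the ordered list and reports any subsets that routes refer to but the DestinationRule does not define. `add_rule` and `missing_subsets` are hypothetical helpers, not part of kubectl or Istio.

```python
import copy
from typing import Any, Dict, Set


def add_rule(vs: Dict[str, Any], rule: Dict[str, Any], position: int) -> Dict[str, Any]:
    """Return a copy of the VirtualService with `rule` inserted at `position` in spec.http.

    Order matters: rules are tried top to bottom, so a rule placed after a
    catch-all (a rule with no match) would never be reached.
    """
    updated = copy.deepcopy(vs)
    updated["spec"]["http"].insert(position, rule)
    return updated


def missing_subsets(vs: Dict[str, Any], dr: Dict[str, Any]) -> Set[str]:
    """Subsets referenced by routes to the DestinationRule's host but not defined in it."""
    defined = {s["name"] for s in dr["spec"].get("subsets", [])}
    referenced = {
        route["destination"]["subset"]
        for rule in vs["spec"]["http"]
        for route in rule["route"]
        if route["destination"]["host"] == dr["spec"]["host"] and "subset" in route["destination"]
    }
    return referenced - defined


vs = {"spec": {"hosts": ["reviews"], "http": [
    {"route": [{"destination": {"host": "reviews", "subset": "v1"}}]},
]}}
dr = {"spec": {"host": "reviews", "subsets": [{"name": "v1", "labels": {"version": "v1"}}]}}

# Send jason's requests to v3, ahead of the catch-all v1 rule.
v3_rule = {
    "match": [{"headers": {"cookie": {"regex": "^(.*?;)?(user=jason)(;.*)?$"}}}],
    "route": [{"destination": {"host": "reviews", "subset": "v3"}}],
}
vs = add_rule(vs, v3_rule, position=0)
print(missing_subsets(vs, dr))  # {'v3'}: the DestinationRule needs a v3 subset too
```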
Summary
The Istio v1alpha3 routing API has significantly more functionality than
its predecessor, but unfortunately is not backwards compatible, requiring a
one-time manual conversion. The previous configuration resources,
RouteRule, DestinationPolicy, and EgressRule, will not be supported
from Istio 0.9 onwards. Kubernetes users can continue to use Ingress to
configure their edge load balancers for basic routing. However, advanced
routing features (e.g., traffic split across two versions) will require use
of Gateway, a significantly more functional and highly
recommended Ingress replacement.
Acknowledgments
Credit for the routing model redesign and implementation work goes to the
following people (in alphabetical order):

Frank Budinsky (IBM)
Zack Butcher (Google)
Greg Hanson (IBM)
Costin Manolache (Google)
Martin Ostrowski (Google)
Shriram Rajagopalan (VMware)
Louis Ryan (Google)
Isaiah Snell-Feikema (IBM)
Kuat Yessenov (Google)




Configuring Istio Ingress with AWS NLB
Fri, 20 Apr 2018 00:00:00 +0000

This post was updated on January 16, 2019 to include some usage warnings.


This post describes how to configure the Istio ingress gateway to use an AWS Network Load Balancer (NLB).
An NLB can be used instead of a Classic Load Balancer. See the comparison between the different AWS load balancers for more details.
Prerequisites
The following instructions require a Kubernetes 1.9.0 or newer cluster.
Using an AWS NLB on Kubernetes is an alpha feature and is not recommended for production clusters.
Because of Kubernetes bug #69264, using an AWS NLB does not support creating two or more Kubernetes clusters running Istio in the same zone.


IAM policy
You need to attach a policy to the master role to be able to provision a network load balancer.


In the AWS IAM console, click Policies and create a new policy:

Figure: Create a new policy

Select JSON:

Figure: Select JSON

Copy and paste the following policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "kopsK8sNLBMasterPermsRestrictive",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVpcs",
                "elasticloadbalancing:AddTags",
                "elasticloadbalancing:CreateListener",
                "elasticloadbalancing:CreateTargetGroup",
                "elasticloadbalancing:DeleteListener",
                "elasticloadbalancing:DeleteTargetGroup",
                "elasticloadbalancing:DescribeListeners",
                "elasticloadbalancing:DescribeLoadBalancerPolicies",
                "elasticloadbalancing:DescribeTargetGroups",
                "elasticloadbalancing:DescribeTargetHealth",
                "elasticloadbalancing:ModifyListener",
                "elasticloadbalancing:ModifyTargetGroup",
                "elasticloadbalancing:RegisterTargets",
                "elasticloadbalancing:SetLoadBalancerPoliciesOfListener"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVpcs",
                "ec2:DescribeRegions"
            ],
            "Resource": "*"
        }
    ]
}


Click Review policy, fill in all fields, and click Create policy:

Figure: Validate policy

Click Roles, select the role of your master nodes, and click Attach policy:

Figure: Attach policy

The policy is now attached to your master nodes.


Generate the Istio manifest
To use an AWS NLB, you need to add an AWS-specific
annotation to the Istio installation. The following instructions explain how to
add the annotation.
Save this as the file override.yaml:
gateways:
  istio-ingressgateway:
    serviceAnnotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
Generate a manifest with Helm:
$ helm template install/kubernetes/helm/istio --namespace istio -f override.yaml > $HOME/istio.yaml
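The -f override.yaml flag layers the file over the chart's default values. Helm merges nested maps key by key, which is why the override only needs to spell out the path to the annotation rather than repeating the whole gateways section. The Python sketch below imitates that merge on plain dictionaries; the default values shown are a hypothetical excerpt, not the chart's real values.yaml, and other details of Helm's value coalescing are not modeled.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Merge `override` into a copy of `base`: nested dicts merge, other values replace."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


# Hypothetical excerpt of chart defaults (not the real values.yaml).
defaults = {
    "gateways": {
        "istio-ingressgateway": {
            "enabled": True,
            "type": "LoadBalancer",
            "serviceAnnotations": {},
        }
    }
}
# Same content as override.yaml above.
override = {
    "gateways": {
        "istio-ingressgateway": {
            "serviceAnnotations": {
                "service.beta.kubernetes.io/aws-load-balancer-type": "nlb"
            }
        }
    }
}

merged = deep_merge(defaults, override)
print(merged["gateways"]["istio-ingressgateway"])
```

The other gateway settings survive the merge; only the annotation is added.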



Istio Soft Multi-Tenancy Support
Thu, 19 Apr 2018 00:00:00 +0000
Multi-tenancy is commonly used in many environments across many different applications,
but the implementation details and the functionality provided on a per-tenant basis do not
follow one model in all environments. The Kubernetes multi-tenancy working group
is working to define the multi-tenant use cases and functionality that should be available
within Kubernetes. However, from their work so far it is clear that only “soft multi-tenancy”
is possible, because malicious containers or workloads cannot be fully prevented from
gaining access to other tenants’ pods or kernel resources.
Soft multi-tenancy
For this blog, “soft multi-tenancy” is defined as having a single Kubernetes control plane
with multiple Istio control planes and multiple meshes, one control plane and one mesh
per tenant. The cluster administrator gets control and visibility across all the Istio
control planes, while the tenant administrator only gets control of a specific Istio
instance. Separation between the tenants is provided by Kubernetes namespaces and RBAC.
One use case for this deployment model is a shared corporate infrastructure where malicious
actions are not expected, but a clean separation of the tenants is still required.
Potential future Istio multi-tenant deployment models are described at the end of this
post.

This post is a high-level description of how to deploy Istio in a
limited multi-tenancy environment. The docs section will be updated
when official multi-tenancy support is provided.


Deployment
Multiple Istio control planes
Deploying multiple Istio control planes starts by replacing all namespace references
in a manifest file with the desired namespace. Using istio.yaml as an example, if two tenant-level
Istio control planes are required, the first can use the istio.yaml default namespace of
istio-system, and a second control plane can be created by generating a new yaml file with
a different namespace. For example, the following command creates a yaml file with
the Istio namespace istio-system1.
$ cat istio.yaml | sed s/istio-system/istio-system1/g > istio-system1.yaml
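The sed command performs a plain text substitution, so every occurrence of istio-system becomes istio-system1, including occurrences inside longer names. The Python sketch below does the same substitution for several tenants at once; the tenant namespace names and the manifest excerpt are hypothetical. Because it is a plain substitution, running it on an already-rewritten file would turn istio-system1 into istio-system11, so always start from the original istio.yaml.

```python
from typing import Dict, List

BASE_NAMESPACE = "istio-system"


def render_tenant_manifests(manifest: str, tenant_namespaces: List[str]) -> Dict[str, str]:
    """Return one manifest per tenant namespace, like `sed s/istio-system/<ns>/g`."""
    return {ns: manifest.replace(BASE_NAMESPACE, ns) for ns in tenant_namespaces}


# Hypothetical excerpt of istio.yaml.
manifest = """apiVersion: v1
kind: ServiceAccount
metadata:
  name: istio-pilot-service-account
  namespace: istio-system
"""

rendered = render_tenant_manifests(manifest, ["istio-system1", "istio-system2"])
print(rendered["istio-system1"])
```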
The istio.yaml file contains the details of the Istio control plane deployment, including the
pods that make up the control plane (Mixer, Pilot, Ingress, Galley, CA). Deploy the two Istio
control plane yaml files:
$ kubectl apply -f install/kubernetes/istio.yaml
$ kubectl apply -f install/kubernetes/istio-system1.yaml
This results in two Istio control planes running in two namespaces.
$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                       READY     STATUS    RESTARTS   AGE
istio-system    istio-ca-ffbb75c6f-98w6x                   1/1       Running   0          15d
istio-system    istio-ingress-68d65fc5c6-dnvfl             1/1       Running   0          15d
istio-system    istio-mixer-5b9f8dffb5-8875r               3/3       Running   0          15d
istio-system    istio-pilot-678fc976c8-b8tv6               2/2       Running   0          15d
istio-system1   istio-ca-5f496fdbcd-lqhlk                  1/1       Running   0          15d
istio-system1   istio-ingress-68d65fc5c6-2vldg             1/1       Running   0          15d
istio-system1   istio-mixer-7d4f7b9968-66z44               3/3       Running   0          15d
istio-system1   istio-pilot-5bb6b7669c-779vb               2/2       Running   0          15d
The Istio sidecar and add-on manifests, if required, must also be
deployed to match the namespace used by the tenant’s Istio
control plane.
Applying these two yaml files is the responsibility of the cluster
administrator, not the tenant-level administrator. Additional RBAC restrictions will also
need to be configured and applied by the cluster administrator, limiting the tenant
administrator to only the assigned namespace.
Split common and namespace specific resources
The manifest files in the Istio repositories create both common resources that would
be used by all Istio control planes as well as resources that are replicated per control
plane. Although it is a simple matter to deploy multiple control planes by replacing the
istio-system namespace references as described above, a better approach is to split the
manifests into a common part that is deployed once for all tenants and a tenant-specific
part. The Custom Resource Definitions, the roles, and the role
bindings should be separated out from the provided Istio manifests. Additionally, the
roles and role bindings in the provided Istio manifests are probably unsuitable for a
multi-tenant environment and should be modified or augmented as described in the next
section.
Kubernetes RBAC for Istio control plane resources
To restrict a tenant administrator to a single Istio namespace, the cluster
administrator would create a manifest containing, at a minimum, a Role and RoleBinding
similar to the one below. In this example, a tenant administrator named sales-admin
is limited to the namespace istio-system1. A completed manifest would contain many
more apiGroups under the Role providing resource access to the tenant administrator.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: istio-system1
  name: ns-access-for-sales-admin-istio-system1
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["*"]
  verbs: ["*"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: access-all-istio-system1
  namespace: istio-system1
subjects:
- kind: User
  name: sales-admin
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: ns-access-for-sales-admin-istio-system1
  apiGroup: rbac.authorization.k8s.io
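To see why a completed manifest needs more apiGroups, here is a small Python sketch of how Kubernetes RBAC evaluates the Role above, assuming the RoleBinding has already bound it to sales-admin. It models the standard rule semantics (a request is allowed if some rule lists its API group, resource, and verb, with "*" as a wildcard, and a Role grants access only in its own namespace) and ignores subresources, resourceNames, and ClusterRoles. The Istio API group in the last example request is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Request:
    namespace: str
    api_group: str  # "" is the core API group
    resource: str
    verb: str


def _listed(values: List[str], value: str) -> bool:
    return "*" in values or value in values


def role_allows(role: Dict[str, Any], request: Request) -> bool:
    """Simplified RBAC check: a namespaced Role grants access only inside its namespace."""
    if role["metadata"]["namespace"] != request.namespace:
        return False
    return any(
        _listed(rule["apiGroups"], request.api_group)
        and _listed(rule["resources"], request.resource)
        and _listed(rule["verbs"], request.verb)
        for rule in role["rules"]
    )


role = {
    "metadata": {"namespace": "istio-system1", "name": "ns-access-for-sales-admin-istio-system1"},
    "rules": [{"apiGroups": [""], "resources": ["*"], "verbs": ["*"]}],
}

print(role_allows(role, Request("istio-system1", "", "pods", "list")))  # True
print(role_allows(role, Request("istio-system", "", "pods", "list")))   # False: other namespace
print(role_allows(role, Request("istio-system1", "config.istio.io", "routerules", "create")))  # False
```

The last request fails because the Role only lists the core group (""), so access to Istio's own resources has to be granted with additional apiGroups entries.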
Watching specific namespaces for service discovery
In addition to creating RBAC rules limiting the tenant administrator’s access to a specific
Istio control plane, the Istio manifest must be updated to specify the application namespace
that Pilot should watch for creation of its xDS cache. This is done by starting the Pilot
component with the additional command-line arguments --appNamespace ns-1, where ns-1
is the namespace that the tenant’s application will be deployed in. An example snippet from
the istio-system1.yaml file is shown below.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: istio-pilot
  namespace: istio-system1
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  replicas: 1
  template:
    metadata:
      labels:
        istio: pilot
    spec:
      serviceAccountName: istio-pilot-service-account
      containers:
      - name: discovery
        image: docker.io//pilot:
        imagePullPolicy: IfNotPresent
        args: ["discovery", "-v", "2", "--admission-service", "istio-pilot", "--appNamespace", "ns-1"]
        ports:
        - containerPort: 8080
        - containerPort: 443
Deploying the tenant application in a namespace
Now that the cluster administrator has created the tenant’s namespace (e.g., istio-system1) and
Pilot’s service discovery has been configured to watch a specific application
namespace (e.g., ns-1), create the application manifests to deploy in that tenant’s
namespace. For example:
apiVersion: v1
kind: Namespace
metadata:
  name: ns-1
Then add the namespace reference to each resource included in the application’s manifest
file. For example:
apiVersion: v1
kind: Service
metadata:
  name: details
  labels:
    app: details
  namespace: ns-1
Although not shown, the application namespaces will also have RBAC settings limiting access
to certain resources. These RBAC settings could be set by the cluster administrator and/or
the tenant administrator.
Using kubectl in a multi-tenant environment
When defining route rules or destination policies, make sure that the kubectl command is scoped to
the namespace the Istio control plane is running in, so that the resource is created
in the proper namespace. Additionally, the rule itself must be scoped to the tenant’s namespace
so that it is applied properly to that tenant’s mesh. The -i option is used to create
(or get or describe) the rule in the namespace that the Istio control plane is deployed in.
The -n option scopes the rule to the tenant’s mesh and should be set to the namespace that
the tenant’s app is deployed in. Note that the -n option can be omitted on the command line if
the resource’s .yaml file scopes it properly instead.
For example, the following command would be required to add a route rule to the istio-system1
namespace:
$ kubectl -i istio-system1 apply -n ns-1 -f route_rule_v2.yaml
The rule can then be displayed using the command:
$ kubectl -i istio-system1 -n ns-1 get routerule
NAME                  KIND                                  NAMESPACE
details-Default       RouteRule.v1alpha2.config.istio.io    ns-1
productpage-default   RouteRule.v1alpha2.config.istio.io    ns-1
ratings-default       RouteRule.v1alpha2.config.istio.io    ns-1
reviews-default       RouteRule.v1alpha2.config.istio.io    ns-1
See the Multiple Istio control planes section of this document for more details on namespace requirements in a
multi-tenant environment.
Test results
Following the instructions above, a cluster administrator can create an environment limiting,
via RBAC and namespaces, what a tenant administrator can deploy.
After deployment, accessing the Istio control plane pods assigned to a specific tenant
administrator is permitted:
$ kubectl get pods -n istio-system
NAME                                      READY     STATUS    RESTARTS   AGE
grafana-78d649479f-8pqk9                  1/1       Running   0          1d
istio-ca-ffbb75c6f-98w6x                  1/1       Running   0          1d
istio-ingress-68d65fc5c6-dnvfl            1/1       Running   0          1d
istio-mixer-5b9f8dffb5-8875r              3/3       Running   0          1d
istio-pilot-678fc976c8-b8tv6              2/2       Running   0          1d
istio-sidecar-injector-7587bd559d-5tgk6   1/1       Running   0          1d
prometheus-cf8456855-hdcq7                1/1       Running   0          1d
However, accessing all the cluster’s pods is not permitted:
$ kubectl get pods --all-namespaces
Error from server (Forbidden): pods is forbidden: User "dev-admin" cannot list pods at the cluster scope
And neither is accessing another tenant’s namespace:
$ kubectl get pods -n istio-system1
Error from server (Forbidden): pods is forbidden: User "dev-admin" cannot list pods in the namespace "istio-system1"
The tenant administrator can deploy applications in the application namespace configured for
that tenant. For example, after updating the Bookinfo manifests and deploying them in the
tenant’s application namespace ns-0, the tenant administrator is permitted to list the
pods in that namespace:
$ kubectl get pods -n ns-0
NAME                              READY     STATUS    RESTARTS   AGE
details-v1-64b86cd49-b7rkr        2/2       Running   0          1d
productpage-v1-84f77f8747-rf2mt   2/2       Running   0          1d
ratings-v1-5f46655b57-5b4c5       2/2       Running   0          1d
reviews-v1-ff6bdb95b-pm5lb        2/2       Running   0          1d
reviews-v2-5799558d68-b989t       2/2       Running   0          1d
reviews-v3-58ff7d665b-lw5j9       2/2       Running   0          1d
But accessing another tenant’s application namespace is not:
$ kubectl get pods -n ns-1
Error from server (Forbidden): pods is forbidden: User "dev-admin" cannot list pods in the namespace "ns-1"
If add-on tools such as Prometheus are deployed
(also limited by an Istio namespace), the statistical results returned would represent only
the traffic seen from that tenant’s application namespace.
Conclusion
The evaluation indicates that Istio has sufficient capabilities and security to meet a
small number of multi-tenant use cases. It also shows that Istio and Kubernetes cannot
provide sufficient capabilities and security for other use cases, especially those
that require complete security and isolation between untrusted tenants. Reaching a
stronger model of security and isolation requires work in container
technology, such as Kubernetes, rather than improvements in Istio capabilities.
Issues

The CA (Certificate Authority) and Mixer pod logs from one tenant’s Istio control
plane (e.g. istio-system namespace) contained ‘info’ messages from a second tenant’s
Istio control plane (e.g. istio-system1 namespace).

Challenges with other multi-tenancy models
Other multi-tenancy deployment models were considered:


A single mesh with multiple applications, one for each tenant on the mesh. The cluster
administrator gets control and visibility mesh wide and across all applications, while the
tenant administrator only gets control of a specific application.


A single Istio control plane with multiple meshes, one mesh per tenant. The cluster
administrator gets control and visibility across the entire Istio control plane and all
meshes, while the tenant administrator only gets control of a specific mesh.


A single cloud environment (cluster controlled), but multiple Kubernetes control planes
(tenant controlled).


These options either can’t be properly supported without code changes or don’t fully
address the use cases.
Current Istio capabilities are poorly suited to support the first model as it lacks
sufficient RBAC capabilities to support cluster versus tenant operations. Additionally,
having multiple tenants under one mesh is too insecure with the current mesh model and the
way Istio drives configuration to the Envoy proxies.
Regarding the second option, the current Istio paradigm assumes a single mesh per Istio control
plane. The changes needed to support this model are substantial. They would require
finer-grained scoping of resources and security domains based on namespaces, as well as
additional Istio RBAC changes. This model will likely be addressed by future work, but it is not
currently possible.
The third model doesn’t satisfy most use cases, as most cluster administrators prefer
a common Kubernetes control plane which they provide as a
PaaS to their tenants.
Future work
Allowing a single Istio control plane to control multiple meshes would be an obvious next
feature. An additional improvement is to provide a single mesh that can host different
tenants with some level of isolation and security between the tenants.  This could be done
by partitioning within a single control plane using the same logical notion of namespace as
Kubernetes. A document
has been started within the Istio community to define additional use cases and the
Istio functionality required to support those use cases.
References

Video on Kubernetes multi-tenancy support, Multi-Tenancy Support & Security Modeling with RBAC and Namespaces, and the supporting slide deck.
KubeCon talk on security that discusses Kubernetes support for “Cooperative soft multi-tenancy”, Building for Trust: How to Secure Your Kubernetes.
Kubernetes documentation on RBAC and namespaces.
KubeCon slide deck on Multi-tenancy Deep Dive.
Google document on Multi-tenancy models for Kubernetes. (Requires permission)
Cloud Foundry WIP document, Multi-cloud and Multi-tenancy
Istio Auto Multi-Tenancy 101




Traffic Mirroring with Istio for Testing in Production
Thu, 08 Feb 2018 00:00:00 +0000
Trying to enumerate all the possible combinations of test cases for testing services in non-production/test environments can be daunting. In some cases, you’ll find that all of the effort that goes into cataloging these use cases doesn’t match up to real production use cases. Ideally, we could use live production use cases and traffic to help illuminate all of the feature areas of the service under test that we might miss in more contrived testing environments.
Istio can help here. With the release of Istio 0.5, Istio can mirror traffic to help test your services. You can write route rules similar to the following to enable traffic mirroring:
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: mirror-traffic-to-httpbin-v2
spec:
  destination:
    name: httpbin
  precedence: 11
  route:
  - labels:
      version: v1
    weight: 100
  - labels:
      version: v2
    weight: 0
  mirror:
    name: httpbin
    labels:
      version: v2
A few things to note here:

When traffic gets mirrored to a different service, that happens outside the critical path of the request.
Responses to any mirrored traffic are ignored; traffic is mirrored as “fire-and-forget”.
You’ll need to have the 0-weighted route to hint to Istio to create the proper Envoy cluster under the covers; this should be ironed out in future releases.
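The fire-and-forget behavior can be sketched with Python's asyncio. This is a conceptual model only, not how Envoy implements mirroring: the primary call is awaited and its response returned, while the mirrored call runs as a background task whose result and errors are discarded. The `call_v1` and `call_v2` coroutines are hypothetical stand-ins for the two httpbin versions.

```python
import asyncio
from typing import Awaitable, Callable, Set

Handler = Callable[[str], Awaitable[str]]

# Keep references so background tasks are not garbage-collected before they finish.
_background: Set[asyncio.Task] = set()


def _discard(task: asyncio.Task) -> None:
    """Forget the finished mirror task and drop its result or exception."""
    _background.discard(task)
    if not task.cancelled():
        task.exception()  # mark any exception as retrieved so it is ignored quietly


async def handle(request: str, primary: Handler, mirror: Handler) -> str:
    """Serve from `primary`; send a copy to `mirror` without waiting for it."""
    task = asyncio.create_task(mirror(request))
    _background.add(task)
    task.add_done_callback(_discard)
    return await primary(request)


async def call_v1(request: str) -> str:
    return f"v1 handled {request}"


async def call_v2(request: str) -> str:
    raise RuntimeError("v2 is broken")  # mirrored failures never reach the client


async def main() -> str:
    response = await handle("GET /headers", call_v1, call_v2)
    await asyncio.sleep(0)  # give the background mirror task a chance to run in this demo
    return response


print(asyncio.run(main()))  # v1 handled GET /headers
```

Even though the mirrored v2 call fails, the client only ever sees the v1 response.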

Learn more about mirroring by visiting the Mirroring Task and see a more
comprehensive treatment of this scenario on my blog.



Consuming External TCP Services
Tue, 06 Feb 2018 00:00:00 +0000

This blog post was updated on July 23, 2018 to use the new
v1alpha3 traffic management API. If you need to use the old version, follow these docs.


In my previous blog post, Consuming External Web Services, I described how external services
can be consumed by in-mesh Istio applications via HTTPS. In this post, I demonstrate consuming external services
over TCP. You will use the Istio Bookinfo sample application, the version in which the book
ratings data is persisted in a MySQL database. You deploy this database outside the cluster and configure the
ratings microservice to use it. You define a
Service Entry to allow the in-mesh applications to
access the external database.
Bookinfo sample application with external ratings database
First, you set up a MySQL database instance to hold book ratings data outside of your Kubernetes cluster. Then you
modify the Bookinfo sample application to use your database.
Setting up the database for ratings data
For this task you set up an instance of MySQL. You can use any MySQL instance; I used
Compose for MySQL. I used mysqlsh
(MySQL Shell) as a MySQL client to feed the ratings data.


Set the MYSQL_DB_HOST and MYSQL_DB_PORT environment variables:
$ export MYSQL_DB_HOST=
$ export MYSQL_DB_PORT=
In case of a local MySQL database with the default port, the values are localhost and 3306, respectively.


To initialize the database, run the following command, entering the password when prompted. The command is
performed with the credentials of the admin user, created by default by
Compose for MySQL.
$ curl -s https://raw.githubusercontent.com/istio/istio/release-1.29/samples/bookinfo/src/mysql/mysqldb-init.sql | mysqlsh --sql --ssl-mode=REQUIRED -u admin -p --host $MYSQL_DB_HOST --port $MYSQL_DB_PORT
OR
When using the mysql client and a local MySQL database, run:
$ curl -s https://raw.githubusercontent.com/istio/istio/release-1.29/samples/bookinfo/src/mysql/mysqldb-init.sql | mysql -u root -p --host $MYSQL_DB_HOST --port $MYSQL_DB_PORT


Create a user with the name bookinfo and grant it SELECT privilege on the test.ratings table:
$ mysqlsh --sql --ssl-mode=REQUIRED -u admin -p --host $MYSQL_DB_HOST --port $MYSQL_DB_PORT -e "CREATE USER 'bookinfo' IDENTIFIED BY ''; GRANT SELECT ON test.ratings to 'bookinfo';"
OR
For mysql and the local database, the command is:
$ mysql -u root -p --host $MYSQL_DB_HOST --port $MYSQL_DB_PORT -e "CREATE USER 'bookinfo' IDENTIFIED BY ''; GRANT SELECT ON test.ratings to 'bookinfo';"
Here you apply the principle of least privilege. This
means that you do not use your admin user in the Bookinfo application. Instead, you create a special user for the
Bookinfo application, bookinfo, with minimal privileges. In this case, the bookinfo user only has the SELECT
privilege on a single table.
After running the command to create the user, you may want to clean your bash history by checking the number of the last
command and running history -d with that number. You don’t want the password of the
new user to be stored in the bash history. If you’re using mysql, remove the last command from the
~/.mysql_history file as well. Read more about password protection of the newly created user in the MySQL documentation.


Inspect the created ratings to see that everything worked as expected:
$ mysqlsh --sql --ssl-mode=REQUIRED -u bookinfo -p --host $MYSQL_DB_HOST --port $MYSQL_DB_PORT -e "select * from test.ratings;"
Enter password:
+----------+--------+
| ReviewID | Rating |
+----------+--------+
|        1 |      5 |
|        2 |      4 |
+----------+--------+
OR
For mysql and the local database:
$ mysql -u bookinfo -p --host $MYSQL_DB_HOST --port $MYSQL_DB_PORT -e "select * from test.ratings;"
Enter password:
+----------+--------+
| ReviewID | Rating |
+----------+--------+
|        1 |      5 |
|        2 |      4 |
+----------+--------+


Set the ratings temporarily to 1 to provide a visual clue when our database is used by the Bookinfo ratings
service:
$ mysqlsh --sql --ssl-mode=REQUIRED -u admin -p --host $MYSQL_DB_HOST --port $MYSQL_DB_PORT -e "update test.ratings set rating=1; select * from test.ratings;"
Enter password:

Rows matched: 2  Changed: 2  Warnings: 0
+----------+--------+
| ReviewID | Rating |
+----------+--------+
|        1 |      1 |
|        2 |      1 |
+----------+--------+
OR
For mysql and the local database:
$ mysql -u root -p --host $MYSQL_DB_HOST --port $MYSQL_DB_PORT -e "update test.ratings set rating=1; select * from test.ratings;"
Enter password:
+----------+--------+
| ReviewID | Rating |
+----------+--------+
|        1 |      1 |
|        2 |      1 |
+----------+--------+
You used the admin user (and root for the local database) in the last command since the bookinfo user does not
have the UPDATE privilege on the test.ratings table.


Now you are ready to deploy a version of the Bookinfo application that will use your database.
Initial setting of Bookinfo application
To demonstrate the scenario of using an external database, you start with a Kubernetes cluster with Istio installed. Then you deploy the
Istio Bookinfo sample application, apply the default destination rules, and change Istio to the blocking-egress-by-default policy.
This application uses the ratings microservice to fetch
book ratings, a number between 1 and 5. The ratings are displayed as stars for each review. There are several versions
of the ratings microservice. Some use MongoDB, others use MySQL
as their database.
The example commands in this blog post work with Istio 0.8+, with or without
mutual TLS enabled.
As a reminder, here is the end-to-end architecture of the application from the
Bookinfo sample application.

    The original Bookinfo application

Use the database for ratings data in Bookinfo application


Modify the deployment spec of a version of the ratings microservice that uses a MySQL database, to use your
database instance. The spec is in samples/bookinfo/platform/kube/bookinfo-ratings-v2-mysql.yaml
of an Istio release archive. Edit the following lines:
- name: MYSQL_DB_HOST
  value: mysqldb
- name: MYSQL_DB_PORT
  value: "3306"
- name: MYSQL_DB_USER
  value: root
- name: MYSQL_DB_PASSWORD
  value: password
Replace the values in the snippet above, specifying the database host, port, user, and password. Note that the
correct way to pass passwords to containers through environment variables in Kubernetes is to use secrets. For this
example task only, you may want to write the password directly in the deployment spec. Do not do this in a real
environment! I also assume everyone realizes that "password" should not be used as a password…
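For illustration, here is a Python sketch of a service reading these variables. It is hypothetical and not the code of the ratings microservice (`MySQLSettings` and `load_settings` are invented names); the host and port defaults come from the snippet above, while the user and password have no defaults, so a missing credential fails loudly instead of silently falling back to something like "password".

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MySQLSettings:
    host: str
    port: int
    user: str
    password: str


def load_settings(env: Mapping[str, str] = os.environ) -> MySQLSettings:
    """Build connection settings from the MYSQL_DB_* environment variables."""
    return MySQLSettings(
        host=env.get("MYSQL_DB_HOST", "mysqldb"),     # default from the spec
        port=int(env.get("MYSQL_DB_PORT", "3306")),   # default from the spec
        user=env["MYSQL_DB_USER"],                    # required: KeyError if unset
        password=env["MYSQL_DB_PASSWORD"],            # required: KeyError if unset
    )
```

Because the code only reads environment variables, the Deployment can source MYSQL_DB_PASSWORD from a Kubernetes Secret (through `valueFrom.secretKeyRef`) without any code change.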


Apply the modified spec to deploy the version of the ratings microservice, v2-mysql, that will use your
database.
$ kubectl apply -f samples/bookinfo/platform/kube/bookinfo-ratings-v2-mysql.yaml
deployment "ratings-v2-mysql" created


Route all the traffic destined to the reviews service to its v3 version. You do this to ensure that the
reviews service always calls the ratings service. In addition, route all the traffic destined to the ratings
service to ratings v2-mysql that uses your database.
Specify the routing for both services above by adding two
virtual services. These virtual services are
specified in samples/bookinfo/networking/virtual-service-ratings-mysql.yaml of an Istio release archive.
Important: make sure you
applied the default destination rules before running the
following command.
$ kubectl apply -f samples/bookinfo/networking/virtual-service-ratings-mysql.yaml


The updated architecture appears below. Note that the blue arrows inside the mesh mark the traffic configured according
to the virtual services we added. According to the virtual services, the traffic is sent to reviews v3 and
ratings v2-mysql.

    The Bookinfo application with ratings v2-mysql and an external MySQL database

Note that the MySQL database is outside the Istio service mesh, or more precisely outside the Kubernetes cluster. The
boundary of the service mesh is marked by a dashed line.
Access the webpage
Access the webpage of the application, after
determining the ingress IP and port.
You have a problem… Instead of the rating stars, the message “Ratings service is currently unavailable” is
displayed below each review:

    The Ratings service error messages

As in Consuming External Web Services, you experience graceful service degradation,
which is good. The application did not crash due to the error in the ratings microservice. The webpage of the
application correctly displayed the book information, the details, and the reviews, just without the rating stars.
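Graceful degradation comes from the caller treating a failed dependency as optional. A minimal Python sketch of that pattern follows; `render_review` and `fetch_rating` are hypothetical names, and the actual Bookinfo services are implemented differently.

```python
from typing import Callable


def render_review(text: str, fetch_rating: Callable[[], int]) -> str:
    """Render one review, degrading gracefully if the ratings call fails."""
    try:
        stars = "*" * fetch_rating()
    except Exception:
        # The dependency failed: keep the review, replace only the stars.
        stars = "Ratings service is currently unavailable"
    return f"{text}\n{stars}"
```

The review text is always returned; only the part that depends on the failing service is replaced by a message.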
You have the same problem as in Consuming External Web Services, namely all the traffic
outside the Kubernetes cluster, both TCP and HTTP, is blocked by default by the sidecar proxies. To enable such traffic
for TCP, a mesh-external service entry for TCP must be defined.
Mesh-external service entry for an external MySQL instance
TCP mesh-external service entries come to our rescue.


Get the IP address of your MySQL database instance. As an option, you can use the
host command:
$ export MYSQL_DB_IP=$(host $MYSQL_DB_HOST | grep " has address " | cut -d" " -f4)
For a local database, set MYSQL_DB_IP to contain the IP of your machine, accessible from your cluster.
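If the hostname has several A records, the host pipeline above prints all of them, one per line, while the service entry needs each address as its own /32 block. The following Python sketch resolves a hostname and formats every address as a CIDR block; `to_cidr_blocks` and `cidr_blocks_for` are hypothetical helpers, and `socket.gethostbyname_ex` returns IPv4 addresses only.

```python
import ipaddress
import socket
from typing import Iterable, List


def to_cidr_blocks(ips: Iterable[str]) -> List[str]:
    """Turn individual IPv4 addresses into sorted, de-duplicated /32 blocks."""
    nets = {ipaddress.ip_network(f"{ip}/32") for ip in ips}
    return [str(n) for n in sorted(nets)]


def cidr_blocks_for(hostname: str) -> List[str]:
    """Resolve `hostname` (IPv4 only) and return one /32 block per address."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return to_cidr_blocks(addresses)
```

Calling `cidr_blocks_for` with the value of MYSQL_DB_HOST gives the list to put in the addresses field.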


Define a TCP mesh-external service entry:
$ kubectl apply -f - <



Review the service entry you just created and check that it contains the correct values:
$ kubectl get serviceentry mysql-external -o yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
...


Note that for a TCP service entry, you specify tcp as the protocol of a port of the entry. Also note that you have to
specify the IP of the external service in the list of addresses, as a CIDR block
with suffix 32.
I will talk more about TCP service entries
below. For now, verify that the service entry we added fixed the problem. Access the
webpage and see if the stars are back.
It worked! Accessing the web page of the application displays the ratings without error:

    Book Ratings Displayed Correctly

Note that you see a one-star rating for both displayed reviews, as expected. You changed the ratings to be one star to
provide us with a visual clue that our external database is indeed being used.
As with service entries for HTTP/HTTPS, you can create and delete service entries for TCP dynamically, using kubectl.
Motivation for egress TCP traffic control
Some in-mesh Istio applications must access external services, for example legacy systems. In many cases, the access is
not performed over HTTP or HTTPS protocols. Other TCP protocols are used, such as database-specific protocols like
MongoDB Wire Protocol and MySQL Client/Server Protocol to communicate with external databases.
Next let me provide more details about the service entries for TCP traffic.
Service entries for TCP traffic
The service entries for enabling TCP traffic to a specific port must specify TCP as the protocol of the port.
Additionally, for the MongoDB Wire Protocol, the
protocol can be specified as MONGO, instead of TCP.
For the addresses field of the entry, a block of IPs in CIDR
notation must be used. Note that the hosts field is ignored for TCP service entries.
To enable TCP traffic to an external service by its hostname, all the IPs of the hostname must be specified. Each IP
must be specified by a CIDR block.
Note that not all the IPs of an external service are always known. To enable egress TCP traffic, only the IPs that are
used by the applications must be specified.
Also note that the IPs of an external service are not always static, for example in the case of
CDNs. Sometimes the IPs are static most of the time, but can
be changed from time to time, for example due to infrastructure changes. In these cases, if the range of the possible
IPs is known, you should specify the range by CIDR blocks. If the range of the possible IPs is not known, service
entries for TCP cannot be used and
the external services must be called directly,
bypassing the sidecar proxies.
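Since the hosts field is ignored for TCP service entries, the addresses field effectively acts as an allowlist of CIDR blocks matched against the destination IP. A Python sketch of that check, using the standard `ipaddress` module; `is_allowed` is a hypothetical helper that ignores ports and the other fields of the entry.

```python
import ipaddress
from typing import Sequence


def is_allowed(dest_ip: str, allowed_cidrs: Sequence[str]) -> bool:
    """Return True if dest_ip falls inside any of the allowed CIDR blocks."""
    ip = ipaddress.ip_address(dest_ip)
    return any(ip in ipaddress.ip_network(block) for block in allowed_cidrs)
```

A /32 block matches exactly one address, while a wider block such as /24 covers a known range, which is the option described above for IPs that may change within a range.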
Relation to virtual machines support
Note that the scenario described in this post is different from the
Bookinfo with Virtual Machines example. In that scenario, a MySQL instance runs on an
external (outside the cluster) machine (a bare-metal machine or a VM), integrated with the Istio service mesh. The MySQL service becomes
a first-class citizen of the mesh with all the beneficial features of Istio applicable. Among other things, the service
becomes addressable by a local cluster domain name, for example by mysqldb.vm.svc.cluster.local, and the communication
to it can be secured by
mutual TLS authentication. There is no need to create a service
entry to access this service; however, the service must be registered with Istio. To enable such integration, Istio
components (Envoy proxy, node-agent, istio-agent) must be installed on the machine, and the Istio control plane
(Pilot, Mixer, Citadel) must be accessible from it. See the
Istio VM-related tasks for more details.
In our case, the MySQL instance can run on any machine or can be provisioned as a service by a cloud provider. There is
no requirement to integrate the machine with Istio. The Istio control plane does not have to be accessible from the
machine. In the case of MySQL as a service, the machine that MySQL runs on may not be accessible, and installing the
required components on it may be impossible. In our case, the MySQL instance is addressable by its global domain name,
which could be beneficial if the consuming applications expect to use that domain name. This is especially relevant when
that expected domain name cannot be changed in the deployment configuration of the consuming applications.
Cleanup


Drop the test database and the bookinfo user:
$ mysqlsh --sql --ssl-mode=REQUIRED -u admin -p --host $MYSQL_DB_HOST --port $MYSQL_DB_PORT -e "drop database test; drop user bookinfo;"
OR
For mysql and the local database:
$ mysql -u root -p --host $MYSQL_DB_HOST --port $MYSQL_DB_PORT -e "drop database test; drop user bookinfo;"


Remove the virtual services:
$ kubectl delete -f samples/bookinfo/networking/virtual-service-ratings-mysql.yaml
Deleted config: virtual-service/default/reviews
Deleted config: virtual-service/default/ratings


Undeploy ratings v2-mysql:
$ kubectl delete -f samples/bookinfo/platform/kube/bookinfo-ratings-v2-mysql.yaml
deployment "ratings-v2-mysql" deleted


Delete the service entry:
$ kubectl delete serviceentry mysql-external -n default
Deleted config: serviceentry mysql-external


Conclusion
In this blog post, I demonstrated how the microservices in an Istio service mesh can consume external services via TCP.
By default, Istio blocks all the traffic, TCP and HTTP, to the hosts outside the cluster. To enable such traffic for
TCP, TCP mesh-external service entries must be created for the service mesh.



Consuming External Web Services
Wed, 31 Jan 2018 00:00:00 +0000
In many cases, not all the parts of a microservices-based application reside in a service mesh. Sometimes, the
microservices-based applications use functionality provided by legacy systems that reside outside the mesh. You may want
to migrate these systems to the service mesh gradually. Until these systems are migrated, they must be accessed by the
applications inside the mesh. In other cases, the applications use web services provided by third parties.
In this blog post, I modify the Istio Bookinfo Sample Application to fetch book details from
an external web service (Google Books APIs). I show how
to enable egress HTTPS traffic in Istio by using mesh-external service entries. I provide two options for egress
HTTPS traffic and describe the pros and cons of each of the options.
Initial setting
To demonstrate the scenario of consuming an external web service, I start with a Kubernetes cluster with Istio installed. Then I deploy
Istio Bookinfo Sample Application. This application uses the details microservice to fetch
book details, such as the number of pages and the publisher. The original details microservice provides the book
details without consulting any external service.
The example commands in this blog post work with Istio 1.0+, with or without
mutual TLS enabled. The Bookinfo configuration files reside in the
samples/bookinfo directory of the Istio release archive.
Here is a copy of the end-to-end architecture of the application from the original
Bookinfo sample application.

    The Original Bookinfo Application

Perform the steps in the
Deploying the application,
Confirm the app is running, and
Apply default destination rules
sections, then
change Istio to the blocking-egress-by-default policy.
Bookinfo with HTTPS access to a Google Books web service
Deploy a new version of the details microservice, v2, that fetches the book details from Google Books APIs. Run the following command; it sets the
DO_NOT_ENCRYPT environment variable of the service’s container to false. This setting will instruct the deployed
service to use HTTPS (instead of HTTP) to access the external service.
$ kubectl apply -f samples/bookinfo/platform/kube/bookinfo-details-v2.yaml --dry-run=client -o yaml | kubectl set env --local -f - 'DO_NOT_ENCRYPT=false' -o yaml | kubectl apply -f -
The updated architecture of the application now looks as follows:

    The Bookinfo Application with details V2

Note that the Google Books web service is outside the Istio service mesh, the boundary of which is marked by a dashed
line.
Now direct all the traffic destined to the details microservice to details version v2.
$ kubectl apply -f samples/bookinfo/networking/virtual-service-details-v2.yaml
Note that the virtual service relies on a destination rule that you created in the Apply default destination rules section.
Access the web page of the application, after
determining the ingress IP and port.
Oops… Instead of the book details you have the Error fetching product details message displayed:

    The Error Fetching Product Details Message

The good news is that your application did not crash. With a good microservice design, you do not have failure
propagation. In your case, the failing details microservice does not cause the productpage microservice to fail.
Most of the functionality of the application is still provided, despite the failure in the details microservice. You
have graceful service degradation: as you can see, the reviews and the ratings are displayed correctly, and the
application is still useful.
So what might have gone wrong? Ah… The answer is that I forgot to tell you to enable traffic from inside the mesh to
an external service, in this case to the Google Books web service. By default, the Istio sidecar proxies
(Envoy proxies) block all the traffic to destinations outside the cluster. To enable
such traffic, you must define a
mesh-external service entry.
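With the blocking-egress-by-default policy, a connection leaves the mesh only if its destination is registered, and for HTTPS the only name the proxy can see is the SNI value. The following is a toy Python model of that decision: `egress_allowed` and the exact-match rule are simplifications, and other host-matching options of service entries are not modeled.

```python
from typing import Optional, Set


def egress_allowed(sni: Optional[str], registered_hosts: Set[str]) -> bool:
    """Default-deny egress: allow only if the SNI names a registered host."""
    if sni is None:
        return False  # no server name to match against: blocked
    return sni.lower() in registered_hosts
```

An empty set corresponds to having no service entry at all, which is the situation that produced the error above.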
Enable HTTPS access to a Google Books web service
No worries, define a mesh-external service entry and fix your application. You must also define a virtual
service to perform routing by SNI to the external service.
$ kubectl apply -f - <

Now accessing the web page of the application displays the book details without error:

    Book Details Displayed Correctly

You can query your service entries:
$ kubectl get serviceentries
NAME         AGE
googleapis   8m
You can delete your service entry:
$ kubectl delete serviceentry googleapis
serviceentry "googleapis" deleted
and see in the output that the service entry is deleted.
Accessing the web page after deleting the service entry produces the same error that you experienced before, namely
Error fetching product details. As you can see, the service entries are defined dynamically, as are many other
Istio configuration artifacts. The Istio operators can decide dynamically which domains they allow the microservices to
access. They can enable and disable traffic to the external domains on the fly, without redeploying the microservices.
Cleanup of HTTPS access to a Google Books web service
$ kubectl delete serviceentry googleapis
$ kubectl delete virtualservice googleapis
$ kubectl delete -f samples/bookinfo/networking/virtual-service-details-v2.yaml
$ kubectl delete -f samples/bookinfo/platform/kube/bookinfo-details-v2.yaml
TLS origination by Istio
There is a caveat to this story. Suppose you want to monitor which specific set of
Google APIs your microservices use
(Books,
Calendar, Tasks, etc.).
Suppose you want to enforce a policy that allows using only the
Books APIs. Suppose you want to monitor the
book identifiers that your microservices access. For these monitoring and policy tasks you need to know the URL path.
Consider for example the URL
www.googleapis.com/books/v1/volumes?q=isbn:0486424618.
In that URL, the Books APIs are specified by the path segment
/books, and the ISBN by the path segment /volumes together with the query parameter
q=isbn:0486424618. However, in HTTPS, all the HTTP details (hostname, path, headers etc.) are encrypted and
such monitoring and policy enforcement by the sidecar proxies is not possible. Istio can only know the server name of
the encrypted requests by the SNI (Server Name Indication) field,
in this case www.googleapis.com.
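To make the difference concrete, here is a Python sketch of the kind of check that needs the plaintext URL: allow only the Books APIs and extract the ISBN for monitoring. The function name and the policy are hypothetical; with end-to-end HTTPS, the sidecar would only see www.googleapis.com from the SNI and could not run this check.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def inspect_request(url: str) -> Tuple[bool, Optional[str]]:
    """Return (allowed, isbn) for a plaintext Google APIs request URL.

    Only paths under /books are allowed; the ISBN is read from a query
    parameter of the form q=isbn:<number>.
    """
    parts = urlsplit(url)
    allowed = parts.path == "/books" or parts.path.startswith("/books/")
    isbn = None
    for q in parse_qs(parts.query).get("q", []):
        if q.startswith("isbn:"):
            isbn = q[len("isbn:"):]
    return allowed, isbn
```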
To allow Istio to perform monitoring and policy enforcement of egress requests based on HTTP details, the microservices
must issue HTTP requests. Istio then opens an HTTPS connection to the destination (performs TLS origination). The code
of the microservices must be written differently or configured differently, according to whether the microservice runs
inside or outside an Istio service mesh. This contradicts the Istio design goal of maximizing transparency. Sometimes you need to compromise…
The diagram below shows two options for sending HTTPS traffic to external services. On the top, a microservice sends
regular HTTPS requests, encrypted end-to-end. On the bottom, the same microservice sends unencrypted HTTP requests
inside a pod, which are intercepted by the sidecar Envoy proxy. The sidecar proxy performs TLS origination, so the
traffic between the pod and the external service is encrypted.

    HTTPS traffic to external services, with TLS originated by the microservice vs. by the sidecar proxy
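The bottom half of the diagram can be sketched as a toy Python model of the decision the sidecar makes for an outgoing plain-HTTP request when TLS origination is configured for the destination. `plan_upstream`, `Upstream`, and the host-to-port mapping are hypothetical names; in Istio this behavior comes from configuration (a service entry, a virtual service that rewrites the port, and a destination rule), not from application code.

```python
import ssl
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Upstream:
    host: str
    port: int
    tls_context: Optional[ssl.SSLContext]  # None means plain HTTP


def plan_upstream(authority: str, originate_tls_for: Dict[str, int]) -> Upstream:
    """Decide how the proxy connects upstream for a plain-HTTP request.

    `originate_tls_for` maps hostnames to the TLS port to use (e.g. 443);
    other destinations are forwarded unchanged over plain HTTP.
    """
    host, _, port = authority.partition(":")
    if host in originate_tls_for:
        # Port rewrite (80 -> 443) plus TLS with certificate/hostname checks.
        return Upstream(host, originate_tls_for[host], ssl.create_default_context())
    return Upstream(host, int(port or 80), None)
```

In a real client, the returned context would be used as `tls_context.wrap_socket(sock, server_hostname=host)`, so the SNI sent to the server is the original hostname.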

Here is how both patterns are supported in the
Bookinfo details microservice code, using the Ruby
net/http module:
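require 'net/http'
require 'uri'
uri = URI.parse('https://www.googleapis.com/books/v1/volumes?q=isbn:' + isbn)
# Port 80 (plain HTTP) when the sidecar originates TLS, port 443 otherwise
http = Net::HTTP.new(uri.host, ENV['DO_NOT_ENCRYPT'] === 'true' ? 80:443)
...
# Encrypt in the application only when TLS origination is not delegated to the sidecar
unless ENV['DO_NOT_ENCRYPT'] === 'true' then
     http.use_ssl = true
end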
When the DO_NOT_ENCRYPT environment variable is set to true, the request is performed without SSL (plain HTTP) to port 80.
You can set the DO_NOT_ENCRYPT environment variable to “true” in the
Kubernetes deployment spec of details v2,
the container section:
env:
- name: DO_NOT_ENCRYPT
  value: "true"
In the next section you will configure TLS origination for accessing an external web service.
Bookinfo with TLS origination to a Google Books web service


Deploy a version of details v2 that sends an HTTP request to
Google Books APIs. The DO_NOT_ENCRYPT variable
is set to true in
bookinfo-details-v2.yaml.
$ kubectl apply -f samples/bookinfo/platform/kube/bookinfo-details-v2.yaml


Direct the traffic destined to the details microservice to details version v2.
$ kubectl apply -f samples/bookinfo/networking/virtual-service-details-v2.yaml


Create a mesh-external service entry for www.googleapis.com, a virtual service to rewrite the destination port from
80 to 443, and a destination rule to perform TLS origination.
$ kubectl apply -f - <



Access the web page of the application and verify that the book details are displayed without errors.


Enable Envoy’s access logging


Check the log of the sidecar proxy of details v2 and see the HTTP request.
$ kubectl logs $(kubectl get pods -l app=details,version=v2 -o jsonpath='{.items[0].metadata.name}') istio-proxy | grep googleapis
[2018-08-09T11:32:58.171Z] "GET /books/v1/volumes?q=isbn:0486424618 HTTP/1.1" 200 - 0 1050 264 264 "-" "Ruby" "b993bae7-4288-9241-81a5-4cde93b2e3a6" "www.googleapis.com:80" "172.217.20.74:80"
Note the URL path in the log; the path can be monitored and access policies can be applied based on it. To read more
about monitoring and access policies for HTTP egress traffic, check out this blog post.
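Monitoring the path starts with pulling it out of the access log. Below is a Python sketch that parses a line like the one above. The field layout (request line first, then the authority and upstream host as the last two quoted fields) is inferred from that single sample and is an assumption; check the access-log format actually configured in your mesh.

```python
import re
from typing import Dict

# Assumed layout, inferred from the sample line:
# [start] "METHOD PATH PROTOCOL" code ... "authority" "upstream-host"
_REQUEST_LINE = re.compile(
    r'^\[(?P<start>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<code>\d{3})'
)
_QUOTED = re.compile(r'"([^"]*)"')


def parse_access_log(line: str) -> Dict[str, str]:
    """Extract the fields needed for path-based monitoring from one log line."""
    m = _REQUEST_LINE.match(line)
    if m is None:
        raise ValueError("unrecognized access log line")
    quoted = _QUOTED.findall(line)
    fields = m.groupdict()
    fields["authority"] = quoted[-2]
    fields["upstream_host"] = quoted[-1]
    return fields
```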


Cleanup of TLS origination to a Google Books web service
$ kubectl delete serviceentry googleapis
$ kubectl delete virtualservice rewrite-port-for-googleapis
$ kubectl delete destinationrule originate-tls-for-googleapis
$ kubectl delete -f samples/bookinfo/networking/virtual-service-details-v2.yaml
$ kubectl delete -f samples/bookinfo/platform/kube/bookinfo-details-v2.yaml
Relation to Istio mutual TLS
Note that the TLS origination in this case is unrelated to
the mutual TLS applied by Istio. The TLS origination for the
external services will work, whether the Istio mutual TLS is enabled or not. The mutual TLS secures
service-to-service communication inside the service mesh and provides each service with a strong identity. The
external services in this blog post were accessed using one-way TLS, the same mechanism used to secure communication between a
web browser and a web server. TLS is applied to the communication with external services to verify the identity of the
external server and to encrypt the traffic.
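The difference between the two kinds of TLS shows up in who must present a certificate. The following uses Python's standard `ssl` module purely as an illustration (not Istio code): a one-way TLS client verifies the server's certificate and hostname without presenting its own, while a mutual TLS server additionally demands a client certificate. `one_way_tls_client` and `mutual_tls_server` are hypothetical helpers; a real server would also load its own certificate chain and the CA used to verify clients.

```python
import ssl


def one_way_tls_client() -> ssl.SSLContext:
    """Client side of one-way TLS (browser-style): verify the server only."""
    # Requires a valid server certificate and checks its hostname;
    # the client presents no certificate of its own.
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH)


def mutual_tls_server() -> ssl.SSLContext:
    """Server side of mutual TLS: also demand and verify a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid cert
    return ctx
```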
Conclusion
In this blog post I demonstrated how microservices in an Istio service mesh can consume external web services by
HTTPS. By default, Istio blocks all the traffic to the hosts outside the cluster. To enable such traffic, mesh-external
service entries must be created for the service mesh. It is possible to access the external sites either by
issuing HTTPS requests, or by issuing HTTP requests with Istio performing TLS origination. When the microservices issue
HTTPS requests, the traffic is encrypted end-to-end, however Istio cannot monitor HTTP details like the URL paths of the
requests. When the microservices issue HTTP requests, Istio can monitor the HTTP details of the requests and enforce
HTTP-based access policies. However, in that case the traffic between the microservice and the sidecar proxy is unencrypted.
Having part of the traffic unencrypted can be forbidden in organizations with very strict security requirements.



Mixer and the SPOF Myth
Thu, 07 Dec 2017 00:00:00 +0000
As Mixer is in the request path, it is natural to question how it impacts
overall system availability and latency. A common refrain we hear when people first glance at Istio architecture diagrams is
“Isn’t this just introducing a single point of failure?”
In this post, we’ll dig deeper and cover the design principles that underpin Mixer and the surprising fact that Mixer actually
increases overall mesh availability and reduces average request latency.
Istio’s use of Mixer has two main benefits in terms of overall system availability and latency:


Increased SLO. Mixer insulates proxies and services from infrastructure backend failures, enabling higher effective mesh availability. The mesh as a whole tends to experience a lower rate of failure when interacting with the infrastructure backends than if Mixer were not present.


Reduced Latency. Through aggressive use of shared multi-level caches and sharding, Mixer reduces average observed latencies across the mesh.


We’ll explain this in more detail below.
How we got here
For many years at Google, we’ve been using an internal API & service management system to handle the many APIs exposed by Google. This system has been fronting the world’s biggest services (Google Maps, YouTube, Gmail, etc.) and sustains a peak rate of hundreds of millions of QPS. Although this system has served us well, it had problems keeping up with Google’s rapid growth, and it became clear that a new architecture was needed in order to tamp down ballooning operational costs.
In 2014, we started an initiative to create a replacement architecture that would scale better. The result has proven extremely successful and has been gradually deployed throughout Google, saving millions of dollars a month in ops costs in the process.
The older system was built around a centralized fleet of fairly heavy proxies into which all incoming traffic would flow, before being forwarded to the services where the real work was done. The newer architecture jettisons the shared proxy design and instead consists of a very lean and efficient distributed sidecar proxy sitting next to service instances, along with a shared fleet of sharded control plane intermediaries:

    Google's API & Service Management System

Look familiar? Of course: it’s just like Istio! Istio was conceived as a second generation of this distributed proxy architecture. We took the core lessons from this internal system, generalized many of the concepts by working with our partners, and created Istio.
Architecture recap
As shown in the diagram below, Mixer sits between the mesh and the infrastructure backends that support it:

    Istio Topology

The Envoy sidecar logically calls Mixer before each request to perform precondition checks, and after each request to report telemetry.
The sidecar has local caching such that a relatively large percentage of precondition checks can be performed from cache. Additionally, the
sidecar buffers outgoing telemetry such that it only actually needs to call Mixer once for every several thousand requests. Whereas precondition
checks are synchronous to request processing, telemetry reports are done asynchronously with a fire-and-forget pattern.
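The split between cached, synchronous checks and buffered telemetry reports can be modeled with a small Python class. This is a toy, not Istio's client code: `SidecarClient`, `check_backend`, `send_batch`, and the default batch size are hypothetical; the cache never expires, and a full batch is sent synchronously rather than asynchronously, to keep the example short.

```python
from typing import Callable, Dict, Hashable, List


class SidecarClient:
    """Toy model of the sidecar side of the Mixer protocol."""

    def __init__(self,
                 check_backend: Callable[[Hashable], bool],
                 send_batch: Callable[[List[dict]], None],
                 batch_size: int = 1000) -> None:
        self._check_backend = check_backend
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._check_cache: Dict[Hashable, bool] = {}
        self._buffer: List[dict] = []

    def check(self, key: Hashable) -> bool:
        """Precondition check: on the request path, served from cache when possible."""
        if key not in self._check_cache:
            self._check_cache[key] = self._check_backend(key)
        return self._check_cache[key]

    def report(self, record: dict) -> None:
        """Telemetry: buffered, sent in one call per `batch_size` records."""
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            batch, self._buffer = self._buffer, []
            self._send_batch(batch)
```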
At a high level, Mixer provides:


Backend Abstraction. Mixer insulates the Istio components and services within the mesh from the implementation details of individual infrastructure backends.


Intermediation. Mixer allows operators to have fine-grained control over all interactions between the mesh and the infrastructure backends.


However, even beyond these purely functional aspects, Mixer has other characteristics that provide the system with additional benefits.
Mixer: SLO booster
Contrary to the claim that Mixer is a SPOF and can therefore lead to mesh outages, we believe it in fact improves the effective availability of a mesh. How can that be? There are three basic characteristics at play:


Statelessness. Mixer is stateless in that it doesn’t manage any persistent storage of its own.


Hardening. Mixer proper is designed to be a highly reliable component. The design intent is to achieve > 99.999% uptime for any individual Mixer instance.


Caching and Buffering. Mixer is designed to accumulate a large amount of transient ephemeral state.


The sidecar proxies that sit next to each service instance in the mesh must necessarily be frugal in terms of memory consumption, which constrains the possible amount of local caching and buffering. Mixer, however, lives independently and can use considerably larger caches and output buffers. Mixer thus acts as a highly-scaled and highly-available second-level cache for the sidecars.
Mixer’s expected availability is considerably higher than most infrastructure backends (those often have availability of perhaps 99.9%). Its local caches and buffers help mask infrastructure backend failures by being able to continue operating even when a backend has become unresponsive.
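The masking effect can be illustrated with a toy second-level cache that keeps serving the last known answer when the backend fails. This is a Python sketch under simplifying assumptions (a hypothetical `SharedCache` class, an unbounded dict, time-to-live freshness), not Mixer's implementation.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class SharedCache:
    """Toy second-level cache in front of an infrastructure backend."""

    def __init__(self, backend: Callable[[Hashable], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._backend = backend
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[bool, float]] = {}
        self.backend_calls = 0

    def get(self, key: Hashable) -> bool:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # fresh hit: no backend call at all
        try:
            self.backend_calls += 1
            value = self._backend(key)
        except Exception:
            if cached is not None:
                return cached[0]  # backend down: serve the stale answer
            raise  # nothing cached: the failure is visible to the caller
        self._entries[key] = (value, now)
        return value
```

Fresh hits also avoid backend calls entirely, which is where the latency and QPS savings come from. Whether a check should be answered from stale data or fail closed is a per-check policy choice that this sketch does not model.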
Mixer: Latency slasher
As we explained above, the Istio sidecars generally have fairly effective first-level caching. They can serve the majority of their traffic from cache. Mixer provides a much greater shared pool of second-level cache, which helps Mixer contribute to a lower average per-request latency.
While it’s busy cutting down latency, Mixer is also inherently cutting down the number of calls your mesh makes to infrastructure backends. Depending on how you’re paying for these backends, this might end up saving you some cash by cutting down the effective QPS to the backends.
Work ahead
We have opportunities ahead to continue improving the system in many ways.
Configuration canaries
Mixer is highly scaled, so it is generally resistant to individual instance failures. However, Mixer is still susceptible to cascading failures when a poisoned configuration is deployed that causes all Mixer instances to crash at roughly the same time (yeah, that would be a bad day). To prevent this, configuration changes can be canaried to a small set of Mixer instances and then rolled out more broadly.
Mixer doesn’t yet do canarying of configuration changes, but we expect this to come online as part of Istio’s ongoing work on reliable
configuration distribution.
Cache tuning
We have yet to fine-tune the sizes of the sidecar and Mixer caches. This work will focus on achieving the highest performance possible using the least amount of resources.
Cache sharing
At the moment, each Mixer instance operates independently of all other instances. A request handled by one Mixer instance will not leverage data cached in a different instance. We will eventually experiment with a distributed cache such as memcached or Redis in order to provide a much larger mesh-wide shared cache, and further reduce the number of calls to infrastructure backends.
Sharding
In very large meshes, the load on Mixer can be great. There can be a large number of Mixer instances, each straining to keep caches primed to
satisfy incoming traffic. We expect to eventually introduce intelligent sharding such that Mixer instances become slightly specialized in
handling particular data streams in order to increase the likelihood of cache hits. In other words, sharding helps improve cache
efficiency by routing related traffic to the same Mixer instance over time, rather than randomly dispatching to
any available Mixer instance.
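One simple way to get that stickiness is to hash a stream key and map it to an instance, as in this Go sketch. The choice of key (for example, the destination service) and the modulo mapping are assumptions for illustration; a consistent-hashing ring would move fewer keys when instances are added or removed.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a stream key (hypothetically, the destination service of a
// request) to one of n Mixer instances. The same key always maps to the same
// instance, so that instance's cache stays warm for it.
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	return int(h.Sum32() % uint32(n))
}

func main() {
	for _, key := range []string{"reviews", "ratings", "reviews", "details"} {
		fmt.Printf("%-8s -> mixer-%d\n", key, shardFor(key, 4))
	}
}
```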
Conclusion
Practical experience at Google showed that the model of a slim sidecar proxy and a large shared caching control plane intermediary hits a sweet
spot, delivering excellent perceived availability and latency. We’ve taken the lessons learned there and applied them to create more sophisticated and
effective caching, prefetching, and buffering strategies in Istio. We’ve also optimized the communication protocols to reduce overhead when a cache miss does occur.
Mixer is still young. As of Istio 0.3, we haven’t really done significant performance work within Mixer itself. This means when a request misses the sidecar
cache, we spend more time in Mixer to respond to requests than we should. We’re doing a lot of work to improve this in coming months to reduce the overhead
that Mixer imparts in the synchronous precondition check case.
We hope this post makes you appreciate the inherent benefits that Mixer brings to Istio.
Don’t hesitate to post comments or questions to istio-policies-and-telemetry@.



Mixer Adapter Model
Fri, 03 Nov 2017 00:00:00 +0000
Istio 0.2 introduced a new Mixer adapter model which is intended to increase Mixer’s flexibility to address a varied set of infrastructure backends. This post intends to put the adapter model in context and explain how it works.
Why adapters?
Infrastructure backends provide support functionality used to build services. They include such things as access control systems, telemetry capturing systems, quota enforcement systems, billing systems, and so forth. Traditionally, services integrate directly with these backend systems, creating a hard coupling and baking in specific semantics and usage options.
Mixer serves as an abstraction layer between Istio and an open-ended set of infrastructure backends. The Istio components and services that run within the mesh can interact with these backends, while not being coupled to the backends’ specific interfaces.
In addition to insulating application-level code from the details of infrastructure backends, Mixer provides an intermediation model that allows operators to inject and control policies between application code and backends. Operators can control which data is reported to which backend, which backend to consult for authorization, and much more.
Given that individual infrastructure backends each have different interfaces and operational models, Mixer needs custom code to deal with each; we call these custom bundles of code adapters.
Adapters are Go packages that are directly linked into the Mixer binary. It’s fairly simple to create custom Mixer binaries linked with specialized sets of adapters, in case the default set of adapters is not sufficient for specific use cases.
Philosophy
Mixer is essentially an attribute processing and routing machine. The proxy sends it attributes as part of doing precondition checks and telemetry reports, which it turns into a series of calls into adapters. The operator supplies configuration which describes how to map incoming attributes to inputs for the adapters.

Figure: Attribute Machine

Configuration is a complex task. In fact, evidence shows that the overwhelming majority of service outages are caused by configuration errors. To help combat this, Mixer’s configuration model enforces a number of constraints designed to avoid errors. For example, the configuration model uses strong typing to ensure that only meaningful attributes or attribute expressions are used in any given context.
Handlers: configuring adapters
Each adapter that Mixer uses requires some configuration to operate. Typically, adapters need things like the URL to their backend, credentials, caching options, and so forth. Each adapter defines the exact configuration data it needs via a protobuf message.
You configure each adapter by creating handlers for it. A handler is a configuration resource that represents a fully configured adapter ready for use. There can be any number of handlers for a single adapter, making it possible to reuse an adapter in different scenarios.
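To illustrate the adapter/handler split, here is a small Go sketch in which one hypothetical `metrics` adapter backs two handlers with different settings. Real adapters declare their configuration as protobuf messages; the Go struct, field names, and URLs below are stand-ins, not Mixer's API.

```go
package main

import (
	"errors"
	"fmt"
)

// metricsAdapterConfig is a hypothetical adapter configuration. Real Mixer
// adapters declare theirs as protobuf messages; a Go struct stands in here.
type metricsAdapterConfig struct {
	BackendURL   string
	FlushSeconds int
}

// handler pairs an adapter with one complete configuration. Several handlers
// can share the same adapter with different settings.
type handler struct {
	Name    string
	Adapter string
	Config  metricsAdapterConfig
}

// validate rejects handlers that are not fully configured.
func (h handler) validate() error {
	if h.Config.BackendURL == "" {
		return errors.New("handler " + h.Name + ": backend URL is required")
	}
	if h.Config.FlushSeconds <= 0 {
		return errors.New("handler " + h.Name + ": flush interval must be positive")
	}
	return nil
}

func main() {
	handlers := []handler{
		{Name: "prod-metrics", Adapter: "metrics", Config: metricsAdapterConfig{BackendURL: "http://metrics-prod.example:9090", FlushSeconds: 10}},
		{Name: "debug-metrics", Adapter: "metrics", Config: metricsAdapterConfig{BackendURL: "http://metrics-debug.example:9090", FlushSeconds: 1}},
		{Name: "broken", Adapter: "metrics"},
	}
	for _, h := range handlers {
		fmt.Printf("%-13s adapter=%s valid=%v\n", h.Name, h.Adapter, h.validate() == nil)
	}
}
```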
Templates: adapter input schema
Mixer is typically invoked twice for every incoming request to a mesh service, once for precondition checks and once for telemetry reporting. For every such call, Mixer invokes one or more adapters. Different adapters need different pieces of data as input in order to do their work. A logging adapter needs a log entry, a metric adapter needs a metric, an authorization adapter needs credentials, etc.
Mixer templates are used to describe the exact data that an adapter consumes at request time.
Each template is specified as a protobuf message. A single template describes a bundle of data that is delivered to one or more adapters at runtime. Any given adapter can be designed to support any number of templates; the specific templates an adapter supports are determined by the adapter developer.
metric and logentry are two of the most essential templates used within Istio. They represent, respectively, the payload to report a single metric and a single log entry to the appropriate backends.
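The Go sketch below models the idea with plain structs: simplified stand-ins for metric and logentry payloads, and an adapter that declares which templates it supports. The field names only loosely follow the templates and should be treated as assumptions, not as the actual protobuf definitions.

```go
package main

import (
	"fmt"
	"time"
)

// metricInstance and logEntryInstance are simplified stand-ins for the data
// bundles described by the metric and logentry templates.
type metricInstance struct {
	Value      int64
	Dimensions map[string]string
}

type logEntryInstance struct {
	Timestamp time.Time
	Severity  string
	Variables map[string]string
}

// adapter records which templates it can consume, as chosen by its developer.
type adapter struct {
	Name      string
	Templates map[string]bool
}

func (a adapter) supports(template string) bool { return a.Templates[template] }

func main() {
	full := adapter{Name: "metrics-and-logs", Templates: map[string]bool{"metric": true, "logentry": true}}
	metricsOnly := adapter{Name: "metrics-only", Templates: map[string]bool{"metric": true}}

	m := metricInstance{Value: 1, Dimensions: map[string]string{"destination": "reviews"}}
	l := logEntryInstance{Timestamp: time.Now(), Severity: "INFO", Variables: map[string]string{"path": "/reviews/0"}}

	for _, a := range []adapter{full, metricsOnly} {
		fmt.Printf("%s: metric=%v logentry=%v\n", a.Name, a.supports("metric"), a.supports("logentry"))
	}
	fmt.Println(m.Dimensions["destination"], l.Severity)
}
```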
Instances: attribute mapping
You control which data is delivered to individual adapters by creating instances. Instances control how Mixer maps the attributes delivered by the proxy into individual bundles of data that can be routed to different adapters.
Creating instances generally requires using attribute expressions. The point of these expressions is to use any attribute or literal value in order to produce a result that can be assigned to an instance’s field.
Every instance field has a type, as defined in the template; every attribute has a type; and every attribute expression has a type. You can only assign type-compatible expressions to a given instance field. For example, you can't assign an integer expression to a string field. This kind of strong typing is designed to minimize the risk of creating bogus configurations.
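A minimal sketch of that type check in Go, assuming expressions are reduced to bare attribute names and using a small hypothetical attribute vocabulary. Mixer's real checker works over full attribute expressions; this only shows the principle.

```go
package main

import "fmt"

type valueType string

const (
	stringType valueType = "STRING"
	int64Type  valueType = "INT64"
)

// attributeTypes is a small, hypothetical attribute vocabulary.
var attributeTypes = map[string]valueType{
	"request.path":  stringType,
	"request.size":  int64Type,
	"response.code": int64Type,
}

// checkInstance rejects any mapping that assigns an attribute to a template
// field of a different type. Expressions are reduced to bare attribute names.
func checkInstance(fieldTypes map[string]valueType, mapping map[string]string) error {
	for field, attr := range mapping {
		want, ok := fieldTypes[field]
		if !ok {
			return fmt.Errorf("unknown field %q", field)
		}
		got, ok := attributeTypes[attr]
		if !ok {
			return fmt.Errorf("unknown attribute %q", attr)
		}
		if got != want {
			return fmt.Errorf("cannot assign %q (%s) to field %q (%s)", attr, got, field, want)
		}
	}
	return nil
}

func main() {
	fields := map[string]valueType{"value": int64Type, "path": stringType}
	fmt.Println(checkInstance(fields, map[string]string{"value": "request.size", "path": "request.path"}))
	fmt.Println(checkInstance(fields, map[string]string{"path": "response.code"}))
}
```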
Rules: delivering data to adapters
The last piece of the puzzle is telling Mixer which instances to send to which handler, and when. This is done by creating rules. Each rule identifies a specific handler and the set of instances to send to that handler. Whenever Mixer processes an incoming call, it invokes the indicated handler and gives it the specific set of instances for processing.
Rules contain matching predicates. A predicate is an attribute expression that returns a true/false value. A rule only takes effect if its predicate expression returns true. Otherwise, it is as if the rule didn't exist, and the indicated handler isn't invoked.
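The following Go sketch shows the dispatch logic described above. Predicates are written as Go functions rather than attribute expressions, and the rule, handler, and instance names are hypothetical.

```go
package main

import "fmt"

// attributes is a simplified attribute bag: names mapped to string values.
type attributes map[string]string

// rule sends a set of named instances to one handler when its predicate holds.
type rule struct {
	name      string
	match     func(attributes) bool
	handler   string
	instances []string
}

// dispatch lists the handler invocations for one call as "handler(instance)"
// strings. Rules whose predicate is false contribute nothing.
func dispatch(rules []rule, attrs attributes) []string {
	var calls []string
	for _, r := range rules {
		if !r.match(attrs) {
			continue // as if the rule did not exist
		}
		for _, inst := range r.instances {
			calls = append(calls, r.handler+"("+inst+")")
		}
	}
	return calls
}

func main() {
	rules := []rule{
		{
			name:      "all-requests",
			match:     func(attributes) bool { return true },
			handler:   "metrics-handler",
			instances: []string{"requestcount", "requestduration"},
		},
		{
			name:      "reviews-only",
			match:     func(a attributes) bool { return a["destination.service"] == "reviews" },
			handler:   "log-handler",
			instances: []string{"accesslog"},
		},
	}
	fmt.Println(dispatch(rules, attributes{"destination.service": "ratings"}))
	fmt.Println(dispatch(rules, attributes{"destination.service": "reviews"}))
}
```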
Future
We are working to improve the end-to-end experience of using and developing adapters. For example, several new features are planned to make templates more expressive. Additionally, the expression language is being substantially enhanced to be more powerful and well-rounded.
Longer term, we are evaluating ways to support adapters which aren’t directly linked into the main Mixer binary. This would simplify deployment and composition.
Conclusion
The refreshed Mixer adapter model is designed to provide a flexible framework to support an open-ended set of infrastructure backends.
Handlers provide configuration data for individual adapters; templates determine exactly what kind of data different adapters want to consume at runtime; instances let operators prepare this data; and rules direct the data to one or more handlers.
You can learn more about Mixer’s overall architecture here, and learn the specifics of templates, handlers,
and rules here. You can find many examples of Mixer configuration resources in the Bookinfo sample
here.



Using Network Policy with Istio
Thu, 10 Aug 2017 00:00:00 +0000
The use of Network Policy to secure applications running on Kubernetes is now a widely accepted industry best practice. Given that Istio also supports policy, we want to spend some time explaining how Istio policy and Kubernetes Network Policy interact and support each other to deliver your application securely.
Let’s start with the basics: why might you want to use both Istio and Kubernetes Network Policy? The short answer is that they are good at different things. Consider the main differences between Istio and Network Policy (we will describe “typical” implementations, e.g. Calico, but implementation details can vary with different network providers):

                    Istio Policy          Network Policy
Layer               “Service” (L7)        “Network” (L3-4)
Implementation      User space            Kernel
Enforcement Point   Pod                   Node

Layer
Istio policy operates at the “service” layer of your network application. This is Layer 7 (Application) from the perspective of the OSI model, but the de facto model of cloud native applications is that Layer 7 actually consists of at least two layers: a service layer and a content layer. The service layer is typically HTTP, which encapsulates the actual application data (the content layer). It is at this HTTP service layer that Istio’s Envoy proxy operates. In contrast, Network Policy operates at Layers 3 (Network) and 4 (Transport) in the OSI model.
Operating at the service layer gives the Envoy proxy a rich set of attributes on which to base policy decisions for the protocols it understands, which at present include HTTP/1.1 and HTTP/2 (gRPC operates over HTTP/2). So, you can apply policy based on virtual host, URL, or other HTTP headers. In the future, Istio will support a wide range of Layer 7 protocols, as well as generic TCP and UDP transport.
In contrast, operating at the network layer has the advantage of being universal, since all network applications use IP. At the network layer you can apply policy regardless of the Layer 7 protocol: DNS, SQL databases, real-time streaming, and a plethora of other services that do not use HTTP can be secured. Network Policy isn’t limited to a classic firewall’s tuple of IP addresses, protocol, and ports: both Istio and Network Policy can use rich Kubernetes labels to describe pod endpoints.
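The difference in what each layer can see can be sketched in Go. The `connection` type, the labels, and the ports below are hypothetical, loosely modeled on Bookinfo's ratings service. Neither function reflects how Envoy or a Network Policy implementation actually evaluates rules.

```go
package main

import (
	"fmt"
	"strings"
)

// connection holds what each layer can observe. The HTTP fields stay empty
// for traffic the proxy does not parse, such as a database protocol.
type connection struct {
	srcLabels map[string]string
	protocol  string // "TCP" or "UDP"
	dstPort   int
	httpHost  string
	httpPath  string
}

// l4Allow decides using labels, protocol, and port only, so it works for any
// application protocol.
func l4Allow(c connection) bool {
	return c.srcLabels["app"] == "reviews" && c.protocol == "TCP" && c.dstPort == 9080
}

// l7Allow can also inspect HTTP attributes, but only reaches a decision when
// the traffic is HTTP it understands.
func l7Allow(c connection) (allowed, decided bool) {
	if c.httpHost == "" {
		return false, false
	}
	return c.httpHost == "ratings" && strings.HasPrefix(c.httpPath, "/ratings/"), true
}

func main() {
	reviews := map[string]string{"app": "reviews"}
	httpCall := connection{srcLabels: reviews, protocol: "TCP", dstPort: 9080, httpHost: "ratings", httpPath: "/ratings/0"}
	dbCall := connection{srcLabels: reviews, protocol: "TCP", dstPort: 3306}

	for _, c := range []connection{httpCall, dbCall} {
		allowed, decided := l7Allow(c)
		fmt.Printf("port %d: l4=%v l7 allowed=%v decided=%v\n", c.dstPort, l4Allow(c), allowed, decided)
	}
}
```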
Implementation
Istio’s proxy is based on Envoy, which is implemented as a user space daemon in the data plane that
interacts with the network layer using standard sockets. This gives it a large amount of flexibility in processing, and allows it to be
distributed (and upgraded!) in a container.
The Network Policy data plane is typically implemented in kernel space (e.g. using iptables, eBPF filters, or even custom kernel modules). Being in kernel space allows these implementations to be extremely fast, but not as flexible as the Envoy proxy.
Enforcement point
Policy enforcement using the Envoy proxy is implemented inside the pod, as a sidecar container in the same network namespace. This allows a simple deployment model. However, some containers are given permission to reconfigure the networking inside their pod (CAP_NET_ADMIN). If such a service instance is compromised or misbehaves (as with a malicious tenant), the proxy can be bypassed.
While this won’t let an attacker access other Istio-enabled pods, so long as they are correctly configured, it opens several attack vectors:

Attacking unprotected pods
Attempting to deny service to protected pods by sending lots of traffic
Exfiltrating data collected in the pod
Attacking the cluster infrastructure (servers or Kubernetes services)
Attacking services outside the mesh, like databases, storage arrays, or legacy systems.

Network Policy is typically enforced at the host node, outside the network namespace of the guest pods. This means that compromised or misbehaving pods must break into the root namespace to avoid enforcement. With the addition of egress policy due in Kubernetes 1.8, this difference makes Network Policy a key part of protecting your infrastructure from compromised workloads.
Examples
Let’s walk through a few examples of what you might want to do with Kubernetes Network Policy for an Istio-enabled application.  Consider the Bookinfo sample application.  We’re going to cover the following use cases for Network Policy:

Reduce attack surface of the application ingress
Enforce fine-grained isolation within the application

Reduce attack surface of the application ingress
Our application ingress controller is the main entry point to our application from the outside world. A quick peek at istio.yaml (used to install Istio) shows that the Istio ingress is defined like this:
apiVersion: v1
kind: Service
metadata:
  name: istio-ingress
  labels:
    istio: ingress
spec:
  type: LoadBalancer
  ports:
  - port: 80
    name: http
  - port: 443
    name: https
  selector:
    istio: ingress
The istio-ingress exposes ports 80 and 443.  Let’s limit incoming traffic to just these two ports.  Envoy has a built-in administrative interface, and we don’t want a misconfigured istio-ingress image to accidentally expose our admin interface to the outside world.  This is an example of defense in depth: a properly configured image should not expose the interface, and a properly configured Network Policy will prevent anyone from connecting to it.  Either can fail or be misconfigured and we are still protected.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: istio-ingress-lockdown
  namespace: default
spec:
  podSelector:
    matchLabels:
      istio: ingress
  ingress:
  - ports:
    - protocol: TCP
      port: 80
    - protocol: TCP
      port: 443
Enforce fine-grained isolation within the application
Here is the service graph for the Bookinfo application.

    
        
            
        
    
    Bookinfo Service Graph

This graph shows every connection that a correctly functioning application should be allowed to make. All other connections, say from the Istio Ingress directly to the Rating service, are not part of the application. Let’s lock out those extraneous connections so they cannot be used by an attacker. Imagine, for example, that the Ingress pod is compromised by an exploit that allows an attacker to run arbitrary code. If we only allow connections to the Product Page pods using Network Policy, the attacker has gained no more access to our application backends even though they have compromised a member of the service mesh.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: product-page-ingress
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: productpage
  ingress:
  - ports:
    - protocol: TCP
      port: 9080
    from:
    - podSelector:
        matchLabels:
          istio: ingress
You can and should write a similar policy for each service to enforce which other pods are allowed to access it.
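To make the policy semantics concrete, here is a simplified Go model of ingress Network Policy evaluation, applied to the product-page-ingress policy above. It assumes matchLabels-only selectors and ignores namespaces, ipBlock, named ports, and policyTypes, so treat it as a sketch rather than a specification.

```go
package main

import "fmt"

type labels map[string]string

// matches reports whether every key/value in the selector appears in l
// (matchLabels semantics; an empty selector matches everything).
func (sel labels) matches(l labels) bool {
	for k, v := range sel {
		if l[k] != v {
			return false
		}
	}
	return true
}

type port struct {
	protocol string
	number   int
}

type ingressRule struct {
	from  []labels // empty: any source
	ports []port   // empty: any port
}

type networkPolicy struct {
	podSelector labels
	ingress     []ingressRule
}

func anyLabels(sels []labels, l labels) bool {
	if len(sels) == 0 {
		return true
	}
	for _, s := range sels {
		if s.matches(l) {
			return true
		}
	}
	return false
}

func anyPort(ports []port, p port) bool {
	if len(ports) == 0 {
		return true
	}
	for _, q := range ports {
		if q == p {
			return true
		}
	}
	return false
}

// allowed: pods selected by no policy accept everything; selected pods accept
// only what at least one rule of a selecting policy permits.
func allowed(policies []networkPolicy, src, dst labels, p port) bool {
	selected := false
	for _, np := range policies {
		if !np.podSelector.matches(dst) {
			continue
		}
		selected = true
		for _, r := range np.ingress {
			if anyLabels(r.from, src) && anyPort(r.ports, p) {
				return true
			}
		}
	}
	return !selected
}

func main() {
	productPage := networkPolicy{
		podSelector: labels{"app": "productpage"},
		ingress: []ingressRule{{
			from:  []labels{{"istio": "ingress"}},
			ports: []port{{"TCP", 9080}},
		}},
	}
	policies := []networkPolicy{productPage}
	ingressPod := labels{"istio": "ingress"}
	fmt.Println(allowed(policies, ingressPod, labels{"app": "productpage"}, port{"TCP", 9080}))             // true
	fmt.Println(allowed(policies, labels{"app": "reviews"}, labels{"app": "productpage"}, port{"TCP", 9080})) // false
	fmt.Println(allowed(policies, ingressPod, labels{"app": "ratings"}, port{"TCP", 9080}))                 // true
}
```

The last call shows why a policy per service matters: a pod that no policy selects still accepts traffic from anywhere, including a compromised ingress.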
Summary
Our take is that Istio and Network Policy have different strengths in applying policy. Istio is application-protocol aware and highly flexible, making it ideal for applying policy in support of operational goals, like service routing, retries, circuit-breaking, etc., and for security that operates at the application layer, such as token validation. Network Policy is universal, highly efficient, and isolated from the pods, making it ideal for applying policy in support of network security goals. Furthermore, having policy that operates at different layers of the network stack is a really good thing, as it gives each layer specific context without commingling of state and allows separation of responsibility.
This post is based on the three part blog series by Spike Curtis, one of the Istio team members at Tigera.  The full series can be found here: https://www.projectcalico.org/using-network-policy-in-concert-with-istio/



Canary Deployments using Istio
Wed, 14 Jun 2017 00:00:00 +0000

This post was updated on May 16, 2018 to use the latest version of the traffic management model.


One of the benefits of the Istio project is that it provides the control needed to deploy canary services. The idea behind canary deployment (or rollout) is to introduce a new version of a service by first testing it using a small percentage of user traffic and then, if all goes well, increasing that percentage, possibly gradually in steps, while simultaneously phasing out the old version. If anything goes wrong along the way, we abort and roll back to the previous version. In its simplest form, the traffic sent to the canary version is a randomly selected percentage of requests, but in more sophisticated schemes it can be based on the region, user, or other properties of the request.
Depending on your level of expertise in this area, you may wonder why Istio’s support for canary deployment is even needed, given that platforms like Kubernetes already provide a way to do version rollout and canary deployment. Problem solved, right? Well, not exactly. Although doing a rollout this way works in simple cases, it’s very limited, especially in large scale cloud environments receiving lots of (and especially varying amounts of) traffic, where autoscaling is needed.
Canary deployment in Kubernetes
As an example, let’s say we have a deployed service, helloworld version v1, for which we would like to test (or simply roll out) a new version, v2. Using Kubernetes, you can roll out a new version of the helloworld service by simply updating the image in the service’s corresponding Deployment and letting the rollout happen automatically. If we take particular care to ensure that there are enough v1 replicas running when we start and pause the rollout after only one or two v2 replicas have been started, we can keep the canary’s effect on the system very small. We can then observe the effect before deciding to proceed or, if necessary, roll back. Best of all, we can even attach a horizontal pod autoscaler to the Deployment and it will keep the replica ratios consistent if, during the rollout process, it also needs to scale replicas up or down to handle traffic load.
Although fine for what it does, this approach is only useful when we have a properly tested version that we want to deploy, i.e., more of a blue/green, a.k.a. red/black, kind of upgrade than a “dip your feet in the water” kind of canary deployment. In fact, for the latter (for example, testing a canary version that may not even be ready or intended for wider exposure), the canary deployment in Kubernetes would be done using two Deployments with common pod labels. In this case, we can’t use autoscaling anymore because it’s now being done by two independent autoscalers, one for each Deployment, so the replica ratios (percentages) may vary from the desired ratio, depending purely on load.
Whether we use one deployment or two, canary management using deployment features of container orchestration platforms like Docker, Mesos/Marathon, or Kubernetes has a fundamental problem: the use of instance scaling to manage the traffic; traffic version distribution and replica deployment are not independent in these systems. All replica pods, regardless of version, are treated the same in the kube-proxy round-robin pool, so the only way to manage the amount of traffic that a particular version receives is by controlling the replica ratio. Maintaining canary traffic at small percentages requires many replicas (e.g., 1% would require a minimum of 100 replicas). Even if we ignore this problem, the deployment approach is still very limited in that it only supports the simple (random percentage) canary approach. If, instead, we wanted to limit the visibility of the canary to requests based on some specific criteria, we still need another solution.
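The replica arithmetic follows from reducing the canary fraction: to send exactly p% of round-robin traffic to the canary, the total replica count must be a multiple of 100/gcd(p, 100). A small Go sketch, assuming integer percentages between 1 and 99 and an equal share of traffic per pod:

```go
package main

import "fmt"

func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// minReplicas returns the smallest total replica count, and the canary share
// of it, that yields exactly canaryPercent of traffic when every pod receives
// an equal share.
func minReplicas(canaryPercent int) (total, canary int) {
	g := gcd(canaryPercent, 100)
	return 100 / g, canaryPercent / g
}

func main() {
	for _, p := range []int{1, 5, 10, 25} {
		total, canary := minReplicas(p)
		fmt.Printf("%2d%% canary: %d canary of %d total replicas\n", p, canary, total)
	}
}
```

For 10% this gives 1 canary out of 10 replicas, the same 9-to-1 split used in the plain Kubernetes example later in this post.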
Enter Istio
With Istio, traffic routing and replica deployment are two completely independent functions. The number of pods implementing services is free to scale up and down based on traffic load, completely orthogonal to the control of version traffic routing. This makes managing a canary version in the presence of autoscaling a much simpler problem. Autoscalers may, in fact, respond to load variations resulting from traffic routing changes, but they are nevertheless functioning independently and no differently than when loads change for other reasons.
Istio’s routing rules also provide other important advantages; you can easily control
fine-grained traffic percentages (e.g., route 1% of traffic without requiring 100 pods) and you can control traffic using other criteria (e.g., route traffic for specific users to the canary version). To illustrate, let’s look at deploying the helloworld service and see how simple the problem becomes.
We begin by defining the helloworld Service, just like any other Kubernetes service, something like this:
apiVersion: v1
kind: Service
metadata:
  name: helloworld
  labels:
    app: helloworld
spec:
  selector:
    app: helloworld
  ...
We then add 2 Deployments, one for each version (v1 and v2), both of which include the service selector’s app: helloworld label:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld-v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: helloworld
      version: v1
  template:
    metadata:
      labels:
        app: helloworld
        version: v1
    spec:
      containers:
      - image: helloworld-v1
        ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld-v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: helloworld
      version: v2
  template:
    metadata:
      labels:
        app: helloworld
        version: v2
    spec:
      containers:
      - image: helloworld-v2
        ...
Note that this is exactly the same way we would do a canary deployment using plain Kubernetes, but in that case we would need to adjust the number of replicas of each Deployment to control the distribution of traffic. For example, to send 10% of the traffic to the canary version (v2), the replicas for v1 and v2 could be set to 9 and 1, respectively.
However, since we are going to deploy the service in an Istio-enabled cluster, all we need to do is set a routing rule to control the traffic distribution. For example, if we want to send 10% of the traffic to the canary, we could use kubectl to set a routing rule something like this:
$ kubectl apply -f - <

After setting this rule, Istio will ensure that only one tenth of the requests will be sent to the canary version, regardless of how many replicas of each version are running.
Autoscaling the deployments
Because we don’t need to maintain replica ratios anymore, we can safely add Kubernetes horizontal pod autoscalers to manage the replicas for both version Deployments:
$ kubectl autoscale deployment helloworld-v1 --cpu-percent=50 --min=1 --max=10
deployment "helloworld-v1" autoscaled
$ kubectl autoscale deployment helloworld-v2 --cpu-percent=50 --min=1 --max=10
deployment "helloworld-v2" autoscaled
$ kubectl get hpa
NAME           REFERENCE                 TARGET  CURRENT  MINPODS  MAXPODS  AGE
helloworld-v1  Deployment/helloworld-v1  50%     47%      1        10       17s
helloworld-v2  Deployment/helloworld-v2  50%     40%      1        10       15s
If we now generate some load on the helloworld service, we would notice that when scaling begins, the v1 autoscaler will scale up its replicas significantly higher than the v2 autoscaler will for its replicas because v1 pods are handling 90% of the load.
$ kubectl get pods | grep helloworld
helloworld-v1-3523621687-3q5wh   0/2       Pending   0          15m
helloworld-v1-3523621687-73642   2/2       Running   0          11m
helloworld-v1-3523621687-7hs31   2/2       Running   0          19m
helloworld-v1-3523621687-dt7n7   2/2       Running   0          50m
helloworld-v1-3523621687-gdhq9   2/2       Running   0          11m
helloworld-v1-3523621687-jxs4t   0/2       Pending   0          15m
helloworld-v1-3523621687-l8rjn   2/2       Running   0          19m
helloworld-v1-3523621687-wwddw   2/2       Running   0          15m
helloworld-v1-3523621687-xlt26   0/2       Pending   0          19m
helloworld-v2-4095161145-963wt   2/2       Running   0          50m
If we then change the routing rule to send 50% of the traffic to v2, we should, after a short delay, notice that the v1 autoscaler will scale down the replicas of v1 while the v2 autoscaler will perform a corresponding scale up.
$ kubectl get pods | grep helloworld
helloworld-v1-3523621687-73642   2/2       Running   0          35m
helloworld-v1-3523621687-7hs31   2/2       Running   0          43m
helloworld-v1-3523621687-dt7n7   2/2       Running   0          1h
helloworld-v1-3523621687-gdhq9   2/2       Running   0          35m
helloworld-v1-3523621687-l8rjn   2/2       Running   0          43m
helloworld-v2-4095161145-57537   0/2       Pending   0          21m
helloworld-v2-4095161145-9322m   2/2       Running   0          21m
helloworld-v2-4095161145-963wt   2/2       Running   0          1h
helloworld-v2-4095161145-c3dpj   0/2       Pending   0          21m
helloworld-v2-4095161145-t2ccm   0/2       Pending   0          17m
helloworld-v2-4095161145-v3v9n   0/2       Pending   0          13m
The end result is very similar to the simple Kubernetes Deployment rollout, only now the whole process is not being orchestrated and managed in one place. Instead, we’re seeing several components doing their jobs independently, albeit in a cause and effect manner.
What’s different, however, is that if we now stop generating load, the replicas of both versions will eventually scale down to their minimum (1), regardless of what routing rule we set.
$ kubectl get pods | grep helloworld
helloworld-v1-3523621687-dt7n7   2/2       Running   0          1h
helloworld-v2-4095161145-963wt   2/2       Running   0          1h
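The scaling behavior above follows from the core rule that the Kubernetes horizontal pod autoscaler documents, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), clamped to the configured minimum and maximum. This Go sketch applies it to hypothetical utilization readings; it omits the autoscaler's tolerance, stabilization windows, and readiness handling.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies ceil(current * currentUtil / targetUtil) and clamps
// the result to [minReplicas, maxReplicas]. Utilizations are percentages of
// the pods' CPU request, as with --cpu-percent.
func desiredReplicas(current int, currentUtil, targetUtil float64, minReplicas, maxReplicas int) int {
	d := int(math.Ceil(float64(current) * currentUtil / targetUtil))
	if d < minReplicas {
		return minReplicas
	}
	if d > maxReplicas {
		return maxReplicas
	}
	return d
}

func main() {
	// Hypothetical readings under the 90/10 rule, with --cpu-percent=50 --min=1 --max=10.
	fmt.Println("v1:", desiredReplicas(4, 180, 50, 1, 10)) // v1: 10 (15 wanted, capped at max)
	fmt.Println("v2:", desiredReplicas(1, 80, 50, 1, 10))  // v2: 2
	// With no load at all, both versions fall back to the minimum.
	fmt.Println("idle:", desiredReplicas(10, 0, 50, 1, 10)) // idle: 1
}
```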
Focused canary testing
As mentioned above, the Istio routing rules can be used to route traffic based on specific criteria, allowing more sophisticated canary deployment scenarios. Say, for example, instead of exposing the canary to an arbitrary percentage of users, we want to try it out on internal users, maybe even just a percentage of them. The following command could be used to send 50% of traffic from users at some-company-name.com to the canary version, leaving all other users unaffected:
$ kubectl apply -f - <

As before, the autoscalers bound to the two version Deployments will automatically scale the replicas accordingly, but that will have no effect on the traffic distribution.
Summary
In this article we’ve seen how Istio supports general scalable canary deployments, and how this differs from the basic deployment support in Kubernetes. Istio’s service mesh provides the control necessary to manage traffic distribution with complete independence from deployment scaling. This allows for a simpler, yet significantly more functional, way to do canary testing and rollout.
Intelligent routing in support of canary deployment is just one of the many features of Istio that will make the production deployment of large-scale microservices-based applications much simpler. Check out istio.io for more information and to try it out.
The sample code used in this article can be found here.



Using Istio to Improve End-to-End Security
Thu, 25 May 2017 00:00:00 +0000
Conventional network security approaches fail to address security threats to distributed applications deployed in dynamic production environments. Today, we describe how Istio authentication enables enterprises to transform their security posture from just protecting the edge to consistently securing all inter-service communications deep within their applications. With Istio authentication, developers and operators can protect services with sensitive data against unauthorized insider access and they can achieve this without any changes to the application code!
Istio authentication is the security component of the broader Istio platform. It incorporates the learnings of securing millions of microservice
endpoints in Google’s production environment.
Background
Modern application architectures are increasingly based on shared services that are deployed and scaled dynamically on cloud platforms. Traditional network edge security (e.g. firewall) is too coarse-grained and allows access from unintended clients. An example of a security risk is stolen authentication tokens that can be replayed from another client. This is a major risk for companies with sensitive data that are concerned about insider threats. Other network security approaches like IP whitelists have to be statically defined, are hard to manage at scale, and are unsuitable for dynamic production environments.
Thus, security administrators need a tool that enables them to consistently, and by default, secure all communication between services across diverse production environments.
Solution: strong service identity and authentication
Google has, over the years, developed architecture and technology to uniformly secure millions of microservice endpoints in its production environment against external attacks and insider threats. Key security principles include trusting the endpoints rather than the network, strong mutual authentication based on service identity, and service-level authorization. Istio authentication is based on the same principles.
The version 0.1 release of Istio authentication runs on Kubernetes and provides the following features:

Strong identity assertion between services
Access control to limit the identities that can access a service (and its data)
Automatic encryption of data in transit
Management of keys and certificates at scale


Istio authentication is based on industry standards like mutual TLS and X.509. Furthermore, Google is actively contributing to an open, community-driven service security framework called SPIFFE. As the SPIFFE specifications mature, we intend for Istio authentication to become a reference implementation of the same.
The diagram below provides an overview of Istio’s service authentication architecture on Kubernetes.

Figure: Istio Authentication Overview

The above diagram illustrates three key security features:
Strong identity
Istio authentication uses Kubernetes service accounts to identify who the service runs as. The identity is used to establish trust and define service-level access policies. The identity is assigned at service deployment time and encoded in the SAN (Subject Alternative Name) field of an X.509 certificate. Using a service account as the identity has the following advantages:

Administrators can configure who has access to a service account by using the RBAC feature introduced in Kubernetes 1.6
Flexibility to identify a human user, a service, or a group of services
Stability of the service identity for dynamically placed and auto-scaled workloads
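
To make the identity encoding concrete, here is a minimal Python sketch that builds and parses a service-account identity in URI form, as it might appear in a certificate's SAN field. The `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>` layout follows the SPIFFE ID convention; treating it as the exact 0.1 format, the `cluster.local` trust domain, and the `ServiceIdentity` class are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ServiceIdentity:
    """Hypothetical identity derived from a Kubernetes service account."""

    trust_domain: str
    namespace: str
    service_account: str

    def to_uri(self) -> str:
        # Assumed layout: spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>
        return f"spiffe://{self.trust_domain}/ns/{self.namespace}/sa/{self.service_account}"

    @classmethod
    def from_uri(cls, uri: str) -> ServiceIdentity:
        parsed = urlparse(uri)
        if parsed.scheme != "spiffe" or not parsed.netloc:
            raise ValueError(f"not a SPIFFE-style URI: {uri!r}")
        parts = parsed.path.strip("/").split("/")
        if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
            raise ValueError(f"unexpected identity path: {parsed.path!r}")
        return cls(parsed.netloc, parts[1], parts[3])


ident = ServiceIdentity("cluster.local", "default", "reviews")
print(ident.to_uri())  # spiffe://cluster.local/ns/default/sa/reviews
```

Because the identity depends only on the namespace and service account, every replica of a scaled-out deployment presents the same identity, whichever node or IP address it lands on.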


Communication security
Service-to-service communication is tunneled through high-performance client-side and server-side Envoy proxies. The communication between the proxies is secured using mutual TLS. The benefit of using mutual TLS is that the service identity is not expressed as a bearer token that can be stolen or replayed from another source. Istio authentication also introduces the concept of Secure Naming to protect against server spoofing attacks: the client-side proxy verifies that the authenticated server’s service account is allowed to run the named service.
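The Secure Naming check can be pictured as a lookup: the client-side proxy accepts the server only if one of the identities in the server's certificate is allowed to run the service it dialed. This is a minimal Python sketch; the `SECURE_NAMING` table, the service and identity strings, and `server_is_authorized` are hypothetical, and in Istio this mapping comes from the control plane rather than being hard-coded.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping

# Hypothetical secure-naming table: service name -> identities allowed to run it.
SECURE_NAMING: dict[str, frozenset[str]] = {
    "reviews.default.svc.cluster.local": frozenset(
        {"spiffe://cluster.local/ns/default/sa/reviews"}
    ),
}


def server_is_authorized(
    service: str,
    peer_identities: Iterable[str],
    naming: Mapping[str, frozenset[str]] = SECURE_NAMING,
) -> bool:
    """Return True if any identity from the server certificate may run `service`."""
    allowed = naming.get(service, frozenset())
    return any(identity in allowed for identity in peer_identities)


# A pod running as another service account fails, even with a valid certificate.
print(server_is_authorized(
    "reviews.default.svc.cluster.local",
    ["spiffe://cluster.local/ns/default/sa/attacker"],
))  # False
```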
Key management and distribution
Istio authentication provides a per-cluster CA (Certificate Authority) and automated key and certificate management. In this context, Istio authentication:

Generates a key and certificate pair for each service account.
Distributes keys and certificates to the appropriate pods using Kubernetes Secrets.
Rotates keys and certificates periodically.
Revokes a specific key and certificate pair when necessary (future).
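
The generate and rotate steps above can be illustrated with a small bookkeeping sketch. It assumes one certificate record per service account, a fixed lifetime, and a policy of reissuing once half of the lifetime has elapsed; `CertRecord`, `issue`, `needs_rotation`, `rotate_all`, the 24-hour lifetime, and the 0.5 threshold are all hypothetical, and the sketch tracks validity windows only rather than generating real keys.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CertRecord:
    """Validity window of a hypothetical certificate issued to one service account."""

    service_account: str
    not_before: datetime
    not_after: datetime


def issue(service_account: str, now: datetime, lifetime: timedelta) -> CertRecord:
    # A real CA would also generate a key pair and sign an X.509 certificate here.
    return CertRecord(service_account, now, now + lifetime)


def needs_rotation(cert: CertRecord, now: datetime, threshold: float = 0.5) -> bool:
    # Rotate once `threshold` of the certificate lifetime has elapsed.
    lifetime = cert.not_after - cert.not_before
    return now >= cert.not_before + lifetime * threshold


def rotate_all(
    certs: dict[str, CertRecord], now: datetime, lifetime: timedelta
) -> dict[str, CertRecord]:
    # Reissue only the records that have crossed the rotation threshold.
    return {
        sa: issue(sa, now, lifetime) if needs_rotation(cert, now) else cert
        for sa, cert in certs.items()
    }


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
certs = {"default/reviews": issue("default/reviews", t0, timedelta(hours=24))}
later = rotate_all(certs, t0 + timedelta(hours=13), timedelta(hours=24))
print(later["default/reviews"].not_before - t0)  # 13:00:00
```

Rotating well before expiry leaves a window in which a failed reissue can be retried without any workload holding an expired certificate.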


The following diagram explains the end-to-end Istio authentication workflow on Kubernetes:

Istio Authentication Workflow

Istio authentication is part of the broader security story for containers. Red Hat, a partner on the development of Kubernetes, has identified 10 Layers of container security. Istio addresses two of these layers: “Network Isolation” and “API and Service Endpoint Management”. As cluster federation evolves on Kubernetes and other platforms, our intent is for Istio to secure communications across services spanning multiple federated clusters.
Benefits of Istio authentication
Defense in depth: When used in conjunction with Kubernetes (or infrastructure) network policies, users achieve higher levels of confidence, knowing that pod-to-pod or service-to-service communication is secured at both the network and application layers.
Secure by default: When used with Istio’s proxy and centralized policy engine, Istio authentication can be configured during deployment with minimal or no application change. Administrators and operators can thus ensure that service communications are secured by default and that they can enforce these policies consistently across diverse protocols and runtimes.
Strong service authentication: Istio authentication secures service communication using mutual TLS to ensure that the service identity is not expressed as a bearer token that can be stolen or replayed from another source. This ensures that services with sensitive data can only be accessed from strongly authenticated and authorized clients.
Join us in this journey
Istio authentication is the first step towards providing a full stack of capabilities to protect services with sensitive data from external attacks and insider threats. While the initial version runs on Kubernetes, our goal is to enable Istio authentication to secure services across diverse production environments. We encourage the community to join us in making robust service security easy and ubiquitous across different application stacks and runtime platforms.

Feature	`v1alpha1` RBAC policy	`v1beta1` Authorization Policy
API stability	`alpha`: not backward compatible	`beta`: backward compatibility guaranteed
Number of CRDs	Three: `ClusterRbacConfig`, `ServiceRole` and `ServiceRoleBinding`	Only one: `AuthorizationPolicy`
Policy target	service	workload
Deny-by-default behavior	Enabled explicitly by configuring `ClusterRbacConfig`	Enabled implicitly with `AuthorizationPolicy`
Ingress/Egress gateway support	Not supported	Supported
The `"*"` value in policy	Match all contents (empty and non-empty)	Match non-empty contents only
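
The last row is the one most likely to change behavior silently during a migration. The minimal Python sketch below contrasts the two interpretations of `"*"` for a single field; representing a missing value as `None`, and handling only exact matches and the bare `"*"` (not prefix or suffix wildcards), are simplifying assumptions.

```python
from __future__ import annotations

from typing import Optional


def matches_v1alpha1(pattern: str, value: Optional[str]) -> bool:
    # v1alpha1: "*" matches everything, including empty or missing values.
    if pattern == "*":
        return True
    return value == pattern


def matches_v1beta1(pattern: str, value: Optional[str]) -> bool:
    # v1beta1: "*" matches only non-empty values.
    if pattern == "*":
        return bool(value)
    return value == pattern


for value in ("alice", "", None):
    print(repr(value), matches_v1alpha1("*", value), matches_v1beta1("*", value))
```

Under this reading, a v1alpha1 binding with `user: "*"` also matched requests that carried no authenticated identity, whereas `principals: ["*"]` in v1beta1 only matches requests that do.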

`ClusterRbacConfig.Mode`	`AuthorizationPolicy`
`OFF`	No policy applied
`ON`	A deny-all policy applied in root namespace
`ON_WITH_INCLUSION`	Policies should be applied to namespaces or workloads included by `ClusterRbacConfig`
`ON_WITH_EXCLUSION`	Policies should be applied to namespaces or workloads excluded by `ClusterRbacConfig`
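
As a sketch of the mode mapping above, the Python function below emits `AuthorizationPolicy` manifests (as plain dicts) for each `ClusterRbacConfig` mode. It assumes the root namespace is `istio-system`, handles namespace-level inclusion and exclusion only (not individual workloads), and its function and policy names are hypothetical. It relies on two policy shapes: an empty `spec` matches nothing, so with deny-by-default it denies all requests in scope, and a single empty rule matches every request.

```python
from __future__ import annotations

from collections.abc import Sequence

API_VERSION = "security.istio.io/v1beta1"


def _policy(name: str, namespace: str, spec: dict) -> dict:
    return {
        "apiVersion": API_VERSION,
        "kind": "AuthorizationPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def deny_all(namespace: str) -> dict:
    # Empty spec: an ALLOW policy that matches nothing, so requests in scope are denied.
    return _policy("deny-all", namespace, {})


def allow_all(namespace: str) -> dict:
    # One empty rule matches every request in the namespace.
    return _policy("allow-all", namespace, {"rules": [{}]})


def migrate_mode(
    mode: str, namespaces: Sequence[str] = (), root_namespace: str = "istio-system"
) -> list[dict]:
    if mode == "OFF":
        return []
    if mode == "ON":
        return [deny_all(root_namespace)]
    if mode == "ON_WITH_INCLUSION":
        return [deny_all(ns) for ns in namespaces]
    if mode == "ON_WITH_EXCLUSION":
        # Deny mesh-wide from the root namespace, then re-allow excluded namespaces.
        return [deny_all(root_namespace)] + [allow_all(ns) for ns in namespaces]
    raise ValueError(f"unknown ClusterRbacConfig mode: {mode!r}")
```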

`ServiceRole`	`AuthorizationPolicy`
`services`	`selector`
`paths`	`paths` in `to`
`methods`	`methods` in `to`
`destination.ip` in constraint	Not supported
`destination.port` in constraint	`ports` in `to`
`destination.labels` in constraint	`selector`
`destination.namespace` in constraint	Replaced by the namespace of the policy, i.e. the `namespace` in metadata
`destination.user` in constraint	Not supported
`experimental.envoy.filters` in constraint	`experimental.envoy.filters` in `when`
`request.headers` in constraint	`request.headers` in `when`
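
Following the mapping above, a single `ServiceRole` rule can be translated mechanically once the workload labels are known. This Python sketch is an illustration under assumptions: rules are plain dicts shaped like the v1alpha1 YAML, the caller supplies `match_labels` that select the workloads behind the old `services` entry (including any `destination.labels` constraint), and `destination.namespace` is dropped because it becomes the policy's `metadata.namespace`. The helper name is hypothetical.

```python
from __future__ import annotations

UNSUPPORTED = ("destination.ip", "destination.user")


def convert_service_role_rule(rule: dict, match_labels: dict[str, str]) -> dict:
    """Translate one v1alpha1 ServiceRole rule into a v1beta1 AuthorizationPolicy spec."""
    operation: dict[str, list[str]] = {}
    for field in ("methods", "paths"):
        if rule.get(field):
            operation[field] = list(rule[field])
    when = []
    for constraint in rule.get("constraints", []):
        key, values = constraint["key"], [str(v) for v in constraint["values"]]
        if key == "destination.port":
            operation["ports"] = values  # v1beta1 `ports` values are strings
        elif key.startswith(("request.headers", "experimental.envoy.filters")):
            when.append({"key": key, "values": values})
        elif key.startswith(("destination.labels", "destination.namespace")):
            continue  # expressed by `selector` and metadata.namespace instead
        elif key.startswith(UNSUPPORTED):
            raise ValueError(f"{key} has no v1beta1 equivalent")
        else:
            raise ValueError(f"unrecognized constraint: {key}")
    new_rule: dict = {"to": [{"operation": operation}]}
    if when:
        new_rule["when"] = when
    return {"selector": {"matchLabels": dict(match_labels)}, "rules": [new_rule]}
```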

`ServiceRoleBinding`	`AuthorizationPolicy`
`user`	`principals` in `from`
`group`	`request.auth.claims[group]` in `when`
`source.ip` in property	`ipBlocks` in `from`
`source.namespace` in property	`namespaces` in `from`
`source.principal` in property	`principals` in `from`
`request.headers` in property	`request.headers` in `when`
`request.auth.principal` in property	`requestPrincipals` in `from` or `request.auth.principal` in `when`
`request.auth.audiences` in property	`request.auth.audiences` in `when`
`request.auth.presenter` in property	`request.auth.presenter` in `when`
`request.auth.claims` in property	`request.auth.claims` in `when`
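
The `ServiceRoleBinding` rows can be applied in the same mechanical way. This hypothetical Python helper converts one v1alpha1 subject (a dict with optional `user`, `group`, and `properties` keys) into the `from` and `when` parts of a v1beta1 rule, row by row from the table above. It covers only the listed fields, maps `request.auth.principal` to `requestPrincipals` (the table also allows a `when` condition), and uses the `request.auth.claims[group]` key exactly as the table gives it.

```python
from __future__ import annotations

# v1alpha1 property -> v1beta1 `source` field, per the table above.
SOURCE_FIELDS = {
    "source.principal": "principals",
    "source.namespace": "namespaces",
    "source.ip": "ipBlocks",
    "request.auth.principal": "requestPrincipals",
}


def convert_subject(subject: dict) -> dict:
    """Translate one v1alpha1 ServiceRoleBinding subject into part of a v1beta1 rule."""
    source: dict[str, list[str]] = {}
    when: list[dict] = []
    if "user" in subject:
        source.setdefault("principals", []).append(subject["user"])
    if "group" in subject:
        when.append({"key": "request.auth.claims[group]", "values": [subject["group"]]})
    for key, value in subject.get("properties", {}).items():
        if key in SOURCE_FIELDS:
            source.setdefault(SOURCE_FIELDS[key], []).append(value)
        elif key.startswith("request."):
            # request.headers[...], request.auth.audiences, presenter, claims[...]
            when.append({"key": key, "values": [value]})
        else:
            raise ValueError(f"unrecognized property: {key}")
    rule: dict = {}
    if source:
        rule["from"] = [{"source": source}]
    if when:
        rule["when"] = when
    return rule
```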

Time	Speaker	Affiliation
10:00 - 10:30	`Spencer Krum + Lisa-Marie Namphy`	`IBM / Portworx`
10:30 - 11:00	`Lin Sun / Spencer Krum / Sven Mawson`	`IBM / Google`
11:00 - 11:10	`Lin Sun / Spencer Krum`	`IBM`
11:10 - 11:30	`Jason Yee / Ilan Rabinovich`	`Datadog`
11:30 - 11:50	`April Nassl`	`Google`
11:50 - 12:10	`Spike Curtis`	`Tigera`
12:10 - 12:30	`Shannon Coen`	`Pivotal`
12:30 - 1:00	`Matt Klein`	`Lyft`
1:00 - 1:20	`Zach Jory`	`F5/Aspen Mesh`
1:20 - 1:40	`Dan Ciruli`	`Google`
1:40 - 2:00	`Isaiah Snell-Feikema / Greg Hanson`	`IBM`
2:00 - 2:20	`Zack Butcher`	`Tetrate`
2:20 - 2:40	`Ray Hudaihed`	`American Airlines`
2:40 - 3:00	`Christian Posta`	`Red Hat`
3:00 - 3:20	`Google/IBM China`	`Google / IBM`
3:20 - 3:40	`Colby Dyess`	`Tufin`
3:40 - 4:00	`Rohit Agarwalla`	`Cisco`

Category	API	Versions
Networking	Destination Rule	`v1`, `v1beta1`, `v1alpha3`
	Istio Gateway	`v1`, `v1beta1`, `v1alpha3`
	Service Entry	`v1`, `v1beta1`, `v1alpha3`
	Sidecar scope	`v1`, `v1beta1`, `v1alpha3`
	Virtual Service	`v1`, `v1beta1`, `v1alpha3`
	Workload Entry	`v1`, `v1beta1`, `v1alpha3`
	Workload Group	`v1`, `v1beta1`, `v1alpha3`
	Proxy Config	`v1beta1`
	Envoy Filter	`v1alpha3`
Security	Authorization Policy	`v1`, `v1beta1`
	Peer Authentication	`v1`, `v1beta1`
	Request Authentication	`v1`, `v1beta1`
Telemetry	Telemetry	`v1`, `v1alpha1`
Extension	Wasm Plugin	`v1alpha1`

API Name	Object Types	Status	Recommendation
Gateway APIs	HTTPRoute, Gateway, …	Stable in Gateway API v1.0 (2023)	Use for new deployments, in particular with ambient mode
Istio APIs	Virtual Service, Gateway	`v1` in Istio 1.22 (2024)	Use for existing deployments, or where advanced features are needed
Ingress API	Ingress	Stable in Kubernetes v1.19 (2020)	Use only for legacy deployments

Name	Company	Profile	Seat type
Craig Box	Solo.io	`craigbox`	Contribution seat
Rob Cernich	Red Hat	`rcernich`	Contribution seat
Mitch Connors	Aviatrix	`therealmitchconnors`	Community seat
Iris (Shaojun) Ding	Intel	`irisdingbj`	Community seat
Cameron Etezadi	Google	`cetezadi`	Contribution seat
John Howard	Google	`howardjohn`	Contribution seat
Faseela K	Ericsson Software Technology	`kfaseela`	Community seat
Kebe Liu	DaoCloud	`kebe7jun`	Contribution seat
Jamie Longmuir	Red Hat	`longmuir`	Contribution seat
Keith Mattix	Microsoft	`keithmattix`	Community seat
Justin Pettit	Google	`justinpettit`	Contribution seat
Lin Sun	Solo.io	`linsun`	Contribution seat
Zhonghu Xu	Huawei	`hzxuzhonghu`	Contribution seat

Config Distribution	Namespace 1	Namespace 2	Total
Sidecars	25 configurations * 250 sidecars	25 configurations * 250 sidecars	12500
Waypoints	25 configurations * 2 waypoints	25 configurations * 2 waypoints	100
Waypoints / Sidecars	0.8%	0.8%	0.8%
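
The totals follow from multiplying the number of configurations by the number of proxies that receive them, assuming (as the table implies) that every proxy in a namespace receives all 25 configurations for that namespace. This short Python check reproduces the figures; the constant and helper names are illustrative.

```python
CONFIGS_PER_NAMESPACE = 25
NAMESPACES = 2


def config_copies(proxies_per_namespace: int) -> int:
    # Each proxy in a namespace receives every configuration for that namespace.
    return CONFIGS_PER_NAMESPACE * proxies_per_namespace * NAMESPACES


sidecar_total = config_copies(250)  # 25 * 250 * 2
waypoint_total = config_copies(2)   # 25 * 2 * 2
print(sidecar_total, waypoint_total, f"{waypoint_total / sidecar_total:.1%}")  # 12500 100 0.8%
```

Because both rows share the same 25 configurations and the same two namespaces, the ratio reduces to 2 waypoints / 250 sidecars, which is why it is 0.8% per namespace and in total.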

	Client `mCPU`	Client Memory (`MiB`)	Server `mCPU`	Server Memory (`MiB`)
Envoy Plaintext	320.44	66.93	243.78	64.91
Envoy mTLS	340.87	66.76	309.82	64.82
Proxyless Plaintext	0.72	23.54	0.84	24.31
Proxyless mTLS	0.73	25.05	0.78	25.43
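
Reading the table as differences makes the cost of mTLS easier to see. This Python sketch subtracts each plaintext row from the corresponding mTLS row using the values above; it adds no new measurements, and the names are illustrative.

```python
from __future__ import annotations

# (client mCPU, client MiB, server mCPU, server MiB), copied from the table above.
ROWS = {
    ("Envoy", "Plaintext"): (320.44, 66.93, 243.78, 64.91),
    ("Envoy", "mTLS"): (340.87, 66.76, 309.82, 64.82),
    ("Proxyless", "Plaintext"): (0.72, 23.54, 0.84, 24.31),
    ("Proxyless", "mTLS"): (0.73, 25.05, 0.78, 25.43),
}
COLUMNS = ("client mCPU", "client MiB", "server mCPU", "server MiB")


def mtls_delta(proxy: str) -> dict[str, float]:
    # mTLS minus plaintext for each column, rounded to the table's precision.
    plain, mtls = ROWS[(proxy, "Plaintext")], ROWS[(proxy, "mTLS")]
    return {col: round(m - p, 2) for col, m, p in zip(COLUMNS, mtls, plain)}


for proxy in ("Envoy", "Proxyless"):
    print(proxy, mtls_delta(proxy))
```

In these numbers, enabling mTLS with Envoy shows up mainly as server-side CPU (about 66 mCPU more), while memory is essentially unchanged.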

Session time (CST)	Title	Speaker
13:30 - 13:50	Sign in	
13:50 - 14:00	Welcome	Craig Box, Istio Steering Committee member, Google Cloud; Iris Ding, cloud computing engineer, Intel
14:00 - 14:30	Interpretation of the “Service Mesh Technical Capability Requirements” Standard	Yin Xia Mengxue, Engineer, Cloud Computing Department, Academy of Information and Communications Technology
14:30 - 15:00	Service Mesh Data Plane Hot Upgrade	Shi Zehuan, Alibaba Cloud
15:00 - 15:30	Introduction to Envoy Principles and Production Pitfalls	Zhang Wei, Data Plane Technical Expert, Huawei Cloud Service Mesh
15:30 - 15:45	Coffee break	
15:45 - 16:15	Use eBPF to accelerate Istio/Envoy networking	Zhong Luyao, Intel
16:15 - 16:45	Full-stack service mesh: how Aeraki helps you manage any Layer 7 traffic in Istio	Huabing Zhao, Senior Engineer, Tencent Cloud
16:45 - 17:15	Securing workload deployment with Istio CNI	Zhang Zhihan, Tetrate

Routing	Endpoint
Primary	http://dynamodb.us-east-1.amazonaws.com
Failover	http://dynamodb.us-west-1.amazonaws.com

Major Differences	`v1alpha1`	`v1beta1`
API stability	not backward compatible	backward compatible
mTLS	`MeshPolicy` and `Policy`	`PeerAuthentication`
JWT	`MeshPolicy` and `Policy`	`RequestAuthentication`
Authorization	`ClusterRbacConfig`, `ServiceRole` and `ServiceRoleBinding`	`AuthorizationPolicy`
Policy target	service name based	workload selector based
Port number	service ports	workload ports
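
The port row is a common migration pitfall: v1beta1 policies match the port the workload listens on, not the Service port. Assuming a Kubernetes Service spec given as a list of dicts with `port` and an optional numeric `targetPort` (which Kubernetes defaults to `port`), this hypothetical Python helper translates the ports listed in an old policy; named target ports would need the pod spec to resolve and are rejected here.

```python
from __future__ import annotations

from collections.abc import Iterable


def to_workload_ports(service_ports: list[dict], old_ports: Iterable[int]) -> list[str]:
    """Map Service ports used by a v1alpha1 policy to the workload ports v1beta1 expects."""
    mapping = {}
    for entry in service_ports:
        target = entry.get("targetPort", entry["port"])  # Kubernetes default: targetPort = port
        if not isinstance(target, int):
            raise ValueError(f"named targetPort {target!r} needs the pod spec to resolve")
        mapping[entry["port"]] = target
    # v1beta1 `ports` values are strings.
    return [str(mapping[port]) for port in old_ports]


# A Service exposing port 80 that forwards to container port 8080.
print(to_workload_ports([{"port": 80, "targetPort": 8080}], [80]))  # ['8080']
```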

Standard	Sensitive data
PCI DSS	payment card data
FedRAMP	federal information, data and metadata
HIPAA	personal health data
GDPR	personal data

	Istio Policy	Network Policy
Layer	“Service” — L7	“Network” — L3-4
Implementation	User space	Kernel
Enforcement Point	Pod	Node