
@zhaoyunjiong

We plan to run the Iceberg REST Catalog (IRC) and PySpark in an Alibaba Cloud ACK cluster. Because authenticating to OSS with an AccessKey ID, AccessKey Secret, and SecurityToken is not feasible in our setup, we have chosen RRSA (RAM Roles for Service Accounts) as the authentication method. This PR adds support for it.

@openinx
Member

openinx commented Dec 3, 2025

Hi @zhaoyunjiong, thanks for the contribution! Could you please point me to the Alibaba Cloud documentation on RRSA? I don't have much context on it; if there is a doc, I'd love to read it and understand it. Thanks.

```java
}

static class DefaultAliyunClientFactory implements AliyunClientFactory {
  private static final Logger LOG = LoggerFactory.getLogger(DefaultAliyunClientFactory.class);
```
Member

nit: In Apache Iceberg we usually define the LOG in the top-level class rather than in an inner class. Maybe we can move it to AliyunClientFactories.

Author

If we move it to AliyunClientFactories, the logger name will be "AliyunClientFactories", which would make the log messages confusing.
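A small illustration of the point about logger names, assuming SLF4J's `LoggerFactory.getLogger(Class)`, which names the logger after `Class#getName()`. The class names below only mirror the ones in this PR; the real classes sit in a package, so their names would carry that package prefix.

```java
// Illustration only: stand-ins for the PR's classes, declared in the default package.
class AliyunClientFactories {
  static class DefaultAliyunClientFactory {}

  // SLF4J's getLogger(Class) uses Class#getName(), so these return the logger names
  // that each placement of LOG would produce.
  static String outerLoggerName() {
    return AliyunClientFactories.class.getName();
  }

  static String innerLoggerName() {
    return DefaultAliyunClientFactory.class.getName();
  }

  public static void main(String[] args) {
    System.out.println(outerLoggerName()); // AliyunClientFactories
    System.out.println(innerLoggerName()); // AliyunClientFactories$DefaultAliyunClientFactory
  }
}
```

Note that the name depends on the class passed to `getLogger`, not on where the field is declared, so a LOG field moved to the outer class but still built from `DefaultAliyunClientFactory.class` would keep the inner-class name.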

```java
DefaultAliyunClientFactory() {}

/**
 * Check if RRSA environment variables are present. RRSA requires
```
Member

Is it possible to add more description of RRSA? I also don't have much context on it, and I think we need to provide more context so that others can understand and maintain this code.

Author

Added.
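For readers without RRSA context, a presence check like the one discussed in this thread might look roughly like the sketch below. It assumes that, when RRSA is enabled for a pod, ACK injects `ALIBABA_CLOUD_ROLE_ARN`, `ALIBABA_CLOUD_OIDC_PROVIDER_ARN`, and `ALIBABA_CLOUD_OIDC_TOKEN_FILE` (names taken from the Alibaba Cloud RRSA documentation, not from this PR's code); `RrsaEnv` is a hypothetical helper.

```java
import java.util.Map;

// Hypothetical helper. ASSUMPTION: these are the variables ACK injects for RRSA.
final class RrsaEnv {
  static final String ROLE_ARN = "ALIBABA_CLOUD_ROLE_ARN";
  static final String OIDC_PROVIDER_ARN = "ALIBABA_CLOUD_OIDC_PROVIDER_ARN";
  static final String OIDC_TOKEN_FILE = "ALIBABA_CLOUD_OIDC_TOKEN_FILE";

  private RrsaEnv() {}

  /** Returns true only if all three RRSA variables are set and non-blank. */
  static boolean isRrsaConfigured(Map<String, String> env) {
    return hasValue(env, ROLE_ARN)
        && hasValue(env, OIDC_PROVIDER_ARN)
        && hasValue(env, OIDC_TOKEN_FILE);
  }

  private static boolean hasValue(Map<String, String> env, String key) {
    String value = env.get(key);
    return value != null && !value.trim().isEmpty();
  }

  public static void main(String[] args) {
    // In production this reads System.getenv(); taking a Map keeps the check testable.
    System.out.println("RRSA configured: " + isRrsaConfigured(System.getenv()));
  }
}
```

Requiring all three variables avoids half-configured pods silently falling into the RRSA path.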

@openinx
Member

openinx commented Dec 3, 2025

Hi @zhaoyunjiong, basically I understand your idea: you want to use a customized credentials provider to obtain temporary credentials automatically, rather than configuring static credentials in the Iceberg table properties (which is not safe). Would it be possible to follow what we already did in the iceberg-aws module, i.e., introduce a separate property to specify the credentials provider you want to use?

Please see the iceberg-aws code here:

```java
public static final String CLIENT_CREDENTIALS_PROVIDER = "client.credentials-provider";
```

With this, you can configure whichever credentials provider implementation you want.

```java
public AwsCredentialsProvider credentialsProvider(
    String accessKeyId, String secretAccessKey, String sessionToken) {
  if (refreshCredentialsEnabled && !Strings.isNullOrEmpty(refreshCredentialsEndpoint)) {
    clientCredentialsProviderProperties.putAll(allProperties);
    clientCredentialsProviderProperties.put(
        VendedCredentialsProvider.URI, refreshCredentialsEndpoint);
    return credentialsProvider(VendedCredentialsProvider.class.getName());
  }

  if (!Strings.isNullOrEmpty(accessKeyId) && !Strings.isNullOrEmpty(secretAccessKey)) {
    if (Strings.isNullOrEmpty(sessionToken)) {
      return StaticCredentialsProvider.create(
          AwsBasicCredentials.create(accessKeyId, secretAccessKey));
    } else {
      return StaticCredentialsProvider.create(
          AwsSessionCredentials.create(accessKeyId, secretAccessKey, sessionToken));
    }
  }

  if (!Strings.isNullOrEmpty(this.clientCredentialsProvider)) {
    return credentialsProvider(this.clientCredentialsProvider);
  }

  // Create a new credential provider for each client
  return DefaultCredentialsProvider.builder().build();
}
```
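A minimal sketch of what the same pattern could look like for iceberg-aliyun: a property names a provider class, which is instantiated reflectively, with a fallback when nothing is configured. This is hypothetical: `CredentialsProvider` here is a stand-in for the OSS SDK type, `client.credentials-provider` is only the name proposed in this thread, and the no-arg-constructor loading is a simplification, not iceberg-aws's exact logic.

```java
import java.util.Collections;
import java.util.Map;

// HYPOTHETICAL sketch; none of these types exist in iceberg-aliyun today.
final class ProviderLoader {
  static final String CLIENT_CREDENTIALS_PROVIDER = "client.credentials-provider";

  // Stand-in for the OSS SDK's credentials provider interface.
  interface CredentialsProvider {
    String accessKeyId();
  }

  // Fallback when no provider class is configured (e.g. the current static-key path).
  static final class DefaultProvider implements CredentialsProvider {
    @Override
    public String accessKeyId() {
      return "default";
    }
  }

  // Example user-supplied provider; this sketch requires a public no-arg constructor.
  public static final class CustomProvider implements CredentialsProvider {
    public CustomProvider() {}

    @Override
    public String accessKeyId() {
      return "custom";
    }
  }

  private ProviderLoader() {}

  static CredentialsProvider credentialsProvider(Map<String, String> properties) {
    String impl = properties.get(CLIENT_CREDENTIALS_PROVIDER);
    if (impl == null || impl.isEmpty()) {
      return new DefaultProvider();
    }
    try {
      Class<?> clazz = Class.forName(impl);
      if (!CredentialsProvider.class.isAssignableFrom(clazz)) {
        throw new IllegalArgumentException(impl + " does not implement CredentialsProvider");
      }
      return (CredentialsProvider) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate credentials provider " + impl, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(credentialsProvider(Collections.emptyMap()).accessKeyId()); // default
    Map<String, String> props =
        Map.of(CLIENT_CREDENTIALS_PROVIDER, CustomProvider.class.getName());
    System.out.println(credentialsProvider(props).accessKeyId()); // custom
  }
}
```

The design keeps Iceberg free of any specific provider: an RRSA-based provider would just be one class name users put in the property.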

Another key question from my side: how do we guarantee that credential expiration won't interrupt a long-running job?

@zhaoyunjiong
Author

> Hi @zhaoyunjiong, thanks for the contribution! Could you please point me to the Alibaba Cloud documentation on RRSA? I don't have much context on it; if there is a doc, I'd love to read it and understand it. Thanks.

You can find the RRSA documentation here: https://www.alibabacloud.com/help/en/ack/ack-managed-and-ack-dedicated/user-guide/use-rrsa-to-authorize-pods-to-access-different-cloud-services

@zhaoyunjiong
Author

> Hi @zhaoyunjiong, basically I understand your idea: you want to use a customized credentials provider to obtain temporary credentials automatically, rather than configuring static credentials in the Iceberg table properties (which is not safe). Would it be possible to follow what we already did in the iceberg-aws module, i.e., introduce a separate property to specify the credentials provider you want to use?

We have multiple different components that need to use iceberg-aliyun. Passing a custom credentials provider is possible, but changing all of those projects to support it would be very difficult and time-consuming.

> Another key question from my side: how do we guarantee that credential expiration won't interrupt a long-running job?

The OSS client constructor only stores a reference to the CredentialsProvider. Before each request is sent to the OSS service, doOperation() calls createDefaultContext(), which in turn calls credsProvider.getCredentials(). This automatically refreshes the credentials if they have expired or will expire within 3 minutes. You can find the code here:

https://github.com/aliyun/aliyun-oss-java-sdk/blob/a69a093ad639acb4de8fca5b35c80d55685713c1/src/main/java/com/aliyun/oss/internal/OSSOperation.java#L185

https://github.com/aliyun/aliyun-oss-java-sdk/blob/a69a093ad639acb4de8fca5b35c80d55685713c1/src/main/java/com/aliyun/oss/internal/OSSOperation.java#L257-L259
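To make the refresh argument concrete, here is a minimal sketch of the "refresh when expired or close to expiry" pattern described above. It is not the SDK's code: `RefreshingCredentials`, `TemporaryCredentials`, and the fetcher are hypothetical stand-ins for the SDK's provider, its credentials type, and its STS/OIDC call, and the 3-minute window is taken from the comment above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// HYPOTHETICAL stand-in for an auto-refreshing credentials provider.
final class RefreshingCredentials {
  // Window taken from the comment above, not from SDK documentation.
  static final Duration REFRESH_WINDOW = Duration.ofMinutes(3);

  static final class TemporaryCredentials {
    private final String token;
    private final Instant expiration;

    TemporaryCredentials(String token, Instant expiration) {
      this.token = token;
      this.expiration = expiration;
    }

    String token() {
      return token;
    }

    Instant expiration() {
      return expiration;
    }
  }

  private final Supplier<TemporaryCredentials> fetcher; // e.g. an STS AssumeRoleWithOIDC call
  private final Supplier<Instant> clock;
  private TemporaryCredentials cached;

  RefreshingCredentials(Supplier<TemporaryCredentials> fetcher, Supplier<Instant> clock) {
    this.fetcher = fetcher;
    this.clock = clock;
  }

  // Called before every request. Because the client keeps only a reference to the
  // provider, each request sees fresh credentials once the old ones near expiry.
  synchronized TemporaryCredentials getCredentials() {
    if (cached == null || !clock.get().isBefore(cached.expiration().minus(REFRESH_WINDOW))) {
      cached = fetcher.get();
    }
    return cached;
  }

  public static void main(String[] args) {
    RefreshingCredentials creds =
        new RefreshingCredentials(
            () -> new TemporaryCredentials("token", Instant.now().plus(Duration.ofHours(1))),
            Instant::now);
    System.out.println(creds.getCredentials().expiration());
  }
}
```

A job that runs for days never holds one credential object for its whole lifetime; it only holds the provider, which is why expiration should not interrupt it.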

@github-actions

github-actions bot commented Jan 3, 2026

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 3, 2026
@openinx
Member

openinx commented Jan 3, 2026

Hi @zhaoyunjiong, could you please resolve the conflicts? Thanks.

@openinx
Member

openinx commented Jan 3, 2026

Hi @zhaoyunjiong, I'd suggest solving this problem with a more abstract approach, as we discussed before.

> We have multiple different components that need to use iceberg-aliyun. Passing a custom credentials provider is possible, but changing all of those projects to support it would be very difficult and time-consuming.

OSSClientBuilder supports the abstract CredentialsProvider interface, which means any user can implement their own CredentialsProvider. The auto-refreshing OIDCRoleArnCredentialProvider is just one CredentialsProvider implementation, and people may have other kinds of implementations. The key point is that we may not have enough resources to support every kind of CredentialsProvider implementation in the official Apache Iceberg repo, and we shouldn't maintain all of them anyway: the official repo tends to maintain common, general abstractions so that people can use that flexibility to build their own. Moreover, users who want to run their own services on top of this won't be blocked by the official review and merge process.

That's why I strongly suggest we introduce an abstraction that supports general, customized CredentialsProvider implementations.

@openinx
Member

openinx commented Jan 3, 2026

The usage might look like this (taking the Spark and Hive catalog as an example):

```sql
SET spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog;
SET spark.sql.catalog.demo.type=hive;
SET spark.sql.catalog.demo.uri=thrift://hms-host:9083;
SET spark.sql.catalog.demo.warehouse=oss://my-bucket/iceberg/warehouse;
SET spark.sql.catalog.demo.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO;
-- With this catalog configuration, we can direct the Iceberg catalog to use a customized Aliyun credentials provider,
-- so people can plug in any credentials provider implementation to obtain the correct credentials.
SET spark.sql.catalog.demo.client.credentials-provider=com.test.CustomizedCredentialsProviderImplementations;

USE demo;
```
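For context on how the last setting would reach the FileIO: as I understand Spark's catalog plugin loading, the options of catalog `demo` are the `spark.sql.catalog.demo.*` settings with that prefix removed, so the FileIO would see the key `client.credentials-provider`. The helper below only mimics that step; `CatalogProps` is hypothetical, and whether OSSFileIO honors the key depends on this proposal being implemented.

```java
import java.util.HashMap;
import java.util.Map;

// HYPOTHETICAL helper mimicking how catalog-scoped Spark settings become catalog properties.
final class CatalogProps {
  private CatalogProps() {}

  static Map<String, String> catalogProperties(Map<String, String> sparkConf, String catalog) {
    String prefix = "spark.sql.catalog." + catalog + ".";
    Map<String, String> props = new HashMap<>();
    for (Map.Entry<String, String> e : sparkConf.entrySet()) {
      // "spark.sql.catalog.demo" itself (the catalog class) has no trailing dot, so it is skipped.
      if (e.getKey().startsWith(prefix)) {
        props.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return props;
  }

  public static void main(String[] args) {
    Map<String, String> sparkConf = new HashMap<>();
    sparkConf.put("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog");
    sparkConf.put("spark.sql.catalog.demo.io-impl", "org.apache.iceberg.aliyun.oss.OSSFileIO");
    sparkConf.put(
        "spark.sql.catalog.demo.client.credentials-provider",
        "com.test.CustomizedCredentialsProviderImplementations");
    System.out.println(catalogProperties(sparkConf, "demo").get("client.credentials-provider"));
  }
}
```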

@zhaoyunjiong
Author


One thing I forgot to mention is that this will only be a temporary solution until the Alibaba Cloud OSS SDK for Java V2 is production-ready. Once V2 is ready, the correct approach will be to upgrade to it, as it will support the use case I need.

@github-actions github-actions bot removed the stale label Jan 4, 2026