Skip to content

CDAP-15864. Google Drive sink. Base code. (dvitiiuk)#2

Open
dvitiiuk wants to merge 13 commits intodevelopfrom
CDAP-15864-google-drive-sink
Open

CDAP-15864. Google Drive sink. Base code. (dvitiiuk)#2
dvitiiuk wants to merge 13 commits intodevelopfrom
CDAP-15864-google-drive-sink

Conversation

@dvitiiuk
Copy link
Owner

@dvitiiuk dvitiiuk commented Sep 23, 2019

JIRA: https://issues.cask.co/browse/CDAP-15864
DOC: https://wiki.cask.co/display/CE/Google+Drive+plugins

This is the base PR for Google Drive batch sink plugin.

TODOs:
Cover the code with tests.
Remove workaround for OAuth2 authentication.

Undocumented points:
As part of OAuth2 workaround 3 new properties were added for source and sink plugins: "Client Id", "Client secret", "Refresh token".
"App Id" property was removed because it has no sense in the scope of Google Authentication.
Sink generates random 16-symbol file name when "name" is not a part of the input scheme.

Tested:
File source -> Google Drive sink pipeline

fileToWrite.setName(fileFromFolder.getFile().getName());
fileToWrite.setParents(Collections.singletonList(folderId));
ByteArrayContent fileContent = new ByteArrayContent(null, fileFromFolder.getContent());
service.files().create(fileToWrite, fileContent).execute();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check if this gives a good readable by end user exceptions. So if this fails when something goes wrong. User can diagnose.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IOException is allowed for GoogleDriveRecordWriter.write(...), so I this is not need to replace

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't understand the question. I asked if it gives messages which can be easily understood by user. In case something goes wrong, like disk full, directory deleted or something other.

@aonischuk
Copy link

aonischuk commented Sep 23, 2019

also use IdUtils.validateId(referenceName) in validation

    try {
      IdUtils.validateId(referenceName);
    } catch (IllegalArgumentException ex) {
      failureCollector.addFailure(ex.getMessage(), null).withConfigProperty(REFERENCE_NAME);
    }```

and check for macros during validation. And call validation from both prepareRun and configurePipeline.

pom.xml Outdated
<modelVersion>4.0.0</modelVersion>

<groupId>io.cdap.plugin</groupId>
<name>Google Drive plugin</name>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Google Drive plugins

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

} catch (Exception e) {
collector.addFailure(
String.format("Exception during authentication/directory properties check: %s", e.getMessage()),
"Check message and reconfigure the plugin");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use .withStacktrace

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

private void validateSchemaField(FailureCollector collector, Schema schema, String propertyName,
String propertyValue, Schema requiredSchema) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to check if field is not logical type and if it's nullable than to get non-nullable part.
Code example from one of my plugins:

    Schema.Field vertexField = inputSchema.getField(vertexType);
    if (vertexField == null) {
      failureCollector.addFailure(String.format("Field '%s' is not present in input schema.", vertexType),
                                  null).withConfigProperty(VERTEX)
        .withInputSchemaField(VERTEX, null);
    } else {
      Schema vertexFieldSchema = vertexField.getSchema();

      if (vertexFieldSchema.isNullable()) {
        vertexFieldSchema = vertexFieldSchema.getNonNullable();
      }

      if (vertexFieldSchema.getLogicalType() != null || vertexFieldSchema.getType() != Schema.Type.STRING) {
        failureCollector.addFailure(String.format("Field '%s' must be of type 'string' but is of type '%s'.",
                                                  vertexField.getName(), vertexFieldSchema.getDisplayName()),
                                    null).withConfigProperty(VERTEX)
          .withInputSchemaField(VERTEX, null);
      }
    }

Also use .withInputSchemaField

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Description
-----------
Sink plugin saves files from the pipeline to Google Drive directory via Google Drive API.
The minimal input schema contains only byte array with field name configured by **schemaBodyFieldName** property.
Copy link

@aonischuk aonischuk Sep 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schemaBodyFieldName
and
schemaNameFieldName

Are core properties of the plugin. Which represent how it works I think they should be at the top of widget and doc not at the bottom.

Also move this description to properties description, instead of general description.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add information about required type of the schema fields.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"display-name": "Google Drive sink",
"configuration-groups": [
{
"label": "Basic",
Copy link

@aonischuk aonischuk Sep 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to have two sections:
Basic:

  • Reference Name
  • Schema field used as name of file
  • Schema field used as body of file

Authentication:
everything else.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

fileToWrite.setName(fileFromFolder.getFile().getName());
fileToWrite.setParents(Collections.singletonList(folderId));
ByteArrayContent fileContent = new ByteArrayContent(null, fileFromFolder.getContent());
service.files().create(fileToWrite, fileContent).execute();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't understand the question. I asked if it gives messages which can be easily understood by user. In case something goes wrong, like disk full, directory deleted or something other.

try {
driveClient.isFolderAccessible(directoryIdentifier);
} catch (GoogleJsonResponseException e) {
collector.addFailure(e.getMessage(), "Provide an existing folder identifier")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.withStackTrace

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

try {
driveClient.checkRootFolder();
} catch (GoogleJsonResponseException e) {
collector.addFailure(e.getMessage(), "Provide valid credentials");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withStacktrace

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


@Nullable
@Name(SCHEMA_NAME_FIELD_NAME)
@Description("Name of the schema field which will be used as name of file.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit to be the same as in doc.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

protected String schemaNameFieldName;

@Name(SCHEMA_BODY_FIELD_NAME)
@Description("Name of the schema field which will be used as body of file.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


}

private void validateReferenceId(FailureCollector collector) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use IdUtils.validateReferenceName(referenceName, failureCollector); instead.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

----------

**clientId** OAuth2 client id.
**File body field** The minimal input schema contains only byte array with field name configured by this property.
Copy link

@aonischuk aonischuk Sep 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description should answer "what this property does". And begin with a noun.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also don't start with "the" or "a/an". Otherwise all descriptions would start like that.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


@Name(SCHEMA_BODY_FIELD_NAME)
@Description("Name of the schema field which will be used as body of file.")
@Description("The minimal input schema contains only byte array with field name configured by this property.\n" +

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update accordingly

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dvitiiuk dvitiiuk force-pushed the CDAP-15864-google-drive-sink branch from 6e51944 to 70125b3 Compare September 24, 2019 10:24
@dvitiiuk dvitiiuk force-pushed the CDAP-15864-google-drive-sink branch from b4aac47 to fa270a9 Compare November 22, 2019 18:06
@dvitiiuk dvitiiuk force-pushed the CDAP-15864-google-drive-sink branch from fa270a9 to f8f25ca Compare November 25, 2019 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants