Binary files added: images/ssb/mv1.png, images/ssb/mvconfig1.png, images/ssb/mvconfig2.png, images/ssb/mvconfig3.png, images/ssb/mvlist.png, images/ssb/ssb-iot-enriched-avro.png, images/ssb/ssb-job-running.png, images/ssb/ssb-kafka-source.png, images/ssb/ssb-new-kafka-table.png, images/ssb/ssb_job_status.png
113 changes: 68 additions & 45 deletions workshop_ssb.adoc
Albeit simple, this task will show the ease of use and power of SQL Stream Builder.

Before you can start querying data from Kafka topics you need to register the Kafka clusters as _data sources_ in SSB.

. Login to the Cloudera Manager console using username `admin` and password `Supersecret1`.

. On the Cloudera Manager console, click on the Cloudera logo at the top-left corner to ensure you are at the home page and then click on the *SQL Stream Builder* service.

. Click on the *SQLStreamBuilder Console* link to open the SSB UI, and log in with the same credentials (`admin` / `Supersecret1`).

. You will notice that SSB already has a project named `admin_default`. Click on the blue `-> Open` button to see what's inside.

. Under `Data Sources`, you will see a `Kafka` folder with a ready-made cluster named `Local Kafka`:
+
image::images/ssb/ssb-kafka-source.png[width=800]

. You can use this screen to add other external Kafka clusters as data providers to SSB. In this lab you'll add a second data provider using a different host name, just to show how simple it is.

. Click on the 3 dots next to the `Kafka` folder name, then click `New Kafka Data Source`. In the pop-up window enter the details for your new data source, as provided below.
+
[source,yaml]
----
Connection protocol: PLAINTEXT
----
+
image::images/ssb/add-kafka-provider.png[width=400]
+
TIP: If you are unsure about the format of the brokers list, you can always reference the existing `Local Kafka` source, or ask your workmates and Cloudera instructors for help. 😊

. Finally, click *Validate* (on the bottom left) and *Save changes* (on the bottom right) to create your data source.


[[lab_2, Lab 2]]
== Lab 2 - Create a Table for a topic with JSON messages

Now you can _map_ the `iot_enriched` topic to a _table_ in SQL Stream Builder.
_Tables_ in SSB are a way to associate a Kafka topic with a schema so that you can use it in your SQL queries.

. To create your first Table you first need to create a Job. Click on the 3 dots next to the `Jobs` folder and then click `New Job`. Enter a name for your job (e.g. "my_first_job") and click on the *Create* button.
. Click on the 3 dots next to the `Virtual Tables` folder and then click `New Kafka Table`.

. On the *Kafka Table* window, enter the following information:
+
Data Format: JSON
Topic Name: iot_enriched
----
+
image::images/ssb/ssb-new-kafka-table.png[width=800]

. Ensure the *Schema Definition* tab is selected. Click *Detect Schema* at the bottom of the window.
SSB will take a sample of the data flowing through the topic and will infer the schema used to parse the content.
Alternatively you could also specify the schema in this tab.
+
When it's done, click *OK* to acknowledge the "Schema Detection Complete" message.
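+
As a mental model for the inference step, the sketch below (plain JavaScript, illustration only, not SSB's actual algorithm) derives a field-to-type mapping from a sampled JSON message:
+
[source,javascript]
----
// Toy schema inference: map each field of a sampled JSON message to a
// SQL-ish type, the way a sampled topic message is turned into a schema.
function inferSchema(sampleJson) {
  var sample = JSON.parse(sampleJson);
  var schema = {};
  for (var field in sample) {
    var v = sample[field];
    if (typeof v === "number") {
      schema[field] = Number.isInteger(v) ? "BIGINT" : "DOUBLE";
    } else if (typeof v === "boolean") {
      schema[field] = "BOOLEAN";
    } else {
      schema[field] = "VARCHAR";
    }
  }
  return schema;
}
----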

. Whenever you need to manipulate the source data to fix, cleanse or convert some values, you can define transformations for the table.
Transformations are defined in JavaScript code.
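+
As an illustration, here is a minimal transformation sketch. In SSB the record is exposed through an implicit `record` variable; the sketch wraps the same logic in a function so it can be run standalone, and the `sensor_0` field name is an assumption, not taken from this workshop:
+
[source,javascript]
----
// Hypothetical transformation: parse the Kafka record's value,
// round one sensor reading, and emit the modified payload as a JSON string.
function transform(record) {
  var payload = JSON.parse(record.value);
  payload.sensor_0 = Math.round(payload.sensor_0); // assumed field name
  return JSON.stringify(payload);
}
----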
+
image::images/ssb/source-properties.png[width=400]
NOTE: Setting the *Consumer Group* properties for a virtual table will allow SSB to also store offsets in Kafka, in addition to storing offsets in the job state, which is the default.

. Click *Create and Review* to complete the table creation. On the *Review* window, click *Keep*.
. Let's query the newly created table to ensure things are working correctly. Go to the job you created (in this example, "my_first_job") and enter the following query in the SQL editor at the top of the screen:
+
[source,sql]
----
FROM
+
NOTE: The first query execution usually takes a bit longer, since SSB has to start the Job Manager that will handle the job execution.
+
image::images/ssb/ssb-job-running.png[width=800]


. Click *Stop* to stop the job and release all the cluster resources used by the query.
Evolve: checked
+
image::images/ssb/schema-registy-iot-enriched.png[width=800]

. Back on the SQL Stream Builder page, expand the `Data Sources` folder, click on the 3 dots next to the `Catalog` folder, and then click *New Catalog*.

. In the *Catalog* dialog box, enter the following details:
+
Schema Registry URL: http://<CLUSTER_HOSTNAME>:7788/api/v1
Enable TLS: No
----

. In the same window, under *Filters*, enter the following configuration for the filter:
+
[source,yaml]
----
Database Filter: .*
Table Filter: iot.*
----

NOTE: Make sure to write `.\*` in the *Database Filter* field, and not `*`, otherwise you'll get a validation error later.

. *IMPORTANT:* Click on the plus sign (*+*) beside the filter details to save the filter.

. Click on *Validate*. If the configuration is correct you should see the message "Provider is valid".
Hover your mouse over the message and you'll see the number of tables (schemas) that matched your filter.
image::images/ssb/add-sr-catalog.png[width=400]

. Click *Create* to complete the catalog registration.

. Under `External Resources/Virtual Tables/sr/default_database/` you should see the list of tables that were imported from the Schema Registry.
+
image::images/ssb/ssb-iot-enriched-avro.png[width=500]

. Use the job you created earlier (or create a new one) to query the imported table and ensure it is working correctly.
+
Clear the contents of the SQL editor and type the following query:
+
Streams Messaging Manager Web UI*).
... Availability: `Low`
... Cleanup Policy: `delete`

. On the SSB UI, create a new job by clicking on the 3 dots next to the `Jobs` folder, then on `New Job`.

. On the *Create New Job* dialog box, enter `Sensor6Stats` for the *Job Name* and click *Create*.

. In the SQL editor type the query shown below, *but do not execute it yet*.
+
This query will compute aggregates over 30-seconds windows that slide forward every second. For a specific sensor value in the record (`sensor_6`) it computes the following aggregations for each window:
+
image::images/ssb/template-table-edited.png[width=400]

. Click *Execute* and the table will be created.

. An additional way to create the table is by running this query:
+
[source,sql]
----
CREATE TABLE `ssb`.`ssb_default`.`sensor6stats` (
  `device_id` BIGINT,
  `windowEnd` TIMESTAMP(3) NOT NULL,
  `sensorCount` BIGINT NOT NULL,
  `sensorSum` BIGINT,
  `sensorAverage` FLOAT,
  `sensorMin` BIGINT,
  `sensorMax` BIGINT,
  `sensorGreaterThan60` INT NOT NULL
) WITH (
  'connector' = 'kafka: edge2ai-kafka',   -- Connector to use; for Local Kafka it must be 'kafka: Local Kafka'
  'format' = 'json',                      -- Data format
  'scan.startup.mode' = 'group-offsets',  -- Startup mode for the Kafka consumer; valid values are 'earliest-offset', 'latest-offset', 'group-offsets', 'timestamp' and 'specific-offsets'
  'topic' = 'sensor6stats',               -- Topic to read from when the table is used as a source; a semicolon-separated topic list is also supported
  'properties.group.id' = 'sensor6stats-group-id',
  'properties.auto.offset.reset' = 'latest'
);
----

. Type the original query into the editor again and press *Execute* to run it.
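+
As a mental model, each row the job emits corresponds to per-window aggregates like the ones sketched below (plain JavaScript, illustration only):
+
[source,javascript]
----
// What the windowed query computes for the sensor_6 readings of one
// device within a single 30-second window.
function windowStats(readings) {
  var sum = readings.reduce(function (a, b) { return a + b; }, 0);
  return {
    sensorCount: readings.length,
    sensorSum: sum,
    sensorAverage: readings.length ? sum / readings.length : null,
    sensorMin: Math.min.apply(null, readings),
    sensorMax: Math.max.apply(null, readings),
    sensorGreaterThan60: readings.filter(function (r) { return r > 60; }).length
  };
}
----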

. At the bottom of the screen you will see the log messages generated by your query execution.
Note that the data displayed on the screen is only a sample of the data returned by the query.
+
image::images/ssb/sql-aggr-results.png[width=800]

. Check the job execution details and logs by clicking on *SQL Jobs* (on the left bar). Explore the options on this screen:
+
In this lab you'll create and query Materialized Views (MV).

You will define MVs on top of the query you created in the previous lab. Make sure that query is running before executing the steps below.

. On the *SQL Jobs* screen, verify that the `Sensor6Stats` job is running, then select it:
+
image::images/ssb/ssb_job_status.png[width=800]

. In order to add Materialized Views to a query the job needs to be stopped.
On the job page, click the *Stop* button to pause the job.
Primary Key: device_id
Retention: 300
----
+
image::images/ssb/mv1.png[width=500]

. To create a MV you need to have an API Key.
The API key is the information given to clients so that they can access the MVs.
If you have multiple MVs and want them to be accessed by different clients you can have multiple API keys to control access to the different MVs.
+
If you have already created an API Key in SSB you can select it from the drop-down list.
Otherwise, create one on the spot by clicking on the "plus" button shown above.
Otherwise, create one on the spot by clicking on the "New Key" button shown above.
Use `ssb-lab` as the Key Name.
+
Once the API key is created, select it for your MV.
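+
Once an endpoint exists, clients read it over REST by passing the API key. The sketch below builds such a URL; the port (18131) and path are assumptions, so copy the exact URL that SSB displays for your endpoint:
+
[source,javascript]
----
// Build a Materialized View endpoint URL carrying the API key.
// Host, port and path are assumptions -- use the URL SSB shows you.
function mvUrl(host, endpointPath, apiKey) {
  return "http://" + host + ":18131/api/v1/query/" + endpointPath +
         "?key=" + encodeURIComponent(apiKey);
}

// With Node 18+ the rows could then be fetched with:
//   const rows = await (await fetch(mvUrl(host, "above60", key))).json();
----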

. Click *New Endpoint* to create a new MV.
You will create a view that shows all the devices for which `sensor6` has had at least one reading above 60 in the last 300 seconds (the MV retention period).
+
For this, enter the following parameters in the MV Query Configuration page:
URL Pattern: above60
URL Pattern: above60
Description: All devices with a sensor6 reading greater than 60
Query Builder: <click "Select All" to add all columns>
Filters: <click "+ Rule" to configure filter> sensorGreaterThan60 greater 0
----
+
image::images/ssb/mvconfig1.png[width=500]
+
image::images/ssb/mvconfig2.png[width=500]

. Click *Create*.
. Click *Save*.

. Close the *Materialized Views* tab and click on *Execute* to start the job again.

In this section you will create a new MV that allows filtering by specifying a range of values.

. First, stop the job again so that you can add another MV.

. Click on the *Materialized Views* button and then on *New Endpoint* to create a new MV.
+
Enter the following property values and click *Save*.
+
[source,yaml]
----
Filters: sensorGreaterThan60 greater 0
sensorAverage less or equal {upperTemp}
----
+
image::images/ssb/mvconfig3.png[width=600]

. You will notice that the new URL for this MV has placeholders for the `{lowerTemp}` and `{upperTemp}` parameters:
+
image::images/ssb/mvlist.png[width=500]
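+
Before calling a parameterized endpoint, clients must substitute concrete values for the placeholders. A minimal sketch (the placeholder syntax shown is taken from the URL above; how your client builds the final URL is up to you):
+
[source,javascript]
----
// Replace {name}-style placeholders in an MV URL with concrete values.
function fillParams(urlTemplate, params) {
  return urlTemplate.replace(/\{(\w+)\}/g, function (match, name) {
    return encodeURIComponent(params[name]);
  });
}
----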

. Close the *Materialized View* tab and execute the job again.

image::images/ssb/mv-parameters.png[width=400]
You have now taken data from one topic, calculated aggregated results and written these to another topic.
In order to validate that this was successful you have selected the result with an independent select query.
Finally, you created Materialized Views for one of your jobs and queried those views through their REST endpoints.