Creating a Query Definition for the SQL Connector

Table of Contents
  • Overview
  • Getting Started
  • The Query Definition JSON Model
  • Connector Rules
  • Adding Signals, Conditions and Scalars
  • JSON Tips
  • Performance Tips
  • Example SQL Tables
  • Simple Query Definitions
    • Simple Signal (Example 1 - Volume Signal)
    • Simple String Signal (Example 2 - ID Signal)
    • Simple Condition (Example 3 - Run Capsules)
    • Simple Scalar (Example 4)
    • Additional Properties (Example 5)
  • Connector Rule: The First Column Must Have a Name
  • SELECT *
  • Condition with Capsule Properties (Example 6 - Run Capsules with Additional Properties)
  • Using Variables
    • A Signal per Measurement (Example 7)
    • Many Signals per Measurement (Example 8)
    • Extracting Description and UOM from Other Tables (Example 9)
    • Query Definition with Multiple Variables (Signal per Run, Measurement)
    • Many Variables
  • Testing your query definition
    • Test Mode
    • Testing the sample / capsule retrieval queries during indexing
  • More Inspiration
    • Shifting the signal key or condition start in time (Example 12)
    • Using SQL IF to Determine Linear vs Step Interpolation (Example 13)
    • Using SQL SUBSTRING to Extract UOM from Column Name (Example 14)
    • Converting (casting) data types in the SQL query
    • Ambiguous column names when using JOIN or schemas
    • Oracle Tip Regarding Case Sensitivity
    • Columns Named with Spaces, Reserved Words, Numbers, Case Sensitivity
    • Column Names with Spaces, Reserved Words, and Numbers in Databricks
    • Column Names with Spaces, Reserved Words, and Numbers in Denodo
    • Quotes
  • CONDITION Query Definition types with a calculated End Time Column

Overview

The SQL Connector accesses data in SQL tables as signals, conditions, and scalars. By adding query definitions to the SQL connector configuration, thousands of signals, conditions, and scalars can be created at once. In this article, the format of the query definition is explained through a series of examples.

After a connection is established with the SQL database and a query definition is provided, all the signals, conditions, and scalars found are indexed by Seeq. The SQL queries that define the signal, condition or scalar are run against the database during indexing. Later when a signal or condition is trended in Seeq, queries are also run against the database to access the signal samples or condition capsules.

Getting Started


You don't need to read this entire article to get started. Instead, read through the first example, try something simple, and then proceed to the other examples as you increase the sophistication of your SQL integration.

The Query Definition JSON Model

QueryDefinition Configuration


Each entry below gives the property name, its data type (and default value, if any), and a description.

  • Name (String): The name of the query definition. This value is used as part of the Data ID. Changing this value after analyses have been created will break those worksheets/topics.

  • Type (Enum: SIGNAL, CONDITION, or SCALAR): Defines whether this query will result in a set of Signals, Conditions, or Scalars.

  • Sql (String): The base SQL query that will be used for this query definition.

  • Enabled (Boolean): If true, the query is used to discover new data items. If false, no data will be indexed from this query.

  • EpochTimeUnit (Enum: SECONDS, MILLISECONDS, MICROSECONDS, or NANOSECONDS; default null): The unit of time used in epoch-format time columns for this query. Leave this set to null for all other time formats.

  • TestMode (Boolean): If true, every discovered item is test-queried to determine whether the query definition is valid. Set this to false once the query has been confirmed to be working.

  • Variables (Array[Variable]): Allows dynamic value replacement in a query configuration. See Using Variables below.

  • Properties (Array[Property]): Specifies the properties that will be indexed as metadata for the data item.

  • CapsuleProperties (Array[CapsuleProperty]): Only valid when Type is CONDITION. The properties that should be associated with each capsule.

  • TestQueriesDuringSync (Boolean; default true): When set to true, the queries used to retrieve samples/capsules are tested when the signals/conditions are indexed, and any problems encountered are logged. This allows query problems to be discovered prior to trending the signal/condition. Set this to false once the configuration is working.

  • InProgressCapsulesEnabled (Boolean; default false): Whether capsules with a NULL end that start within the interval from Now - Max Duration to Now should be considered in progress. When this option is true, such capsules have their end displayed as Now (i.e., the current time) and the interval mentioned is considered uncertain (since the real ends are still unknown). When this option is false, such capsules are ignored.

  • RecentCapsuleLookaheadSeconds (Integer): The duration into the future, following the current time, within which the system searches for the latest capsule that has a non-null end time.
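For example, if a signal's key is stored as epoch seconds in a numeric column, set EpochTimeUnit accordingly. A minimal sketch (the table sensor_data and its columns ts_epoch and value are hypothetical, not part of the examples below):

CODE

"QueryDefinitions" : [ {
  "Name" : "MyEpochSignal",
  "Type" : "SIGNAL",
  "Sql" : "SELECT ts_epoch, value FROM sensor_data",
  "EpochTimeUnit" : "SECONDS",
  "Enabled" : true,
  "TestMode" : false,
  "Properties" : [ {
    "Name" : "Name", "Value" : "Epoch Example", "Uom" : "string"
  }, {
    "Name" : "Interpolation Method", "Value" : "linear", "Uom" : "string"
  }, {
    "Name" : "Maximum Interpolation", "Value" : "2day", "Uom" : "string"
  } ]
} ]

For epoch-format time columns, the UTC time zone is assumed and the TimeZone field is ignored (see the Connector Rules below).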

Property Configuration

  • Name (String): The name of the property.

  • Value (String): The value of the property.

  • Uom (String): The unit of measure for the property value, or "string" for unitless property values.

CapsuleProperty Configuration

  • Name (String): The name of the capsule property.

  • Value (String): The value of the capsule property.

  • Column (String, optional): The result-set column from which the value should be taken.

  • Uom (String): The unit of measure for the capsule property value, or "string" for unitless property values.

Variable Configuration

  • Names (Array[String]): A list of the individual variable names.

  • Values (Array[Array[String]]): A list of value sets associated with the names. Each value set must contain exactly as many entries as there are names; values are matched to names by position.

  • Sql (String, optional): A query that produces the values for the variable names.

Connector Rules

If you are reading this for the first time, the rules will be explained through the examples. They are presented first due to their importance and for future ease of reference.

  • General

    • The first column must be either a supported date/time column type or a whole-number numeric type representing an epoch time value. Column types that include time zone information are marked with an * below. If using a date/time column type that doesn't include zone information, remember to set the TimeZone field in the configuration file appropriately. For epoch-format time, the UTC time zone is assumed and the TimeZone field is ignored.

      • MySql: year, date, datetime, timestamp*

      • MS SQL Server: date, smalldatetime, datetime, datetime2, datetimeoffset*

      • Oracle: date, timestamp, timestamp with time zone*, timestamp with local time zone*

      • Postgres: date, timestamp, timestamp with time zone*

      • Amazon Redshift: date, timestamp, timestamp with time zone*

      • Vertica: date, smalldatetime, datetime, timestamp, timestamp with time zone*

      • SAP HANA: date, timestamp, seconddate

      • Athena: date, timestamp*

      • Snowflake: date, datetime, timestamp, timestamp_ntz, timestamp_ltz*, timestamp_tz*

      • Databricks: timestamp*

  • Signals

    • The first column must be the key (x-axis) and the second column must be the value (y-axis).

    • Seeq does not support multiple samples with the same key.

      • Details: This means Seeq does not support multiple rows with the same value in the key column (the first column). If your query returns multiple samples with the same key, only the last sample received will be used. Even when using "ORDER BY key" in SQL queries, SQL databases do not always return the rows in the same order when there are multiple rows with the same value in the key column. This means that the sample used by Seeq may change every time the data is accessed. One workaround is to aggregate the duplicates away in SQL; see the sketch after this list.

    • These properties must be defined: Name, Interpolation Method, Maximum Interpolation.

    • If the signal is a string signal, the property Value Unit Of Measure must also be defined and set to "string".

  • Conditions

    • The first column must be the capsule start time and the second column must be the capsule end time.

    • These properties must be defined: Name, Maximum Duration.

  • Scalars

    • The first column must be the constant. Only the first row of the query result will be used so the value of interest must also be in the first row.

    • These properties must be defined: Name.

    • If the scalar is a string and a UOM was specified, the UOM will be ignored.
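For instance, when a signal query would return duplicate keys, an aggregate can collapse them into one sample per key. An untested sketch using the LabSignals example table introduced later (wrapped in an extra SELECT so Seeq can append its WHERE clause, per the naming rule below; max() is just one way to resolve duplicates):

CODE

SELECT * FROM (SELECT run_start, max(volume) AS volume FROM LabSignals GROUP BY run_start) AS deduped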


Once signals and conditions have been created using a query definition, don't change the query definition name or the variable names. These both contribute to a unique identifier for the signal/condition. If you change them, the original signal/condition (with the original identifier) will be archived and a new signal/condition (with a new identifier) will be created. If you had calculations, worksheets, etc. referring to the signals/conditions with the original identifiers, they will not be automatically updated to point to the new signals/conditions with the new identifiers.

Adding Signals, Conditions and Scalars

After a connection is established, signals, conditions and scalars can be added by adding Query Definitions.

Below is an excerpt of the SQL Connector configuration file showing where the Query Definitions should go. In this example, signals and conditions are being defined for the Oracle connection. Note that the Query Definitions are located inside the braces associated with the Oracle connection.

Configuration File Excerpt

CODE

{ "Version" : "com.seeq.link.connectors.sql2.config.SqlConnectorV2ConfigV1", "Connections" : [ { "Name" : "My Oracle Database", "Id" : "cd3c8788-7fb9-4637-b843-8f1e2c08be55", "Enabled" : true, "Type" : "ORACLE", ... "MaxConcurrentRequests" : 10000, "MaxResultsPerRequest" : null, "QueryDefinitions" : [ { } ] }, { "Name" : "My Postgres Database", "Id" : "cd3c8788-7fb9-4637-b843-8f1e2c081234", "Enabled" : true,

The details of the Query Definition content are explained in the following examples.

JSON Tips

The JSON query definitions are some of Seeq's more complex JSON files. Getting all the JSON formatting just right can be tricky. Here are some tips:

  • JSON doesn't support comments

  • Each field's value must be on a single line without line breaks

  • Check your "[ { , " formatting with a JSON validator such as https://jsonlint.com/

  • Indenting doesn't matter, but if applied it makes problems easier to spot. There are tools that will apply indenting for you, such as https://jsonformatter.curiousconcept.com/ or http://jsonprettyprint.com/.
    If using Notepad++ to edit JSON files, the JSTool add-in does formatting and has a JSON viewer. The JSLint add-in is a JavaScript validator, but if you set the file language type to JavaScript it will validate JSON (JSON is, for practical purposes, mainly a JavaScript subset).

Performance Tips

Retrieving partitioned data

When using the SQL Connector to connect to warehouse or lakehouse systems, the data is often partitioned. In addition to partitioning by facility or tag structure, which can be achieved through the use of variables (see Using Variables below), time series data is often partitioned by timestamp components as well. The SQL Connector can extract parts of the request's start and end timestamps so that the query is restricted to just the partitions of interest. This is done by inserting ${start:...} and ${end:...} keywords into the "Sql" property of a Query Definition of type SIGNAL or CONDITION.

Configuration File Excerpt

CODE

"QueryDefinitions" : [ { "Name" : "MyVolumeSignal", "Type" : "SIGNAL", "Sql": "SELECT Time, Value FROM dbo.SensorDataPartionedView WHERE Year >= ${start:yyyy} AND Year <= ${end:yyyy} AND Month >= ${start:month} AND Month <= ${end:month} AND Day >= ${start:day} AND Day <= ${end:day}",

When using ${start:<pattern>} and ${end:<pattern>} keywords, the pattern may be any valid sequence of symbols according to https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html, for example ${start:yyyy-MM-dd HH:mm:ss.SSSSSS}.

In addition to the standard pattern format mentioned above, Seeq offers the following additional patterns:

  • ${start:month} = the month from the start of the request (will be set to 01 when the query crosses years)

  • ${start:day} = the day from the start of the request (will be set to 01 when the query crosses months)

  • ${end:month} = the month from the end of the request (will be set to 12 when the query crosses years)

  • ${end:day} = the day from the end of the request (to be set to 31 when the query crosses months)


The additional patterns mentioned above are specific to Seeq and are not standard Java DateTimeFormatter patterns.

To explain why the additional patterns above are needed, let’s consider a trivial example with one table, representing a signal named “volume”.

CODE

Table: volume
=============
timestamp            , value    y    m  d
'2016-08-16 10:00:00', 2.1      2016 8  16
'2016-08-17 10:00:00', 5.3      2016 8  17
'2016-09-16 10:00:00', 2.7      2016 9  16
'2017-01-03 10:00:00', 2.0      2017 1  3

The Sql for unpartitioned access would be: select timestamp, value from volume.

An Sql that uses partitioning by year, month, and day might look like: select timestamp, value from volume WHERE y>=${start:yyyy} and y<=${end:yyyy} and m>=${start:MM} and m<=${end:MM} and d>=${start:dd} and d<=${end:dd}.

The problem is that if a user queries for all the values between 2016-08-17 00:00:00 and 2016-09-16 23:59:00, the system will issue a query with y>=2016 and y<=2016 and m>=8 and m<=9 and d>=17 and d<=16. There are no matching records for d>=17 and d<=16. Because a month boundary has been crossed, the day filter should be replaced with a "pass all" filter (i.e. d>=1 and d<=31). This cannot be obtained using the ${start:dd} and ${end:dd} patterns, so we created ${start:day} and ${end:day} (and similarly ${start:month} and ${end:month}) to account for year and month crossings.
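Rewritten with the Seeq-specific patterns, the same partitioned query handles boundary crossings correctly:

CODE

select timestamp, value from volume WHERE y>=${start:yyyy} and y<=${end:yyyy} and m>=${start:month} and m<=${end:month} and d>=${start:day} and d<=${end:day}

For the request above, this expands to y>=2016 and y<=2016 and m>=8 and m<=9 and d>=1 and d<=31, which matches the intended rows.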

Example SQL Tables

The following examples will use these SQL tables. Most of the examples are illustrated using a Postgres datasource but the concepts are the same for all database types. Syntax may vary slightly for other database types.

LabSignals Table (Postgres format)

SQL

CREATE TABLE LabSignals(
  id varchar(20) not null,
  run_start timestamp null,
  run_end timestamp null,
  volume real null,
  temperature double precision null,
  thickness integer null,
  cost double precision null
);
INSERT INTO LabSignals (id, run_start, run_end, volume, temperature, thickness, cost) VALUES
('WO000101', '2016-08-16 10:00:00', '2016-08-16 12:15:00', 2.1, 51.2, 18, 3793.87),
('WO000102', '2016-08-17 10:00:00', '2016-08-17 12:15:00', 3.4, 40.6, 18, 3711.71),
('WO000103', '2016-08-18 10:00:00', '2016-08-18 12:15:00', 13.1, 68.2, 14, 4262.07),
('WO000106', '2016-08-21 10:00:00', '2016-08-21 12:15:00', 19.1, 61.8, 11, 3717.12),
('WO000107', '2016-08-22 10:00:00', '2016-08-22 12:15:00', 10.1, 50.6, 19, 3337.39),
('WO000108', '2016-08-22 10:00:00', '2016-08-22 12:15:00', 14.1, 65.9, 20, 3548.29),
('WO000109', NULL, '2016-08-24 12:15:00', 9.3, 61.5, 15, 1677.20),
('WO000110', '2016-08-24 10:00:00', '2016-08-24 12:15:00', 9.7, 63.5, 12, 1688.20),
('WO000104', '2016-08-19 10:00:00', '2016-08-19 12:15:00', 20.2, 60.6, 11, 2535.59),
('WO000105', '2016-08-20 10:00:00', '2016-08-20 12:15:00', 5.7, 54.1, 10, 3968.47),
('WO000111', '2016-08-25 10:00:00', '2016-08-25 12:15:00', 6.9, 69.4, 13, 1446.76),
('WO000112', '2016-08-26 10:00:00', NULL, 16.1, 65.3, 20, 2697.21),
('WO000113', '2016-08-27 10:00:00', '2016-08-27 12:15:00', 18.1, 59.4, 15, 3377.84),
('WO000116', '2016-08-30 10:00:00', '2016-08-30 12:15:00', 9.1, 29.4, 17, 1438.47),
('WO000117', '2016-08-31 10:00:00', '2016-08-31 12:15:00', 16.5, 63.5, 13, 3173.15),
('WO000118', '2016-09-01 10:00:00', '2016-09-01 12:15:00', 16.1, 54.1, 16, 1082.16),
('WO000119', '2016-09-01 11:00:00', '2016-09-01 12:15:00', NULL, 34.1, 11, 1282.16),
('WO000114', '2016-08-28 10:00:00', '2016-08-28 12:15:00', 19.1, 65.3, 13, 4462.97),
('WO000115', '2016-08-29 10:00:00', '2016-08-29 12:15:00', 17.1, 30.6, 10, 2541.71),
('WO000120', '2016-09-02 10:00:00', '2016-09-02 12:15:00', 10.1, 38.8, 18, 4027.12),
('WO000121', '2016-09-02 11:00:00', '2016-09-02 12:15:00', 11.1, NULL, 21, 3027.12),
('WO000122', '2016-09-03 10:00:00', '2016-09-03 12:15:00', 8.0, 43.5, 15, 1469.46),
('WO000123', '2016-09-04 10:00:00', '2016-09-04 12:15:00', 20.5, 65.3, 16, 1266.40),
('WO000124', '2016-09-05 10:00:00', '2016-09-05 12:15:00', 7.4, 34.1, 18, 4242.43),
('WO000125', '2016-09-06 10:00:00', '2016-09-06 12:15:00', 20.7, 35.9, 20, 1785.23),
('WO000126', '2016-09-06 11:00:00', '2016-09-06 12:15:00', 21.7, 25.9, NULL, 1685.23),
('WO000127', '2016-09-07 10:00:00', '2016-09-07 12:15:00', 11.8, 64.1, 18, 3186.04),
('WO000128', '2016-09-08 10:00:00', '2016-09-08 12:15:00', 10.9, 30.6, 11, 3194.59),
('WO000129', '2016-09-09 10:00:00', '2016-09-09 12:15:00', 14.9, 52.9, 10, 3622.70),
('WO000130', '2016-09-10 10:00:00', '2016-09-10 12:15:00', 16.1, 52.9, 20, 2279.28),
('WO000131', '2016-09-11 10:00:00', '2016-09-11 12:15:00', 6.7, 48.2, 10, 2802.88),
('WO000132', '2016-09-11 11:00:00', '2016-09-11 12:15:00', 7.7, 38.2, 17, NULL),
('WO000133', '2016-09-12 10:00:00', '2016-09-12 12:15:00', 6.3, 47.6, 12, 1543.12),
('WO000134', '2016-09-13 10:00:00', '2016-09-13 12:15:00', 7.2, 53.5, 10, 956.20),
('WO000135', '2016-09-14 10:00:00', '2016-09-14 12:15:00', 5.7, 47.6, 14, 1375.92),
('WO000136', '2016-09-15 10:00:00', '2016-09-15 12:15:00', 20.1, 46.5, 13, 2122.22);
LabSignalMetadata (Postgres format)

SQL

CREATE TABLE LabSignalMetadata(
  measurement varchar(60) not null,
  description varchar(60) not null,
  units varchar(60) not null
);
INSERT INTO LabSignalMetadata (measurement, description, units) VALUES
('volume', 'How much was produced', 'litre'),
('temperature', 'The temperature during production', 'degc'),
('thickness', 'The thickness of the product', 'mils'),
('cost', 'The production cost', '$');

Simple Query Definitions

Simple Signal (Example 1 - Volume Signal)

SQL signals have sample data that comes from a query such as this:

SELECT myKey, myValue1 FROM myTable

Note that the signal's key (x-axis) is the first column and the signal's value (y-axis) is the second column. This follows the connector rules.

Using the example table LabSignals, we could create a volume signal with the query:

SELECT run_start, volume FROM LabSignals

To create this signal, add a Query Definition to the configuration file. The minimum query definition looks like:

Query Definition for a simple signal

CODE

 "QueryDefinitions" : [ { "Name" : "MyVolumeSignal", "Type" : "SIGNAL", "Sql" : "SELECT run_start, volume FROM LabSignals", "Enabled" : true, "TestMode" : false, "Properties" : [ { "Name" : "Name", "Value" : "Volume", "Uom" : "string" }, { "Name" : "Interpolation Method", "Value" : "linear", "Uom" : "string" }, { "Name" : "Maximum Interpolation", "Value" : "2day", "Uom" : "string" } ] } ]
  • The first name field,"Name" : "MyVolumeSignal", is the name of the Query Definition and isn't displayed prominently so it doesn't have to be a pretty name.

  • Every signal needs a minimum of 3 properties.

    • The signal name which is the name you'll see in the UI.

    • The interpolation method.

    • The maximum interpolation.

Simple String Signal (Example 2 - ID Signal)

The minimum query definition for a string signal is nearly identical to the previous example except that the Value Unit of Measure property must also be defined and set to "string".

Using the example table LabSignals, we could create an ID signal with the query.

SELECT run_start, id FROM LabSignals

The minimum query definition looks like the following. I chose to make this a step interpolated signal.

Query Definition for a simple signal

CODE

 "QueryDefinitions" : [ { "Name" : "MyIdSignal", "Type" : "SIGNAL", "Sql" : "SELECT run_start, id FROM LabSignals", "Enabled" : true, "TestMode" : false, "Properties" : [ { "Name" : "Name", "Value" : "ID", "Uom" : "string" }, { "Name" : "Interpolation Method", "Value" : "step", "Uom" : "string" }, { "Name" : "Maximum Interpolation", "Value" : "2day", "Uom" : "string" }, { "Name" : "Value Unit Of Measure", "Value" : "string", "Uom" : "string" } ] } ]

Simple Condition (Example 3 - Run Capsules)

SQL conditions have capsule data that comes from a query such as

SELECT myStart, myEnd FROM myTable

Note that the start is the first column and the end is the second column. This follows the connector rules.

Using the example table LabSignals, we could create a condition with a capsule for each run using the query

SELECT run_start, run_end FROM LabSignals

To create this condition, add a Query Definition to the configuration file. The minimum query definition looks like:

Query Definition for a simple condition

CODE

 "QueryDefinitions" : [ { "Name" : "MyRunsCondition", "Type" : "CONDITION", "Sql" : "SELECT run_start, run_end FROM LabSignals", "Enabled" : true, "TestMode" : false, "Properties" : [ { "Name" : "Name", "Value" : "Runs", "Uom" : "string" }, { "Name" : "Maximum Duration", "Value" : "3h", "Uom" : "string" } ] } ]
  • The first name field,"Name" : "MyRunsCondition", is the name of the Query Definition and isn't displayed prominently so it doesn't have to be a pretty name.

  • Every condition needs a minimum of 2 properties.

    • The condition name.

      • The Value property ("Runs" in the example above) is the name you'll see in the UI.

    • The maximum duration.

Simple Scalar (Example 4)

SQL scalars have a constant value that comes from a query such as

SELECT myConstant FROM myTable

Note that the constant value is found in the first column and first row of the result. This follows the connector rules.

Using the example table LabSignals, we could create a scalar using the query

SELECT volume FROM LabSignals WHERE id='WO000113'

To create this scalar, add a Query Definition to the configuration file. The minimum query definition looks like:

Query Definition for a simple scalar

CODE

 "QueryDefinitions" : [ { "Name" : "MyConstantDefinition", "Type" : "SCALAR", "Sql" : "SELECT volume FROM LabSignals WHERE id='WO000113'", "Enabled" : true, "TestMode" : false, "Properties" : [ { "Name" : "Name", "Value" : "MyConstant", "Uom" : "string" }, { "Name" : "Unit Of Measure", "Value" : "litre", "Uom" : "string" } ] } ]
  • The first name field,"Name" : "MyConstantDefinition", is the name of the Query Definition and isn't displayed prominently so it doesn't have to be a pretty name.

  • Every scalar needs a minimum of 1 property.

    • The scalar name which is the name you'll see in the UI.

  • Unit of Measure (or Value Unit of Measure) is an optional property. If the scalar came from a string SQL column type or from a date/time SQL column type, the UOM will be ignored.

Scalar Timestamps

SQL scalars coming from a date/time column type will be converted to Seeq timestamps, which are Unix epoch timestamps with units of nanoseconds. This makes them easy to use in calculations.

Additional Properties (Example 5)

There can be additional properties beyond the required properties. Common Seeq properties are:

  • Description (signals and conditions)

  • Value Unit Of Measure (signals only)

You can also make up your own properties such as the "SomeOtherProperty" shown below. In this case, I want the value to be treated as a number rather than a string and so the Value field has no quotes around it.

Here is the signal query definition with a few more properties defined:

Query definition for a simple signal with more properties

CODE

 "QueryDefinitions" : [ { "Name" : "MyVolumeSignal", "Type" : "SIGNAL", "Sql" : "SELECT run_start, volume FROM LabSignals", "Enabled" : true, "TestMode" : false, "Properties" : [ { "Name" : "Name", "Value" : "Volume", "Uom" : "string" }, { "Name" : "Interpolation Method", "Value" : "linear", "Uom" : "string" }, { "Name" : "Maximum Interpolation", "Value" : "2day", "Uom" : "string" }, { "Name" : "Description", "Value" : "The volume of product produced during a run", "Uom" : "string" }, { "Name" : "Value Unit Of Measure", "Value" : "litre", "Uom" : "string" }, { "Name" : "SomeOtherProperty", "Value" : 12345.6789, "Uom" : "kg" } ] } ]

This is how this signal will look in Seeq, along with the information shown in the Item Properties panel. Note that there is an "SQL" property created automatically. This is useful for checking what SQL query is behind a Seeq signal or condition. In this example, there is just one signal; when you have thousands of SQL signals, this will come in handy.

[Screenshots: the Volume signal trended in Seeq, and its Item Properties panel]

Connector Rule: The First Column Must Have a Name

One of the SQL Connector's rules is that the first column (signal key or condition start) must have a name. This rule was followed in all the previous examples. The first column is named "run_start".

The reason that a name is required is that Seeq looks at the time range that you are interested in and only requests the portion of the SQL table that is relevant to that time range. If the SQL table is very large, this is much faster than requesting the data from the entire table. Therefore, the query

SELECT run_start, volume FROM LabSignals

is modified by Seeq to be of the form

SELECT run_start, volume FROM LabSignals WHERE run_start > timestamp '2016-08-14 22:32:02.727096955' AND run_start < timestamp '2016-09-15 02:45:40.908914815'

The name of the first column is needed for the WHERE clause.

Here is an example of a Postgres query that doesn't follow this rule:

SELECT run_start + INTERVAL '6 hours', volume FROM LabSignals

But it can be modified to follow the rules by:

  • naming the first column

  • wrapping the query in an extra SELECT statement and naming the table

SELECT * FROM (SELECT run_start + INTERVAL '6 hours' AS myKey, volume FROM LabSignals) AS foo

At which point Seeq has enough information to add the WHERE clauses:

SELECT * FROM (SELECT run_start + INTERVAL '6 hours' AS myKey, volume FROM LabSignals) AS foo WHERE myKey > timestamp '2016-08-14 23:14:04.64716377' AND myKey < timestamp '2016-09-15 01:29:39.499100425' ORDER BY myKey

Scenarios that need this sort of modification include:

SELECT *

In general, using SELECT * is not recommended. The query

SELECT * FROM LabSignals

could successfully produce a signal, but it pulls 5 additional columns that are not needed. Transferring extra columns of data takes time, which means it will take longer to load data.

Condition with Capsule Properties (Example 6 - Run Capsules with Additional Properties)

Instead of making signals out of each measurement, let's record the measurements as properties of each capsule. To do this, I need to add volume and thickness to my main SQL query. Then, in each of the capsule properties, I can reference those additional columns by using the keyword ${columnResult} in the Value field and specifying the column name in the Column field. I also added an additional capsule property that is just set to "red".

Query Definition with Capsule Properties

CODE

 "QueryDefinitions" : [ { "Name" : "MyRunsCondition", "Type" : "CONDITION", "Sql" : "SELECT run_start, run_end, volume, thickness FROM LabSignals", "Enabled" : true, "TestMode" : false, "Properties" : [ { "Name" : "Name", "Value" : "Runs", "Uom" : "string" }, { "Name" : "Maximum Duration", "Value" : "3h", "Uom" : "string" } ], "CapsuleProperties" : [ { "Name" : "volume", "Value" : "${columnResult}", "Column" : "volume", "Uom" : "string" }, { "Name" : "thickness", "Value" : "${columnResult}", "Column" : "thickness", "Uom" : "string" }, { "Name" : "some other property", "Value" : "red", "Uom" : "string" } ] } ]

This query definition looks like this in Seeq. The properties of each capsule can be observed in the Capsule Details pane. By default, only the start is shown, but the other properties can be added using the column button in that pane. The capsules can be sorted by any of the properties. A new condition containing only a subset of the capsules can be created by filtering on the properties. For more information, see the filter() and getProperty() documentation in the Formula tool.

[Screenshot: the Runs condition in Seeq with capsule properties shown in the Capsule Details pane]

Using Variables

A Signal per Measurement (Example 7)

The simple signal query definition above creates just one signal. But that table has four signals of similar form.

SELECT run_start, volume FROM LabSignals

SELECT run_start, temperature FROM LabSignals

SELECT run_start, thickness FROM LabSignals

SELECT run_start, cost FROM LabSignals

All four signals can be created with one query definition by using a variable such as ${myMeasurement}.

SELECT run_start, ${myMeasurement} FROM LabSignals

where ${myMeasurement} = volume, temperature, thickness, cost

Signal Query Definition Using a Variable

CODE

 "QueryDefinitions" : [ { "Name" : "MyMeasurementSignals", "Type" : "SIGNAL", "Sql" : "SELECT run_start, ${myMeasurement} FROM LabSignals", "Enabled" : true, "TestMode" : false, "Variables" : [ { "Names" : [ "myMeasurement" ], "Values" : [ [ "volume" ], [ "temperature" ], ["thickness"], ["cost"] ] } ], "Properties" : [ { "Name" : "Name", "Value" : "Measurement of ${myMeasurement}", "Uom" : "string" }, { "Name" : "Description", "Value" : "This is a measurement of the ${myMeasurement} of our product", "Uom" : "string" }, { "Name" : "Interpolation Method", "Value" : "linear", "Uom" : "string" }, { "Name" : "Maximum Interpolation", "Value" : "2d", "Uom" : "string" } ] } ]

Note the addition of a variable to the query definition. In this case, it has been hardcoded to have 4 values, which means that this query definition will create 4 signals. The first signal is created from the above query definition with every instance of ${myMeasurement} replaced by volume. The second signal is created using temperature. And so on.

1. Signal Name: Measurement of volume
   Description: This is a measurement of the volume of our product
   Samples come from: SELECT run_start, volume FROM LabSignals

2. Signal Name: Measurement of temperature
   Description: This is a measurement of the temperature of our product
   Samples come from: SELECT run_start, temperature FROM LabSignals

3. Signal Name: Measurement of thickness
   Description: This is a measurement of the thickness of our product
   Samples come from: SELECT run_start, thickness FROM LabSignals

4. Signal Name: Measurement of cost
   Description: This is a measurement of the cost of our product
   Samples come from: SELECT run_start, cost FROM LabSignals

Many Signals per Measurement (Example 8)

If there were 200 signals of this form, hardcoding the values for ${myMeasurement} would be tedious. Instead, the variable values can be defined by another SQL query.

Query Definition Using a Variable Defined by SQL

CODE

 "QueryDefinitions" : [ { "Name" : "MyMeasurementSignals", "Type" : "SIGNAL", "Sql" : "SELECT run_start, ${myMeasurement} FROM LabSignals", "Enabled" : true, "TestMode" : false, "Variables" : [ { "Names" : [ "myMeasurement" ], "Values" : [ [ "${result}" ] ], "Sql" : "select column_name from information_schema.columns where table_name='labsignals' and column_name!='run_start' and column_name!='run_end' and column_name!='id'" } ], "Properties" : [ { "Name" : "Name", "Value" : "Measurement of ${myMeasurement}", "Uom" : "string" }, { "Name" : "Description", "Value" : "This is a measurement of the ${myMeasurement} of our product", "Uom" : "string" }, { "Name" : "Interpolation Method", "Value" : "linear", "Uom" : "string" }, { "Name" : "Maximum Interpolation", "Value" : "2d", "Uom" : "string" } ] } ]
  • Using ${result} for the Values field indicates that the values come from the SQL query defined in the variable's Sql field.

  • This query returns every column name in the table excluding run_start, run_end, and id. This is Postgres syntax; the syntax for your database may differ (see the sketch below for one possibility on MS SQL Server).
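For example, an untested MS SQL Server sketch of the same variable query (MS SQL Server also exposes INFORMATION_SCHEMA.COLUMNS):

CODE

SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'LabSignals' AND COLUMN_NAME NOT IN ('run_start', 'run_end', 'id')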

The results of this query definition look like this in Seeq.

[Screenshot: the measurement signals created by this query definition, as indexed in Seeq]

Extracting Description and UOM from Other Tables (Example 9)

Along with the example SQL table LabSignals, there is another table LabSignalMetadata that contains description and unit of measure information. Here is a more advanced query definition that uses additional SQL queries to utilize that description and unit of measure information for each signal.

Query Definition using information from multiple tables

CODE

 "QueryDefinitions" : [ { "Name" : "MyMeasurementSignals", "Type" : "SIGNAL", "Sql" : "SELECT run_start, ${myMeasurement} FROM LabSignals", "Enabled" : true, "TestMode" : false, "Variables" : [ { "Names" : [ "myMeasurement" ], "Values" : [ [ "${result}" ] ], "Sql" : "select column_name from information_schema.columns where table_name='labsignals' and column_name!='run_start' and column_name!='run_end' and column_name!='id'" } ], "Properties" : [ { "Name" : "Name", "Value" : "Lab ${myMeasurement}", "Uom" : "string" }, { "Name" : "Description", "Value" : "${result}", "Sql" : "SELECT description FROM LabSignalMetadata WHERE measurement='${myMeasurement}'", "Uom" : "string" }, { "Name" : "Interpolation Method", "Value" : "linear", "Uom" : "string" }, { "Name" : "Maximum Interpolation", "Value" : "2d", "Uom" : "string" }, { "Name" : "Value Unit Of Measure", "Value" : "${result}", "Sql" : "SELECT units FROM LabSignalMetadata WHERE measurement='${myMeasurement}'", "Uom" : "string" } ] } ]

The results of this query definition look like this in Seeq. Notice that each signal now has a meaningful description and unit of measure that were pulled from the LabSignalMetadata table.

[Screenshot: the signals in Seeq, now showing descriptions and units of measure pulled from LabSignalMetadata]

Query Definition with Multiple Variables (Signal per Run, Measurement)

This example will use the SQL table fom (figure of merit).

fom Table (Postgres format)

CODE

CREATE TABLE fom(
  fom_time timestamp null,
  fom_value double precision null,
  run integer null,
  fom_type varchar(50) null
);
INSERT INTO fom (fom_time, fom_value, run, fom_type) VALUES
('2016-08-16 10:00:00',  2.1, 2, 'thickness'),
('2016-08-17 10:00:00',  3.4, 1, 'volume'),
('2016-08-18 10:00:00', 13.1, 1, 'temperature'),
('2016-08-21 10:00:00', 19.1, 4, 'thickness'),
('2016-08-22 10:00:00', 10.1, 1, 'volume'),
('2016-08-22 10:00:00', 14.1, 1, 'temperature'),
('2016-08-24 10:00:00',  9.7, 5, 'thickness'),
('2016-08-19 10:00:00', 20.2, 2, 'volume'),
('2016-08-20 10:00:00',  5.7, 2, 'temperature'),
('2016-08-25 10:00:00',  6.9, 1, 'thickness'),
('2016-08-26 10:00:00', 16.1, 1, 'volume'),
('2016-08-27 10:00:00', 18.1, 1, 'temperature'),
('2016-08-30 10:00:00',  9.1, 4, 'thickness'),
('2016-08-31 10:00:00', 16.5, 5, 'volume'),
('2016-09-01 10:00:00', 16.1, 4, 'temperature'),
('2016-08-28 10:00:00', 19.1, 5, 'thickness'),
('2016-08-29 10:00:00', 17.1, 5, 'volume'),
('2016-09-02 10:00:00', 10.1, 4, 'temperature'),
('2016-09-03 10:00:00',  8.0, 4, 'thickness'),
('2016-09-04 10:00:00', 20.5, 1, 'volume'),
('2016-09-05 10:00:00',  7.4, 2, 'temperature'),
('2016-09-06 10:00:00', 20.7, 4, 'thickness'),
('2016-09-07 10:00:00', 11.8, 1, 'volume'),
('2016-09-08 10:00:00', 10.9, 4, 'temperature'),
('2016-09-09 10:00:00', 14.9, 5, 'thickness'),
('2016-09-10 10:00:00', 16.1, 2, 'volume'),
('2016-09-11 10:00:00',  6.7, 4, 'temperature'),
('2016-09-12 10:00:00',  6.3, 4, 'thickness'),
('2016-09-13 10:00:00',  7.2, 2, 'volume'),
('2016-09-14 10:00:00',  5.7, 1, 'temperature'),
('2016-09-15 10:00:00', 20.1, 3, 'thickness');

This table has signals for thickness, volume, and temperature over runs 1-5. The queries are of the form:

SELECT fom_time, fom_value FROM fom WHERE run=1 AND fom_type='thickness'

SELECT fom_time, fom_value FROM fom WHERE run=1 AND fom_type='volume'

SELECT fom_time, fom_value FROM fom WHERE run=2 AND fom_type='thickness'

etc.

All of these signals can be created using one query definition that has two variables.

SELECT fom_time, fom_value FROM fom WHERE run=${run} AND fom_type='${fom}'

where ${run} = 1,2,3,4,5 and ${fom} = thickness, volume, temperature.

Signal for every possible combination of run and fom_type (Example 10)

${run} has 5 values and ${fom} has 3 values. The following query definition will produce a signal for every possible combination of ${run} and ${fom}, which is 5 x 3 = 15 signals.

Query Definition using Every Possible Combination of Multiple Variables

CODE

 "QueryDefinitions" : [ { "Name" : "MyFomUseCase", "Type" : "SIGNAL", "Sql" : "SELECT fom_time, fom_value FROM fom WHERE run = ${myRun} AND fom_type='${myType}'", "Enabled" : true, "TestMode" : false, "Variables" : [ { "Names" : [ "myRun" ], "Values" : [ [ "${result}" ] ], "Sql" : "select distinct run from fom" }, { "Names" : [ "myType" ], "Values" : [ [ "${result}" ] ], "Sql" : "select distinct fom_type from fom" } ], "Properties" : [ { "Name" : "Name", "Value" : "Run ${myRun} ${myType}", "Sql" : null, "Uom" : "string" }, { "Name" : "Interpolation Method", "Value" : "linear", "Sql" : null, "Uom" : "string" }, { "Name" : "Maximum Interpolation", "Value" : "24d", "Sql" : null, "Uom" : "string" } ] } ]
[Screenshot: the 15 resulting signals in Seeq]

Signal for every existing combination of run and fom_type (Example 11)

If you examine the SQL table fom, you'll notice that not every run measures every figure of merit. For example, there is no temperature measurement for run 5. The previous example produces a signal called Run 5 temperature, but it will have no samples. This may or may not be desirable depending on the use case. If it is not desirable, one can instead write a query definition that creates a signal for every existing combination of run and fom_type rather than every possible combination. Note that the previous example had two variable definitions; this example combines them into one.

Query Definition using Every Existing Combination of Multiple Variables

CODE

 "QueryDefinitions" : [ { "Name" : "MyFomUseCase", "Type" : "SIGNAL", "Sql" : "SELECT fom_time, fom_value FROM fom WHERE run = ${myRun} AND fom_type='${myType}'", "Enabled" : true, "TestMode" : false, "Variables" : [ { "Names" : [ "myRun", "myType" ], "Values" : [ [ "${result}" ] ], "Sql" : "select distinct run, fom_type from fom" } ], "Properties" : [ { "Name" : "Name", "Value" : "Run ${myRun} ${myType}", "Sql" : null, "Uom" : "string" }, { "Name" : "Interpolation Method", "Value" : "linear", "Sql" : null, "Uom" : "string" }, { "Name" : "Maximum Interpolation", "Value" : "24d", "Sql" : null, "Uom" : "string" } ] } ]

This query definition only produces 11 signals:

[Screenshot: the 11 resulting signals in Seeq]

Many Variables

There is no limit to the number of variables that can be used. If a query definition had 5 variable definitions:

variable definition 1: 5 values

variable definition 2: 2 values

variable definition 3: 10 values

variable definition 4: 20 values

variable definition 5: 3 values

then it would result in 5 x 2 x 10 x 20 x 3 = 6000 signals.

Testing your query definition

Test Mode

The query definition contains a field called "TestMode" which is false by default. When test mode is enabled, the signals and conditions are not accessible by Seeq. Instead, the query for each signal/condition is printed to the log file log/jvm-link/sql-v2-test.log, which can be found in the Seeq data folder.

Test mode is useful for checking the query definition before bringing the signals and conditions into Seeq. For example, if you expect 100 signals but sql-v2-test.log shows 10,000 signals, you'll want to debug your query definition before bringing those signals or conditions into Seeq.

The log entries for the previous example would look like:

sql-v2-test.log

TEXT

2018-05-14T13:40:23.179-07:00[America/Los_Angeles] Query Definition: MyFomUseCase
SELECT fom_time, fom_value FROM fom WHERE run = ${myRun} AND fom_type='${myType}'
expands to 11 signal(s):
SIGNAL NAME        QUERY
Run 1 volume       SELECT fom_time, fom_value FROM fom WHERE run = 1 AND fom_type='volume'
Run 5 volume       SELECT fom_time, fom_value FROM fom WHERE run = 5 AND fom_type='volume'
Run 2 temperature  SELECT fom_time, fom_value FROM fom WHERE run = 2 AND fom_type='temperature'
Run 2 volume       SELECT fom_time, fom_value FROM fom WHERE run = 2 AND fom_type='volume'
Run 2 thickness    SELECT fom_time, fom_value FROM fom WHERE run = 2 AND fom_type='thickness'
Run 1 thickness    SELECT fom_time, fom_value FROM fom WHERE run = 1 AND fom_type='thickness'
Run 4 temperature  SELECT fom_time, fom_value FROM fom WHERE run = 4 AND fom_type='temperature'
Run 1 temperature  SELECT fom_time, fom_value FROM fom WHERE run = 1 AND fom_type='temperature'
Run 5 thickness    SELECT fom_time, fom_value FROM fom WHERE run = 5 AND fom_type='thickness'
Run 3 thickness    SELECT fom_time, fom_value FROM fom WHERE run = 3 AND fom_type='thickness'
Run 4 thickness    SELECT fom_time, fom_value FROM fom WHERE run = 4 AND fom_type='thickness'

Testing the sample / capsule retrieval queries during indexing

When a connection is established with the SQL database, all the signals, conditions, and scalars found are indexed by Seeq. In addition, the queries used to retrieve signal samples and condition capsules are tested for correctness. If a problem is found, it is logged as a warning in the log file. You can search the log file for these warnings and proactively correct the problems. This will prevent an error later when someone tries to trend the signal or condition. The warnings look something like this.

Example warning

CODE

WARN 2019-04-23T12:39:04.164-07:00 [Metadata sync for JVM Agent: SQL: Postgres V2 UTC: 3ec583d2-3dbf-4e57-bc24-b4a2e8d214bb] com.seeq.link.connectors.sql2.SqlConnectionV2 - SIGNAL 'Lab volume' from Query Definition 'LabSignals' with query 'SELECT run_start, volume FROM MyFakeTable' will fail to retrieve samples. Please correct the Query Definition. Error: java.sql.SQLException: Error executing query 'SELECT WorkStart, volume FROM MyFakeTable LIMIT 0'. ERROR: relation "myfaketable" does not exist
WARN 2019-04-23T12:57:26.787-07:00 [Metadata sync for JVM Agent: SQL: Postgres V2 UTC: 3ec583d2-3dbf-4e57-bc24-b4a2e8d214bb] com.seeq.link.connectors.sql2.SqlConnectionV2 - CONDITION 'Runs' from Query Definition 'MyRunsCondition' with query 'SELECT run_start, run_end FROM MyFakeTable' will fail to retrieve capsules. Please correct the Query Definition. Error: java.sql.SQLException: Error executing query 'SELECT workstart, workend FROM MyFakeTable LIMIT 0'. ERROR: relation "myfaketable" does not exist


To disable this testing, set the query definition field "TestQueriesDuringSync" to false. True is the default.

More Inspiration

Because SQL is so flexible, your query definitions can get quite creative. Here are some ideas for inspiration.

Shifting the signal key or condition start in time (Example 12)

Perhaps it is desired that the signal key or condition start be shifted forward in time by 6 hours.

The signal in Example 8 used this query:

SELECT run_start, ${myMeasurement} FROM LabSignals

To shift that signal forward in time by 6 hours, just adjust the query like so.

MySql

SELECT * FROM (SELECT date_add(run_start, interval 6 hour) AS run_start_shifted, ${myMeasurement} FROM LabSignals) AS LabSignalsShifted

MS SQL Server

SELECT * FROM (SELECT dateadd(hour, 6, run_start) AS run_start_shifted, ${myMeasurement} FROM LabSignals) AS LabSignalsShifted

Postgres/Redshift

SELECT * FROM (SELECT run_start + interval '6 hours' AS run_start_shifted, ${myMeasurement} FROM LabSignals) AS LabSignalsShifted

Oracle

submit your solution
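In the meantime, an untested Oracle sketch using Oracle's interval literal syntax (note that Oracle table aliases omit the AS keyword):

CODE

SELECT * FROM (SELECT run_start + INTERVAL '6' HOUR AS run_start_shifted, ${myMeasurement} FROM LabSignals) LabSignalsShifted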

See the rule "First column must have a name" for more details on the extra SELECT * FROM (). This rule applies to signals and conditions.

Using SQL IF to Determine Linear vs Step Interpolation (Example 13)

Using the FOM example from above, I can specify step interpolation for Run 1 and linear interpolation for all the other runs by using an SQL IF statement. This example is in MS SQL Server format.

SQL IF Statement (MS SQL Server)

CODE

if ${myRun}=1 begin select 'step' end else begin select 'linear' end


JSON doesn't allow line breaks within the SQL statement's string value, so the query definition looks like:

Query definition with SQL IF statement

CODE

 "QueryDefinitions" : [ { "Name" : "MyFomUseCase", "Type" : "SIGNAL", "Sql" : "SELECT fom_time, fom_value FROM fom WHERE run = ${myRun} AND fom_type='${myType}'", "Enabled" : true, "TestMode" : false, "Variables" : [ { "Names" : [ "myRun", "myType" ], "Values" : [ [ "${result}" ] ], "Sql" : "select distinct run, fom_type from fom" } ], "Properties" : [ { "Name" : "Name", "Value" : "Run ${myRun} ${myType}", "Sql" : null, "Uom" : "string" }, { "Name" : "Interpolation Method", "Value" : "${result}", "Sql" : "if ${myRun}=1 begin select 'step' end else begin select 'linear' end, "Uom" : "string" }, { "Name" : "Maximum Interpolation", "Value" : "24d", "Sql" : null, "Uom" : "string" } ] } ]

In this example, IF was used to determine the Interpolation Method but this concept could be applied to any item or capsule property.

Using SQL SUBSTRING to Extract UOM from Column Name (Example 14)

Perhaps my LabSignals table from earlier looked like this instead, with the unit of measure embedded in each column name.

LabSignals table with UOM in column name (MS SQL Server format)

CODE

CREATE TABLE LabSignals(
  run_start datetime null,
  run_end datetime null,
  volume_litre decimal(3,1) null,
  temperature_degf real null,
  thickness_mil int null,
  cost_$ decimal(6,2) null
);
INSERT INTO dbo.LabSignals (run_start, run_end, volume_litre, temperature_degf, thickness_mil, cost_$) VALUES
('2016-08-16 10:00:00', '2016-08-16 12:15:00', 2.1, 51.2, 18, 3793.87)

The signals can be given those units by using SQL SUBSTRING (MS SQL Server). The query definition from Example 8 then becomes:

Query definition using SUBSTRING

CODE

"QueryDefinitions" : [ { "Name" : "MyMeasurementSignals", "Type" : "SIGNAL", "Sql" : "SELECT run_start, ${myMeasurement} FROM LabSignals", "Enabled" : true, "TestMode" : false, "Variables" : [ { "Names" : [ "myMeasurement" ], "Values" : [ [ "${result}" ] ], "Sql" : "select column_name from information_schema.columns where table_name='labsignals' and column_name!='run_start' and column_name!='run_end'" } ], "Properties" : [ { "Name" : "Name", "Value" : "Measurement of ${myMeasurement}", "Uom" : "string" }, { "Name" : "Interpolation Method", "Value" : "linear", "Uom" : "string" }, { "Name" : "Maximum Interpolation", "Value" : "2d", "Uom" : "string" }, { "Name" : "Value Unit Of Measure", "Value" : "${result}", "Sql" : "SELECT SUBSTRING('${column}', CHARINDEX('_','${column}')+1, LEN('${column}'))", "Uom" : "string" } ]} ]

In this example, SUBSTRING was used to extract UOMs but the concept could be applied to a number of use cases.

Converting (casting) data types in the SQL query

Sometimes data in SQL is not stored in the type required for display. One example is a date or timestamp stored as a string instead of a date type. Numbers can also be stored as strings and need to be converted to display numeric signals. Each of these conversions will be slightly specific to the SQL database, but here are some examples to get started. See the rule "First column must have a name" for more details on the extra SELECT * FROM ().

MS SQL Server

CODE

SELECT * FROM (
  SELECT cast(startDate AS datetime2) AS date1,
         cast(endDate AS datetime2) AS date2,
         OrderNumber,
         Operation
  FROM mydatabase
) AS myTable

MySQL

submit your solution
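In the meantime, an untested MySQL sketch (assumes CAST to DATETIME parses the stored strings; for other layouts, STR_TO_DATE with an explicit format can be used):

CODE

SELECT * FROM (SELECT CAST(startDate AS datetime) AS date1, CAST(endDate AS datetime) AS date2, OrderNumber, Operation FROM mydatabase) AS myTable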

Postgres/Redshift

submit your solution
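In the meantime, an untested Postgres/Redshift sketch (the shorthand startDate::timestamp is equivalent in both):

CODE

SELECT * FROM (SELECT CAST(startDate AS timestamp) AS date1, CAST(endDate AS timestamp) AS date2, OrderNumber, Operation FROM mydatabase) AS myTable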

Oracle

submit your solution
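In the meantime, an untested Oracle sketch (assumes the strings follow a 'YYYY-MM-DD HH24:MI:SS' layout; an explicit TO_TIMESTAMP format mask is safer than CAST, and the subquery alias omits the AS keyword):

CODE

SELECT * FROM (SELECT TO_TIMESTAMP(startDate, 'YYYY-MM-DD HH24:MI:SS') AS date1, TO_TIMESTAMP(endDate, 'YYYY-MM-DD HH24:MI:SS') AS date2, OrderNumber, Operation FROM mydatabase) myTable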

Ambiguous column names when using JOIN or schemas

If the query is a JOIN on two tables and each table has a column of the same name, the column name may be ambiguous.

Original query: SELECT table1.StartTime, table2.EndTime FROM table1 INNER JOIN table2 ON table1.key=table2.key

In this example, both tables have a column named StartTime. Seeq adds a WHERE clause for StartTime but doesn't know which table to reference. This can be solved by adding a SELECT * FROM around the query as discussed in the section "The first column must have a name".

Modified query: SELECT * FROM (SELECT table1.StartTime, table2.EndTime FROM table1 INNER JOIN table2 ON table1.key=table2.key) AS foo

The same situation could be encountered when using schemas.

Original query: SELECT s1t1.StartTime, s2t1.EndTime FROM schema1.table1 s1t1 INNER JOIN schema2.table1 s2t1 ON s1t1.key=s2t1.key

Modified query: SELECT * FROM (SELECT s1t1.StartTime, s2t1.EndTime FROM schema1.table1 s1t1 INNER JOIN schema2.table1 s2t1 ON s1t1.key=s2t1.key) AS foo

Oracle Tip Regarding Case Sensitivity

Working with Oracle can be a little trickier than the other databases due to Oracle's handling of case. Example 9 had to be massaged for Oracle.

The Oracle LabSignals table was created with the same table name and column names as the above Postgres example. So the table name was mixed case and the column names were lower case.

CODE

CREATE TABLE LabSignals(
  id nchar(20) not null,
  run_start date null,
  run_end date null,
  volume number null,
  temperature number null,
  thickness int null,
  cost number null
);

But when asking Oracle for the column names of the table, this didn't work:

SELECT column_name FROM all_tab_columns WHERE table_name='LabSignals' AND column_name!='run_start' AND column_name!='run_end'

Oracle can't find the table or columns unless all caps are used:

SELECT column_name FROM all_tab_columns WHERE table_name='LABSIGNALS' AND column_name!='RUN_START' AND column_name!='RUN_END'

Oracle also returned all the column names in all caps so:

${myMeasurement} = VOLUME, THICKNESS, TEMPERATURE, COST

which meant that this query didn't return any results because the table LabSignalMetadata only has a row with the value "volume" and not "VOLUME".

SELECT units FROM LabSignalMetadata WHERE measurement='${myMeasurement}'

This was fixed by using lower() in this query:

SELECT lower(column_name) FROM all_tab_columns WHERE table_name='LABSIGNALS' AND column_name!='RUN_START' AND column_name!='RUN_END'

Columns Named with Spaces, Reserved Words, Numbers, Case Sensitivity

SQL best practices do not recommend creating columns with the following types of names:

  • Names containing spaces

  • Names starting with a digit (0-9) such as 9 or 4foo. In most databases, a query such as SELECT 4foo FROM mytable will return a column named 4foo where every row contains the value 4.

  • Names that are reserved words

    • To see the list of reserved words for a specific database, search the internet. For example, search for "Oracle reserved words".

Depending on the SQL database, case-sensitive column names can also be tricky.

If you have column names with these problems, you can work around them using quotes.

Column Names with Spaces, Reserved Words, and Numbers in Databricks

During table creation, Databricks automatically inserts underscores to correct column names that have spaces, numbers, or reserved words in them. Because of this, query definitions built using the table badpractice will return column names that are already underscore-corrected by Databricks. These will not match the original column names expected by the Bad Practice system tests and will cause the tests to fail. A workaround is to read the underscored column names from the database but create the signals using the original column names with spaces.

An example query definition using this method is shown below:

JSON

{ "Name": "BadPractices", "Type": "SIGNAL", "Sql": "SELECT ${_myKey}, ${_myValue} FROM default.badpractice", "Enabled": true, "TestMode": false, "Variables": [ { "Names": [ "_myKey", "_myValue", "myKey", "myValue" ], "Values": [ [ "_HisTimestamp", "_my_ID", "HisTimestamp", "my ID" ], [ "_HisTimestamp", "ID", "HisTimestamp", "ID" ], ], "Sql": null } ], "Properties": [ { "Name": "Name", "Value": "AT Bad Practice ${myKey} ${myValue}", "Sql": null, "Uom": "string" }, { "Name": "Interpolation Method", "Value": "linear", "Sql": null, "Uom": "string" }, { "Name": "Maximum Interpolation", "Value": "2d", "Sql": null, "Uom": "string" } ], "CapsuleProperties": null}

Column Names with Spaces, Reserved Words, and Numbers in Denodo

During table creation, Denodo automatically generates replacement names to correct column names that have spaces, numbers, or reserved words in them. Because of this, query definitions built using the table badpractice will return column names that may not resemble the original column names. You can change the autogenerated column names to something more familiar by running a command similar to ALTER TABLE badpractice (ALTER COLUMN myid_0 RENAME _my_ID);. This probably will not match the column names expected by the Bad Practice system tests and will cause the tests to fail. A workaround is to read the column names from the database but create the signals using column names with spaces, similar to the example shown for Databricks.

Quotes

To select one of these types of columns, quotes are needed. Most databases use double quotes (") but MySql uses the backtick (`).

Double quotes need to be escaped (backslashed) when used in a json query definition file since the double quote already has meaning in json.

Original query: SELECT mykey, my value FROM mytable
  Quoted (most databases, escaping required): SELECT mykey, \"my value\" FROM mytable
  Quoted (MySql): SELECT mykey, `my value` FROM mytable

Original query: SELECT mykey, 9 FROM mytable
  Quoted (most databases, escaping required): SELECT mykey, \"9\" FROM mytable
  Quoted (MySql): SELECT mykey, `9` FROM mytable

Original query: SELECT mykey, select FROM mytable
  Quoted (most databases, escaping required): SELECT mykey, \"select\" FROM mytable
  Quoted (MySql): SELECT mykey, `select` FROM mytable


Looking back at Example 8, if you expected the ${myMeasurement} results to be column names with numeric names, the query definition would be modified like so:

CODE

 "QueryDefinitions" : [ { "Name" : "MyMeasurementSignals", "Type" : "SIGNAL", "Sql" : "SELECT run_start, \"${myMeasurement}\" FROM LabSignals", ...

Or if MySql, like so:

CODE

 "QueryDefinitions" : [ { "Name" : "MyMeasurementSignals", "Type" : "SIGNAL", "Sql" : "SELECT run_start, `${myMeasurement}` FROM LabSignals", ...

CONDITION Query Definition types with a calculated End Time Column

When the end time is a calculated column, you must write the query as a sub-query. This allows the connector to place WHERE filters on the outer SELECT statement so that it accurately pulls only the required data. For example, the following query creates a calculated column called end_time that is 1 minute greater than the start time:

CODE

SELECT [start_time],
       DATEADD(mi, 1, start_time) AS end_time,
       [quality],
       [temperature],
       [comment],
       [id]
FROM [TestDB].[dbo].[CalcTime]

This returns the expected result set:

start_time               end_time                 quality  temperature  comment  id
2022-07-31 19:59:39.000  2022-07-31 20:00:39.000  39       261          BAD      A1234
2022-07-31 21:45:01.000  2022-07-31 21:46:01.000  90       212          OK       A1234
2022-07-31 22:52:35.000  2022-07-31 22:53:35.000  40       223          BAD      A1234

However, if a filter is placed on end_time, the query will fail:

CODE

SELECT [start_time], DATEADD(mi, 1, start_time) AS end_time, [quality], [temperature], [comment], [id]
FROM [TestDB].[dbo].[CalcTime]
WHERE end_time IS NOT NULL

with the error:

CODE

Invalid column name 'end_time'.

This will cause queries for data to fail in Seeq, since Seeq needs to identify all rows where the start and end times are not null to ensure it pulls valid capsules.

To overcome this issue, a subquery can be created:

CODE

SELECT [start_time], [end_time], [quality], [temperature], [comment], [id]
FROM (
  SELECT [start_time], DATEADD(mi, 1, start_time) AS end_time, [quality], [temperature], [comment], [id]
  FROM [TestDB].[dbo].[CalcTime]
) AS calc_time
WHERE end_time IS NOT NULL

The WHERE criteria are still taken into account for the table scan (for example, if filters on id are present). This allows Seeq to safely apply its filtering criteria to the query without breaking it.
