Sensitive Data Discovery Pre-Configuration Details

Only application admins can enable sensitive data discovery (SDD), which they do on the Immuta app settings page. Once SDD is enabled, data source creators can disable it for individual data sources. Additionally, governors, data source owners, and data source experts can disable any unwanted Discovered tags in the data dictionary so that those tags are no longer used or automatically applied to that data source in the future.

Configurable global settings

Global template

When SDD is triggered on a data source, the job runs for the identifiers in the template set on that data source. If no template is set, the job uses the template and identifiers defined by the global setting. By default, the global setting runs all identifiers in the system. However, a system administrator can configure Immuta to use a custom global template instead.
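The fallback order described above can be sketched in a few lines of Python. This is an illustrative model only, not Immuta's implementation or API: the `Template` class, the `resolve_identifiers` function, and the identifier names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for an SDD template: a named set of identifiers."""
    name: str
    identifiers: Tuple[str, ...]


def resolve_identifiers(
    data_source_template: Optional[Template],
    global_template: Optional[Template],
    all_identifiers: Tuple[str, ...],
) -> Tuple[str, ...]:
    """Return the identifiers an SDD job would run, following the fallback order."""
    # 1. A template set on the data source takes precedence.
    if data_source_template is not None:
        return data_source_template.identifiers
    # 2. Otherwise, use the custom global template if one is configured.
    if global_template is not None:
        return global_template.identifiers
    # 3. Otherwise, the default global setting runs every identifier in the system.
    return all_identifiers


# Hypothetical identifier names, used only for illustration.
ALL_IDENTIFIERS = ("EMAIL", "PHONE", "SSN")
finance = Template(name="finance", identifiers=("SSN",))
company_wide = Template(name="company-wide", identifiers=("EMAIL", "PHONE"))
```

For example, `resolve_identifiers(None, company_wide, ALL_IDENTIFIERS)` returns the identifiers of the global template, while passing `None` for both templates returns every identifier.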

An active global template cannot be deleted.

Sample size

SDD uses a sample of data to assess the likelihood that a column contains data that fits the pattern specified in the configured identifiers.

By default, SDD samples 1000 records (the sample size) during this process. However, administrators can configure the sample size on the Immuta app settings page. In general, increasing the sample size increases the accuracy of SDD predictions, but decreasing the number of records sampled may be necessary to meet some organizations' compliance requirements.
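To make the role of the sample size concrete, here is a minimal sketch of sample-based pattern matching. How Immuta actually scores a column is not described here, so the match fraction below is only a stand-in for "likelihood"; the `match_rate` function, its parameters, and the example pattern are assumptions for illustration.

```python
import random
import re
from typing import Sequence


def match_rate(
    values: Sequence[str],
    pattern: str,
    sample_size: int = 1000,
    seed: int = 0,
) -> float:
    """Return the fraction of sampled values that fully match the pattern."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not values:
        return 0.0
    # Never sample more records than the column actually holds.
    k = min(sample_size, len(values))
    # A seeded generator keeps the sketch reproducible.
    sample = random.Random(seed).sample(list(values), k)
    regex = re.compile(pattern)
    hits = sum(1 for value in sample if regex.fullmatch(value))
    return hits / k


# Hypothetical pattern for US Social Security numbers, used only for illustration.
SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"
column = ["123-45-6789"] * 5 + ["n/a"] * 5
```

A smaller `sample_size` reads fewer records but makes the estimate noisier, which mirrors the trade-off between accuracy and compliance constraints described above.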

Tag mutability

When SDD is triggered by a data owner, all column tags that were previously applied by SDD are removed and the tags prescribed by the latest run are applied. However, if SDD is triggered because a new column is detected by schema monitoring, tags will only be applied to the new column, and no tags will be modified on existing columns.
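The two behaviors above can be modeled as follows. This is a sketch under stated assumptions rather than Immuta's implementation: the `apply_sdd_run` function, the trigger names, the `"sdd"`/`"manual"` source labels, and the tag names are all hypothetical, and the sketch assumes that tags not applied by SDD are left untouched.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Column name -> set of (tag, source) pairs, where source is "sdd" or "manual".
ColumnTags = Dict[str, Set[Tuple[str, str]]]


def apply_sdd_run(
    current: ColumnTags,
    discovered: Dict[str, Set[str]],
    trigger: str,
    new_columns: FrozenSet[str] = frozenset(),
) -> ColumnTags:
    """Return the column tags after an SDD run, without mutating the input."""
    result: ColumnTags = {col: set(tags) for col, tags in current.items()}
    if trigger == "data_owner":
        # Remove every tag a previous SDD run applied, then apply the latest results.
        for col in result:
            result[col] = {pair for pair in result[col] if pair[1] != "sdd"}
        targets = list(discovered)
    elif trigger == "schema_monitoring":
        # Only newly detected columns are tagged; existing columns stay as they are.
        targets = [col for col in discovered if col in new_columns]
    else:
        raise ValueError(f"unknown trigger: {trigger}")
    for col in targets:
        result.setdefault(col, set()).update((tag, "sdd") for tag in discovered[col])
    return result


# Hypothetical tag and column names, used only for illustration.
current = {
    "email": {("Discovered.PII", "sdd"), ("Reviewed", "manual")},
    "name": {("Discovered.Name", "sdd")},
}
discovered = {"email": {"Discovered.Email"}, "phone": {"Discovered.Phone"}}

after_owner_run = apply_sdd_run(current, discovered, "data_owner")
after_schema_run = apply_sdd_run(
    current, discovered, "schema_monitoring", new_columns=frozenset({"phone"})
)
```

In the owner-triggered run, the old SDD tag on `name` disappears because the latest run found nothing there; in the schema-monitoring run, only `phone` gains a tag.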

Dry run

Users can also configure SDD to perform a dry run (dryRun), which shows them the tags that would be applied to a data source without actually applying them. See the Run sensitive data discovery on data sources page for details.

SDD workflow

Two common workflows for using SDD are outlined below. The first illustrates how to apply a single global template to all data sources, while the second outlines how users can create and apply templates to data sources they own.

Workflow 1: Apply a global template to all data sources

Data governor creates a template using one or more built-in or custom identifiers.
System administrator adds this template to the global settings so that it applies to all data sources.
Users trigger SDD on data sources.

Workflow 2: Apply a template to a specific data source

Data governor creates one or more custom identifiers.
Data owner creates a template containing one or more identifiers.
Data owner applies their template to one or more data sources.
Data owner triggers SDD on one or more data sources, and tags are applied to columns where identifiers were recognized.