Added new section about data classification retrieval

Jenny Tam 2019-09-04 13:31:01 -07:00
Parent 7f6fb49c81
Commit d79c8ad46f
1 changed file: 186 additions and 0 deletions

@ -13,6 +13,7 @@
* <a href="#datetime">Retrieving DateTime values as strings or PHP DateTime objects</a>
* <a href="#formatDecimals">Configurable options to format fetched decimal or numeric values</a>
* <a href="#language">Language Selection</a>
* <a href="#dataClass">Data Classification Sensitivity Metadata</a>
##
@ -218,3 +219,188 @@ The equivalent connection and statement attributes in pdo_sqlsrv are `PDO::SQLSR
The SQLSRV and PDO_SQLSRV drivers allow the user to specify a preferred language using the `Language` connection option. The effect of this option is similar to executing `SET LANGUAGE`. The available languages come from the server's syslanguages table. To see the full list of available languages, execute the following query: `SELECT name,alias FROM sys.syslanguages`.
Note that this option only affects the language of messages returned from the server. It does not affect the language used by the drivers themselves, as they are currently available only in English, and it does not affect the language of the underlying ODBC driver, whose language is determined by the localised version installed on the client system. Therefore, it is possible that changing the `Language` setting will result in messages being returned in different languages, depending on whether they come from the PHP driver, the ODBC driver, or SQL Server.
<a name="dataClass" />
## Data Classification Sensitivity Metadata
[Data classification](https://docs.microsoft.com/en-ca/azure/sql-database/sql-database-data-discovery-and-classification) includes two metadata attributes:
- Sensitivity Labels - define the sensitivity level of the data stored in the column
- Information Types - provide additional granularity into the type of data stored in the column
We can use the classification engine of SQL Server Management Studio (SSMS) to scan the database for columns containing potentially sensitive data, which then provides a list of recommended column classifications. We can also manually classify columns as an alternative, or in addition to the recommendation-based classification.
Another option is to use Transact-SQL to classify sensitive data. In theory, each column may have more than one (label, information type) tuple, but for simplicity the following examples use only one tuple per column.
Take a `Patients` table for example:
```
CREATE TABLE Patients (
[PatientId] int identity,
[SSN] char(11),
[FirstName] nvarchar(50),
[LastName] nvarchar(50),
[BirthDate] date)
```
We can classify the `SSN` and `BirthDate` columns as shown below:
```
ADD SENSITIVITY CLASSIFICATION TO [Patients].SSN WITH (LABEL = 'Highly Confidential - GDPR', INFORMATION_TYPE = 'Credentials')
ADD SENSITIVITY CLASSIFICATION TO [Patients].BirthDate WITH (LABEL = 'Confidential Personal Data', INFORMATION_TYPE = 'Birthdays')
```
To access the classification metadata defined in SQL Server 2019, run the following query:
```
SELECT
schema_name(O.schema_id) AS schema_name,
O.NAME AS table_name,
C.NAME AS column_name,
information_type,
label
FROM sys.sensitivity_classifications sc
JOIN sys.objects O
ON sc.major_id = O.object_id
JOIN sys.columns C
ON sc.major_id = C.object_id AND sc.minor_id = C.column_id
```
The results are:
|schema_name | table_name | column_name | information_type | label |
|---------------|---------------|---------------|-----------------------|-------|
|dbo | Patients | BirthDate | Birthdays | Confidential Personal Data |
|dbo | Patients | SSN | Credentials | Highly Confidential - GDPR |
Starting with the 5.7.0 preview of the PHP drivers, a new statement attribute (option) has been added to the query and prepare methods to request sensitivity classification metadata through the existing metadata functions. The query is a SELECT statement that includes the table columns of interest, which may or may not have sensitivity classification metadata defined.
### SQLSRV driver
[sqlsrv_field_metadata()](https://docs.microsoft.com/sql/connect/php/sqlsrv-field-metadata?view=sql-server-2017) returns data classification sensitivity metadata if the new `DataClassification` option is set to `true` (it is `false` by default). In the example below, only the `SSN` and `BirthDate` columns contain sensitivity metadata.
```
$tableName = 'Patients';
$tsql = "SELECT * FROM $tableName";
// Request data classification metadata for this statement
$stmt = sqlsrv_prepare($conn, $tsql, array(), array('DataClassification' => true));
if (sqlsrv_execute($stmt)) {
    $fieldmeta = sqlsrv_field_metadata($stmt);
    foreach ($fieldmeta as $f) {
        // Print only the columns that have classification metadata
        if (count($f['Data Classification']) > 0) {
            echo $f['Name'] . ": \n";
            print_r($f['Data Classification']);
        }
    }
}
```
The output will be like this:
```
SSN:
Array
(
    [0] => Array
        (
            [Label] => Array
                (
                    [name] => Highly Confidential - GDPR
                    [id] =>
                )
            [Information Type] => Array
                (
                    [name] => Credentials
                    [id] =>
                )
        )
)
BirthDate:
Array
(
    [0] => Array
        (
            [Label] => Array
                (
                    [name] => Confidential Personal Data
                    [id] =>
                )
            [Information Type] => Array
                (
                    [name] => Birthdays
                    [id] =>
                )
        )
)
```
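When only the label and information type names are needed, the nested arrays shown above can be flattened into one row per (label, information type) tuple. The following is a minimal sketch, not part of the drivers: `summarizeClassification` is a hypothetical helper, and the sample `$fieldmeta` array is hard-coded to mirror the shape of the output above so that the snippet runs without a database connection. In real code the array would come from `sqlsrv_field_metadata($stmt)`.

```php
<?php
// Hypothetical helper (not a driver function): flattens the
// 'Data Classification' entries of field metadata into simple rows.
function summarizeClassification(array $fieldmeta): array
{
    $rows = array();
    foreach ($fieldmeta as $f) {
        // The key is absent when DataClassification was not requested.
        $entries = $f['Data Classification'] ?? array();
        // A column may carry more than one (label, information type) tuple.
        foreach ($entries as $entry) {
            $rows[] = array(
                'column' => $f['Name'],
                'label' => $entry['Label']['name'] ?? '',
                'information_type' => $entry['Information Type']['name'] ?? '',
            );
        }
    }
    return $rows;
}

// Sample input shaped like the metadata shown above (assumption: trimmed
// to the keys this helper reads).
$fieldmeta = array(
    array('Name' => 'PatientId', 'Data Classification' => array()),
    array('Name' => 'SSN', 'Data Classification' => array(
        array(
            'Label' => array('name' => 'Highly Confidential - GDPR', 'id' => ''),
            'Information Type' => array('name' => 'Credentials', 'id' => ''),
        ),
    )),
);

foreach (summarizeClassification($fieldmeta) as $row) {
    echo $row['column'] . ': ' . $row['label'] . ' / ' . $row['information_type'] . PHP_EOL;
}
```

With the sample input, only the `SSN` column produces a row, because `PatientId` has an empty classification list.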
To see the metadata of every column using `sqlsrv_query`, modify the above script as follows, using `json_encode`:
```
$tableName = 'Patients';
$tsql = "SELECT * FROM $tableName";
$stmt = sqlsrv_query($conn, $tsql, array(), array('DataClassification' => true));
$fieldmeta = sqlsrv_field_metadata($stmt);
foreach ($fieldmeta as $f) {
    $jstr = json_encode($f);
    echo $jstr . PHP_EOL;
}
```
The output is shown below. Note that the `Data Classification` field is not included in the metadata by default; it appears only when the `DataClassification` option is set to `true`.
```
{"Name":"PatientId","Type":4,"Size":null,"Precision":10,"Scale":null,"Nullable":0,"Data Classification":[]}
{"Name":"SSN","Type":1,"Size":11,"Precision":null,"Scale":null,"Nullable":1,"Data Classification":[{"Label":{"name":"Highly Confidential - GDPR","id":""},"Information Type":{"name":"Credentials","id":""}}]}
{"Name":"FirstName","Type":-9,"Size":50,"Precision":null,"Scale":null,"Nullable":1,"Data Classification":[]}
{"Name":"LastName","Type":-9,"Size":50,"Precision":null,"Scale":null,"Nullable":1,"Data Classification":[]}
{"Name":"BirthDate","Type":91,"Size":null,"Precision":10,"Scale":0,"Nullable":1,"Data Classification":[{"Label":{"name":"Confidential Personal Data","id":""},"Information Type":{"name":"Birthdays","id":""}}]}
```
### PDO_SQLSRV driver
Similarly, one of the fields returned by [PDOStatement::getColumnMeta()](https://docs.microsoft.com/sql/connect/php/pdostatement-getcolumnmeta?view=sql-server-2017) is `flags`, which specifies the flags set for the column and is otherwise always `0`. Beginning with the 5.7.0 preview, setting the new statement attribute `PDO::SQLSRV_ATTR_DATA_CLASSIFICATION` to `true` makes `flags` carry the data classification sensitivity metadata instead, like this:
```
$options = array(PDO::SQLSRV_ATTR_DATA_CLASSIFICATION => true);
$tableName = 'Patients';
$tsql = "SELECT * FROM $tableName";
$stmt = $conn->prepare($tsql, $options);
$stmt->execute();
$numCol = $stmt->columnCount();
for ($i = 0; $i < $numCol; $i++) {
    $metadata = $stmt->getColumnMeta($i);
    $jstr = json_encode($metadata);
    echo $jstr . PHP_EOL;
}
```
The output of metadata for all columns is shown below:
```
{"flags":{"Data Classification":[]},"sqlsrv:decl_type":"int identity","native_type":"string","table":"","pdo_type":2,"name":"PatientId","len":10,"precision":0}
{"flags":{"Data Classification":[{"Label":{"name":"Highly Confidential - GDPR","id":""},"Information Type":{"name":"Credentials","id":""}}]},"sqlsrv:decl_type":"char","native_type":"string","table":"","pdo_type":2,"name":"SSN","len":11,"precision":0}
{"flags":{"Data Classification":[]},"sqlsrv:decl_type":"nvarchar","native_type":"string","table":"","pdo_type":2,"name":"FirstName","len":50,"precision":0}
{"flags":{"Data Classification":[]},"sqlsrv:decl_type":"nvarchar","native_type":"string","table":"","pdo_type":2,"name":"LastName","len":50,"precision":0}
{"flags":{"Data Classification":[{"Label":{"name":"Confidential Personal Data","id":""},"Information Type":{"name":"Birthdays","id":""}}]},"sqlsrv:decl_type":"date","native_type":"string","table":"","pdo_type":2,"name":"BirthDate","len":10,"precision":0}
```
If `PDO::SQLSRV_ATTR_DATA_CLASSIFICATION` is `false` (the default), the metadata output looks like this:
```
{"flags":0,"sqlsrv:decl_type":"int identity","native_type":"string","table":"","pdo_type":2,"name":"PatientId","len":10,"precision":0}
{"flags":0,"sqlsrv:decl_type":"char","native_type":"string","table":"","pdo_type":2,"name":"SSN","len":11,"precision":0}
{"flags":0,"sqlsrv:decl_type":"nvarchar","native_type":"string","table":"","pdo_type":2,"name":"FirstName","len":50,"precision":0}
{"flags":0,"sqlsrv:decl_type":"nvarchar","native_type":"string","table":"","pdo_type":2,"name":"LastName","len":50,"precision":0}
{"flags":0,"sqlsrv:decl_type":"date","native_type":"string","table":"","pdo_type":2,"name":"BirthDate","len":10,"precision":0}
```
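Because `flags` is an array when the attribute is enabled and the integer `0` otherwise, code that consumes column metadata may need to handle both shapes. The following is a minimal sketch under that assumption: `classificationFromColumnMeta` is a hypothetical helper, not a driver function, and the two sample arrays are hard-coded from the outputs above (trimmed to the relevant keys) so that the snippet runs without a database connection.

```php
<?php
// Hypothetical helper: returns the classification entries from one
// getColumnMeta() result, or an empty array when none are present.
function classificationFromColumnMeta(array $metadata): array
{
    $flags = $metadata['flags'] ?? 0;
    // With the attribute disabled, 'flags' is the integer 0, not an array.
    if (!is_array($flags)) {
        return array();
    }
    return $flags['Data Classification'] ?? array();
}

// Sample metadata with PDO::SQLSRV_ATTR_DATA_CLASSIFICATION set to true
$withAttr = array('name' => 'SSN', 'flags' => array('Data Classification' => array(
    array(
        'Label' => array('name' => 'Highly Confidential - GDPR', 'id' => ''),
        'Information Type' => array('name' => 'Credentials', 'id' => ''),
    ),
)));
// Sample metadata in the default case
$withoutAttr = array('name' => 'SSN', 'flags' => 0);

echo count(classificationFromColumnMeta($withAttr)) . PHP_EOL;    // prints 1
echo count(classificationFromColumnMeta($withoutAttr)) . PHP_EOL; // prints 0
```

Returning an empty array in the default case lets callers iterate over the result without first checking how the statement was prepared.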