SQL Server ML Services - Internal Architecture

This is the second part of series of article on SQL Server ML Services. In first part, we have focused on installation. In this article, we will explore the internal architecture. Let's start.

Extensibility Framework Architecture

The ML services stands tall on the extensibility framework architecture of MS SQL Server which was introduced in 2016 to run external R script. It helps to run the external Python or R script as extension of core database engine.

 Benefits

  • Bridging SQL Server and R/Python by facilitating the execution of external script within SQL Server Engine.
  • Smooth deployment of R/Python models using the Stored Procedure, T-SQL and PREDICT function.
  • Built-in data security as external R/Python scripts run in secured SQL Server framework and no data is transferred out side of the server. If an user only have privilege to access an data by a query then he can also run the external script to get that data. So table, database, and instance level security can be enforced. DBAs can manage user access, resources used by external scripts, and external code libraries added to the server
  • Popular SSMS or Azure Data Studio can be used for writing the code

Architecture

The extensibility framework architecture is devised in a way that R/Python scripts run as
separate process without affecting core database engine. The processes exchange data and work coherently to produce the final output.

Diagram: Extensibility framework component architecture for Windows

 Major components are:

  1. SQL Server Process - It executes the core database functionality
  2. Launchpad Process - It manages and executes external scripts e.g. RLauncher.dll for the R and Pythonlauncher.dll for Python in Windows. Each database engine has its own launchpad service. It takes a secured worker from its pool and launch a satellite process to manage the external scripts.
  3. Bxlserver Process - Binary Exchange language  or Bxlserver manages the communication between SQL Server and external processes using SQL Satellite. It also ensures secure working folders for each external script job.
  4. Finally SQL Satellite - It is an extensibility API of database engine, which is implemented by R/Python language runtimes for input/output data read-write, arguments read-write and error handling.

 

 Communication Flow

The extensibility framework components communicate with each other for delivering the final output.

Diagram: Extensibility framework components communication flow for Windows

  1. A request for the Python runtime is indicated by the parameter @language='Python/R' passed to the stored procedure. SQL Server sends this request to the Launchpad service.
  2. The Launchpad service starts the appropriate launcher; in this case, Python Launcher or R Launcher.
  3. PythonLauncher starts the external Python35 process.
  4. BxlServer coordinates with the Python runtime to manage exchange of data, and storage of working results.
  5. SQL Satellite manages communications about related tasks and processes with SQL Server.
  6. BxlServer uses SQL Satellite to communicate status and results to SQL Server.
  7. SQL Server gets results and closes related tasks and processes

 

Communication Protocol

Components communicate among each other with various protocols. Like:

  1. TCP/IP - default communication protocol between SQL Server and the SQL Satellite
  2. Named Pipes - the BxlServer and SQL Server uses a proprietary protocol Named Pipes for exchanging data
  3. ODBC - User placing a external script request to a remote SQL Server uses the ODBC protocol
  4. RODBC - R language uses RODBC in code for retrieving data from SQL Server
  5. XDF file format - It is used to transfer blocks of data between SQL Server and external script

 

That's all about internal architecture.

Add comment