Alpha to Omega: From Creating a traditional SQL DB to modern Stream Processing with Kafka - ByteScout
  • Home
  • /
  • Blog
  • /
  • Alpha to Omega: From Creating a traditional SQL DB to modern Stream Processing with Kafka

Alpha to Omega: From Creating a traditional SQL DB to modern Stream Processing with Kafka

SQL DB Introduction

Imagine a real-time transportation booking network of drivers and passengers operating in a frenetic, densely populated Asian city like Phnom Penh. Thousands of passenger apps are simultaneously uploading and downloading GPS location data, their maps are constantly retrieving driver locations, updating UI displays, and confirming bookings.

Hundreds of driver apps are likewise constantly feeding back their location and confirming their availability status. New membership accounts, payments, and online tech support are all constantly in play! If you were tasked with building an app to manage this network, what kind of database structure would you use?

Think of the most efficient structure and cut it in half five times! This app must be live and updated every second on every device, or customers will simply vanish as they uninstall and move to a competitor’s app!

SQL can handle the data store, but for this bleeding-edge live, real-time app you will need a highly scalable platform which supports the parallelism required to accommodate thousands of simultaneous users. To achieve this feat you will need a SQL platform which supports live data feeds and stream processing.

A great candidate platform today is an Apache Kafka API Stream processing to continually feed your app’s SQL database. Although we might normally code a Kafka app with Java, there is now a fully SQL compatible platform called KSQL which will look familiar to SQL developers and eliminate the intensive Java coding previously required.

Such new technologies evolve rapidly to fill clear and definite needs, and to stay relevant we must track their evolution! To survey this technology with the example transportation booking app in mind, we will look at several germane components.

How to Build a Basic SQL Database

We will sweep across the entire domain, beginning with creating a basic SQL database, imagining the functionality of data involved, and following all the way through to the core definition of handling a live data feed with Apache Kafka open source stream processing for a live and real-time app with the ultimate system performance demands.

The first step is to actually create a SQL database, and understand the concepts involved. After identifying the appropriate platform for your application (the top three most widely used versions of SQL are still MySQL, Oracle, and Microsoft SQL or MS SQL Server) you will need to create an instance of your intended database on a web server. And then build in a structure of tables, rows, and relational keys to support logical querying for your app, and to encompass the full definition of your application’s functional scope.

Because MySQL is the most popularly known RDBMS among web developers today, and because it is available as a standard platform on practically all commercial web servers, our comparative SQL database example will focus on a particular implementation of creating a MySQL database.

MySQL is usually hosted on a web server, but can also be implemented as a local database. In this SQL database tutorial, let’s start out with the simple act of creating the database. We will create our SQL database online via the control panel to upload our script file. By the way, hosting cPanels nearly always come equipped with a SQL database editor, and is called MyPHP Admin.

Nowadays, all web servers come standard with a MySQL Wizard tool as part of the generic control panel with which entry-level users can build a SQL database.

This set of tools usually includes My PHP Admin. It provides a user interface to graphically create and manipulate the database, SQL tables, data column labels, primary keys, relational joins, and all the standard fare of database management. While that is a nice tool for beginners, we will need to take full programmatic control of the database, and so we will start with a simple SQL script to create a SQL database.

Fast Way to Create a SQL Database

The fastest way to build SQL databases, especially if you’re a LAMP stack developer is to send the SQL statements through PHP. In the following code sample we’re going to accomplish several tasks in a single file:


<?php

// define server credentials as variables

$myservername = "localhost";

$myusername = "username";

$mypassword = "password";

 

// create a connection string to pass

$conn = new mysqli($myservername, $myusername, $mypassword);

 


// catch connection errors:

if ($conn->connect_error) {

die("Connection unsuccessful: " . $conn->connect_error);

}

 


// create a new database

$sql = "CREATE DATABASE PassApp";

if ($conn->query($sql) === TRUE) {

echo "New database created successfully";

} else {

echo "Creating database unsuccessful: " . $conn->error;

}


$conn->close();

?>

This simple PHP script accomplishes quite a lot! Here, we’ve built our server authentication into the connection string $conn in the first few lines. This is a very common and standard way to authenticate a PHP file to run on your server. You can upload this file through your favorite FTP client, or you can also use the control panel UI provided by your web host to upload this file to your site.

We also demonstrated a very handy method of passing a SQL language statement to the server to actually create the database with the name “PassApp.” You can run this code on your server by entering the URL of your site plus the name of the file as you saved it, for example, www.mysite.com/createdb.php

As for the SQL database design, now we are ready to define the structure of our new db with SQL statements to add tables, column labels for our data, primary keys to validate the uniqueness of ID fields and even stored procedures to execute frequently used query and sort routines.

We need to learn how to query an SQL database. Let’s start by adding a table to our transportation booking app which will track driver location and availability details. We can reuse most of the PHP code in the previous example, just changing the $sql variable to enter a new SQL statement to create a driver availability table in our new db.


CREATE TABLE DriverAvailability (

DriverID varchar(10),

DriverLocation varchar(16),

DriverDestination varchar(20),

TimeAvailable varchar(255),

);

To visualize such a SQL table structure, just imagine a spreadsheet with columns and rows; the data can likewise be accessed as an indexed array, and Python users will like this to Pandas Dataframe objects.

If you used your host control panel to create the database, you can add the password as an extra parameter in the $conn string in the previous code snippet. Otherwise just paste the CREATE TABLE line into the $sql variable and change the error catching statement accordingly.

In order to demonstrate the crucial importance of live data feeds with the Kafka API data stream later in this tutorial, we have defined our new table with data columns which illustrate the urgent need for the fastest possible data feed and update to our booking app db.

As you may imagine, machine learning apps are now used to predict the future available time of a driver currently en route to a destination in order to anticipate matches with queued passenger requests. This is such a common practice today that the UCI database is packed with NY Taxi data for both academic and commercial research. And there are now many apps similar to Uber operating in cities globally.

Our new TimeAvailable field will exemplify a datum the update speed of which is absolutely pivotal and upon which the success of the entire system depends for competitivity. When a prediction algorithm calculates a future available time for a given driver, this field must be updated instantaneously, and the latency of the availability is critical to the success of the entire platform.

Performance and load testing of our app may very well indicate that pthreads will not be sufficiently fast to update location and availability in a standard MySQL database platform! In a such a case, we may actually be required to retrofit our entire system with a faster platform. This is where Apache Kafka streams work like a charm.

But Why use Kafka data streams instead of parallelism in PHP?

It is reasonable to wonder when we are already using PHP and SQL, why change platforms midstream? Abundant misconceptions plague the world of developers. Among them is the old idea that PHP app processes cannot be threaded. And the advocates of Node.js cling to this assertion, although it is untrue.

PHP supports multi-threaded apps and parallelism via pthreads, which is well documented. Pthreads may or may not be adequate, and performance testing alone will answer this question. However, old relics of habitual thinking die hard, especially among semi-technical project leaders, and PHP is not ordinarily chosen for such tasks as our breakneck booking app.

Architects of Apache Kafka know this well, and furthermore, intend to extend techniques of parallelism in data feed connectivity for live apps including the IoT class of real-time components in smart homes and smart cars. Although much of this can be achieved with pthreads, our aim here is to introduce the new technology of Kafka API data streams and Kafka data feeds. There is evidence that the Kafka technology provides dedicated parallelism capability which will be invaluable to our booking app. So, let’s roll along into the new era.

Real-Time Data Streaming SQL

The elegance of KSQL is that we can operate Kafka stream processing via our very comfortable and familiar SQL-like language. This strategy avoids the overhead of developing a Java app perform Kafka stream processing and the corequisite of compiling and deploying such an app. Let’s start with a simple example to implement stream processing. Kafka supports the CREATE STREAM statement which should look very familiar to SQL developers:


CREATE STREAM NearbyDrivers AS

SELECT * FROM DriverAvailability

WHERE TimeAvailable <= 2;

In this KSQL statement, we are creating a stream processing table from our driver availability table where we get a constantly updated view of all drivers with predicted availability within the next two minutes! We will need to refine this by adding filters to include only drivers with current locations within not only a reasonable distance from the passenger.

But a location which is predictably calculated from previous tables of traffic, construction detours, and accident delays to be a viable driver for a confirmed passenger. Those tables receiving live and real-time traffic, weather, construction, and police reports to impact on driver availability must also be streamed through the Kafka API connection in order to be as accurate and up-to-date as physically possible. Here is a hypothetical scenario:


CREATE STREAM AccidentReportGPS AS

SELECT DriverID, DriverLocation

FROM website_reports PoliceTraffic

LEFT JOIN AccidentLocation ON DriverLocation

WHERE ActiveStatus=’current’;

The objective again simplified for our purpose, is to demonstrate how to build up-to-date relevance into a data feed from sources including web feed and its impact on our driver network. The hypothetical example must be refined to a format which makes GPS locations of drivers and accidents compatible. Although such a task is beyond our scope, the crucial importance of network speed is irrefutable.

The Core of Kafka: Transaction Logs

Kafka’s speed is based on the critical processing of database transaction logs. This is a rich history of every SQL procedure executed on the database. KSQL queries aggregate their own input data. A KSQL aggregation is stateful; This means that Kafka must statistically account for errors, machine failures, and use these representations of state to achieve the best possible efficiency and accuracy in operation.

Data Streams as SQL Tables

Sometimes described as the Stream-Table Duality, Kafka stream processing represents data feeds as tables which are accessible to SQL statements in exactly the same way as ordinary MySQL tables! In other words, Kafka API creates a sort of alias for the data stream which can be used by a SQL developer as if it were a table sitting in the database. The difference is that it is alive and real-time data feed. This is what immediately sets Kafka apart as a candidate for achieving the necessary speed for our competitive booking app. In fact, if you didn’t know in advance, KSQL stream would look just like SQL tables!

The Kafka KSQL EDGE

In this cutting-edge ByteScout SQL tutorial, we’ve explored how to create a traditional SQL database, and brought technology up to the minute with streaming SQL via Apache Kafka to represent live data feeds as SQL tables! The quintessential substance here is more in the purpose served than the software application. In other words, our booking app’s extreme craving for speed is only grasped by knowing the current context of user expectation and the existing competition online.

Afterall, 50 years ago, people might have been satisfied to wait 20 minutes for a taxi to arrive. Now a passenger is likely to consider a taxi app impractical if a driver does not arrive in 30 seconds. This kind of pressure spawns the evolution of platforms like Kafka and KSQL. And we will have to reckon with these new technologies to keep our CVs on the front lines! As this new streaming SQL technology is adopted by more and more developers, stay tuned to ByteScout!

P.S. Check our professional tutorial about SQL and Machine Learning technologies.

 

About the Author

Author Mark

Mark Ronald Moore

Mark is a freelance consultant and coder in the areas of machine learning, automation testing, and web app development. He currently writes coding tutorials and tech articles regularly for ByteScout. Mark is a resident of Humboldt, California, and enjoys hiking in the redwoods.

 

 

prev
next