HOW TO DO GRAPH ANALYSIS ON POSTGRESQL WITH ARCADE

Filter Traverse

Written by: Gabriele Ponzi

 

Nowadays graph visualization and analysis is a critical tool to have in your toolkit. Developers, analysts, business executives, and really anyone that uses data, can use graph visualization tools to extract information from data and see how the data interacts. It is one thing to know that something exists, it is another to see how it affects and is affected by those things around it.

In today’s market there are several graph visualization tools that are able to connect to various graph databases. Most people find that issues arise when they do not have access to a graph database but want to use a graph visualization tool. Most graph visualization tools do not have the ability to integrate with a relational database, the most commonly used database. One solution is migration. Companies can spend massive amounts of money to move all of their data from a relational database to a graph database. However, for many people this is not a solution. Migrations are expensive and time consuming and many people do not want to deal with the headache that it can bring.

If you are in this situation an attractive alternative solution is Arcade Analytics. What is Arcade?

 

“Arcade Analytics is a graph visualization tool that enables users to have more control over their data. Arcade sits on top of the user’s database and allows the users to query data and show it in a graph.”

 

One of the most attractive features of Arcade Analytics is that it allows users to query data from a relational database and visualize the relational database as a graph. Arcade’s RDBMS connector allows users to perform a graph analysis over your RDBMS without any migration and with few simple steps.

To understand how this is possible, let’s explore the RDBMS connector.

 

How does it work

Thanks to Arcade you can visually inspect relationships and connections within your RDBMS and treat your data as a graph.

The key to achieve that is the model mapping between the source data model, the Entity-Relationship Model (ER), and the target data model, the Graph Model.

Once a coherent and effective model mapping is performed (don’t worry, that’s a completely automated process), you can query your source dataset and play with it as if it was a graph:

  • Each record is transformed in a correspondent
  • Each connection between two records, inferred through a relationship between two tables and computed through a join operation, generates an edge.

The ER model is built starting from the source DB schema: each table, known also as Entity, and each Relationship in the DB is inferred from the metadata.

This automated mapping strategy adopts a basic approach: the source DB schema is directly translated as follows:

  1. Each Entity in the source DB is converted into a Vertex Type.
  2. Each Relationship between two Entities in the source DB is converted into an Edge Type.

All the records of each table are handled according to this schemas-mapping: each pair of records on which it’s possible to perform a join operation, will correspond to a pair of vertices connected by an edge of a specific Edge Type.

Arcade allows connections to Oracle, SQLServer, MySQL, PostgreSQL and HyperSQL.
In order to show you how the tool works, I chose a sample relational database: we will connect to a PostgreSQL server and we will perform a graph analysis on the DVD Rental database. To learn more about this specific dataset click here.

Here is the source database schema (a.k.a. the ER Model).

According to the mapping rules stated above, the following correspondent Graph Model will be automatically built from the tool.

Really simple, isn’t it?! Still some doubts? Okay, let’s see a couple of mapping examples.

 

1-N Relationship between Film and Language

The Film and Language source tables are translated in two correspondent Vertex Types. The properties contained in the Film and Language Vertex Types directly come from the columns belonging to the two source tables.

The logical relationship between the two tables generates the HasLanguage Edge Type.

 

N-N Relationship between actor

In RDBMS’s N-N Relationships are expressed through join tables. In the sample schema below you can see how the N-N relationship between actors and movies is modelled through the join table Film_Actor.

As you can see, also the central join table is translated in a specific Vertex Type allowing you to traverse the N-N relationship between actors and movies.

For this reason, traversing N-N relationships is equivalent to traverse two 1-N relationships, like in the relational world:

  • 1-N relationship between Film and Film_Actor
  • 1-N relationship between Film_Actor and Film

Now let’s start to play with data!

 

Retrieving data and performing our first analysis

Once connected to the source relational database we will get a new empty widget like the following.

First, we will retrieve some data. We have three main ways to do that:

We will start with a specific actor, retrieving his information through a full text search on the name, then we will expand all his connections present in the source database. For the purposes of this example I will search for Christian Gable.

Let’s load the vertex by clicking the designated button. Below you can see the new vertex: to inspect its content we just have to open the Graph Element menu.

Now we can start our analysis: let’s suppose we want to find all the customers who rented a movie where Christian Gable performed as actor. We can simply do that by navigating all the relationships of our graph model by using the Traverse menu.

First we have to retrieve all the vertices of the film_actor join table. Doing this will also allow us to fetch the film vertices.

 

Now starting from the movies we can expand all the 270 ingoing relationships.

We have now retrieved all the vertices connected to each specific movie: in this case the has_film edge type is mapped with 3 different relationships:

  • inventory → film
  • film_category (join table) → film
  • film_actor (join table) → film

In order to find all the customers who rented all these movies, we should focus just on the inventory → film relationship, so we can get rid of useless vertices belonging to film_category class and the new film_actor nodes by deleting them (click the class button on the right hand sand to select the desired nodes and use the Delete button in the sidebar).

Now we can navigate the ingoing has_inventory edges, connecting inventory vertices with those belonging to the rental class.

We arrive at the last connection: from each rental vertex we can reach a specific customer by expanding the outgoing has_customer relationship.

Here we are, we have found all the customers who rented a movie where they could see a wonderful performance by the great Christian Gable (who does not know him)!

Now, what if we want to narrow our analysis to a subset of customers according to a specific movie of Christian Gable?

Very simple: we can select a specific film and play with the selection of the ingoing and outgoing elements till a certain depth (Selection menu), as shown in the following screenshots.

At this point we can delete all the rest of the graph by inverting the current selection through the Invert operation.

Then we can delete them all with a click and… here is the result.

This RDBMS Connector is just a beta version, there are some limitations, such as if you didn’t define constraints, such as foreign keys between the tables on which you usually perform join operations, you will lose this kind of information during the querying process.

Because of this, if foreign keys are missing, you will not have any edges in your final Graph Model, and you will not able to traverse any relationship.

To overcome these limitations, the Arcade team is working on a visual tool mapper, allowing users to edit the basic mapping. This way you will be able to add connections in your dataset by defining new edge types between vertices.

In this post we had just a taste of the analyses we can perform over a relational database thanks to Arcade, new features will be shown in next posts.

I hope this post was helpful and interesting.

 

Stay tuned!

 

Gabriele Ponzi
Senior Software Engineer

Gabriele Ponzi is a Senior Software Engineer specializing in database integration at Arcade Analytics. Gabriele received both a Bachelor and Masters Degree in Computer Engineering from Roma Tre University. He has extensive experience in relational and graph databases and is an industry expert in database migration.
Gabriele Ponzi is a Senior Software Engineer specializing in database integration at Arcade Analytics. Gabriele received both a Bachelor and Masters Degree in Computer Engineering from Roma Tre University. He has extensive experience in relational and graph databases and is an industry expert in database migration.