This introductory course on Neo4j is aimed at anyone interested in discovering the wonders of graph databases with the leading tool.
The course has three modules.
The first covers the foundations of graph theory, a comparison between graph and relational databases, and the origins of Neo4j. The lectures explain the concepts behind a graph database and how to work with one in Neo4j, although no graph theory is strictly required to do so.
The second installs and reviews the tool, covering the ins and outs of the Neo4j Browser along with the basics of Cypher syntax.
The last module elaborates on the techniques for fetching and modifying graph entities. The relevant clauses and functions are introduced gradually, alongside theory, examples and proposed challenges.
Throughout the course, students build a strong understanding of Neo4j while gaining skills in the Cypher query language beyond its fundamentals, so as to unleash the powerful querying capabilities of graph databases.
What am I going to get from this course?
- Understand graph database fundamentals with the market leader: Neo4j
- Master graph databases beyond the fundamentals with the Cypher query language, from data creation to complex data retrieval and updates, through clear lectures, examples and challenges
- Gain the skills needed for data modeling and for exploiting the powerful querying capabilities of graph databases
Prerequisites and Target Audience
What will students need to know or do before starting this course?
This course has few prerequisites beyond a general understanding of databases, data models and entities. Some SQL and/or programming skills are helpful but not mandatory.
As for software requirements, Neo4j needs a Java Virtual Machine, although the Windows and Mac installable packages already include one. The Sandbox is an option for following the course without installing anything locally.
Who should take this course? Who should not?
Anyone interested in discovering the wonders of graph databases with the leading tool, whether to support apps with highly interconnected content or to perform in-depth data analysis in fields such as fraud, social networks, real-time recommendations, infrastructure, etc.
Although graph databases such as Neo4j are meant to exploit graphs themselves, no graph theory is required in order to master them. On the contrary, just a few concepts and some clear diagrams are needed.
This lecture summarises the origins of graph theory with Leonhard Euler's 'Seven Bridges of Königsberg' (1736) and its foundations: objects and relationships.
Graph vs Relational Databases
In relational databases, data is stored as rows and columns in tables. Relationships between tables are not stored as such in the database; they are made explicit in the SQL query, which is why foreign keys are required in order to relate entities. In graph databases such as Neo4j, data is stored as nodes and the relationships that connect them, which is why they excel in agility, performance and flexibility when the data is highly connected. Some of the most relevant use cases are fraud-ring detection in insurance, money-laundering analysis in banking, social network analysis, recommendation engines for online stores and logistics optimisation, amongst many others.
This lecture covers the main differences in modelling and managing data between graph and relational databases, comparing how the same example dataset is modelled and updated in each. It also covers the basic entities of a graph database: node, relationship, label and property.
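As a minimal sketch of those four entities, the following Cypher statement creates two nodes with labels and properties and connects them with a relationship. The labels, the relationship type and the property values (Person, Company, WORKS_AT, 'Alice', 'Acme') are hypothetical and are not taken from the course material.

```cypher
// Hypothetical data, for illustration only.
// Two nodes, each with a label (Person, Company) and a property (name).
CREATE (alice:Person {name: 'Alice'})
CREATE (acme:Company {name: 'Acme'})
// The relationship is stored in the graph itself, with its own property,
// instead of being reconstructed at query time through a foreign key.
CREATE (alice)-[:WORKS_AT {since: 2019}]->(acme)
RETURN alice, acme
```

In a relational model the same fact would typically need a foreign key column or a join table, plus a join in every query that follows the link.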
Brief History of Neo4j
Neo4j began as a property graph model project back in 2000. Its popularity has grown rapidly in recent years, helped by Cypher, its query language, which was opened up to other vendors through the openCypher project in 2015. It is currently the leading graph database and is used by top-ranked corporations.
This lecture covers the history of the project, the latest developments and the available editions and environments (Community and Enterprise Edition, Desktop and Sandbox). It focuses on the business cases where Neo4j excels, such as financial and insurance fraud detection, capacity and outage analysis on infrastructure, metadata repository management, social network management and analysis, and real-time recommendation engines, amongst others.
Module 2: First steps with Neo4j
The Neo4j Community Edition application is available for Linux, Mac and Windows. The engine requires a Java Virtual Machine (included in the Mac and Windows packages).
This lecture shows how to download and install the latest version for the Mac operating system. Before launching the database engine, the settings for database location, Java VM tuning, plugins and extensions are covered. Once the engine is started, the Neo4j Browser web page becomes accessible.
Besides the standard procedures that come with Neo4j, it is highly recommended to install the APOC collection. APOC (Awesome Procedures on Cypher) is a collection of procedures that enable complex operations that cannot be expressed directly in Cypher.
The lecture covers the installation of APOC and the required adjustments to the configuration files.
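Once APOC is installed, a quick check along the following lines can confirm that the library was picked up. This is only a sketch: it assumes a default APOC installation, and the search keyword 'path' is an arbitrary example.

```cypher
// Run each statement separately in the Neo4j Browser.

// Returns the installed APOC version if the library was loaded.
RETURN apoc.version() AS apocVersion;

// Lists the APOC procedures and functions whose name or description
// matches the given keyword ('path' is an arbitrary example).
CALL apoc.help('path');
```

If the first statement fails with an unknown function error, the plugin file or the configuration settings are the first things to review.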
Neo4j offers a fully functional sandbox environment for users who want hands-on practice without having to deal with installations.
The Sandbox offers several example datasets to play with on an Enterprise Edition of Neo4j for a limited time. It usually starts with a 3-day period, extendable up to 10 days. APOC is already included.
This can be an alternative to installing the Neo4j database locally.
This lecture covers how to access the Sandbox environment and launch sandboxes from the available ones.
The Browser is where everything happens in Neo4j. Fully rewritten in 2017, the Neo4j Browser provides extensive help while querying and managing entities. The web-based environment has three main areas: the query console, the containers and the application menu. The query console is where Cypher queries are written and executed. The containers either present the results of an executed query or provide tips and help about the tool.
In this lecture the current modules, menus and settings are explained, as well as navigation tips and keystrokes. Some predefined example queries and standard procedures are run to illustrate the behaviour of the tool. Running custom queries leads into the Favourite queries section. The Hello World! example is executed to create the first nodes and relationship.
Cypher is the native query language of neo4j.
This lecture recaps the definitions of node, relationship, label and property as a prior step to showing how these concepts translate into the Cypher language. The basic query structure is explained using the 'Hello World!' example query, identifying the graph elements (node, relationship, label and property) and the variable definitions. The MATCH | CREATE, WHERE and RETURN | DELETE clauses are outlined.
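The basic structure can be sketched with a 'Hello World!'-style query such as the one below. The exact query used in the lecture is not given here, so the label Greeting, the relationship type SAYS_TO and the property text are hypothetical.

```cypher
// Run each statement separately in the Neo4j Browser.

// CREATE: nodes go in parentheses, relationships in square brackets.
// h, r and w are variables; Greeting is a label; SAYS_TO is a relationship
// type; {text: ...} is a property map.
CREATE (h:Greeting {text: 'Hello'})-[r:SAYS_TO]->(w:Greeting {text: 'World!'})
RETURN h, r, w;

// MATCH + WHERE + RETURN: fetch the same pattern back, filtering on a property.
MATCH (h:Greeting)-[r:SAYS_TO]->(w:Greeting)
WHERE h.text = 'Hello'
RETURN h.text, type(r), w.text;
```

The same pattern syntax is reused by every clause, which is why reading a pattern is the first skill to acquire.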
Module 3: Hands on with Cypher
The Movie Dataset Example
Neo4j ships with some sample datasets so that users can work with content without having to create it. The Movie Dataset contains movies and their directors, actors, etc. This dataset is used in the examples throughout this course.
The lecture covers how to easily load (CREATE) the Movie Dataset and gives an overview of the database schema (CALL db.schema procedure).
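A rough sketch of the loading and inspection steps follows. It assumes the Neo4j Browser, where the guide holding the dataset's CREATE script is usually opened with the Browser command `:play movies` (a Browser command, not Cypher). The name of the schema procedure depends on the Neo4j version, so the call below is an assumption to adapt.

```cypher
// Run each statement separately, after the Movie Dataset has been loaded.

// Sanity check: how many nodes exist per label combination.
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes;

// Schema overview. Older 3.x releases expose this as CALL db.schema();
// later releases use db.schema.visualization() instead.
CALL db.schema.visualization();
```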
Searching for nodes (MATCH and WHERE)
Some content has already been created in previous lectures with the Hello World! example and the Movie Dataset. Fetching data from a graph database is slightly different from fetching it from a relational one, as it is based on which entities are of interest rather than on how to gather them.
This lecture mainly covers the simple query structure for fetching nodes, applying label filtering and string filters on node properties in either the MATCH or the WHERE clause.
The MATCH, WHERE, RETURN and LIMIT clauses are covered.
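The following sketch shows the two filtering styles on the Movie Dataset. It assumes the standard dataset, with Movie nodes carrying title and released properties and Person nodes carrying name and born.

```cypher
// Run each statement separately.

// Label filter plus a property filter written inline in the MATCH pattern.
MATCH (m:Movie {title: 'The Matrix'})
RETURN m;

// The same kind of filter expressed in a WHERE clause, which also allows
// comparisons other than equality.
MATCH (p:Person)
WHERE p.born >= 1970
RETURN p.name, p.born
LIMIT 10;
```

The inline form only expresses equality, so WHERE is the option for ranges, negations and combined conditions.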
Further searching on nodes (strings and regex)
String and string pattern filtering are quite useful when querying data.
This lecture presents the built-in string filtering operators along with regular expressions for complex pattern searches, such as case-insensitive matching.
The STARTS WITH, ENDS WITH and CONTAINS operators and regular expressions are covered.
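A short sketch of these filters on the Movie Dataset is shown below; the search strings are arbitrary examples.

```cypher
// Run each statement separately.

// Prefix, suffix and substring filters (all case-sensitive).
MATCH (m:Movie) WHERE m.title STARTS WITH 'The' RETURN m.title;
MATCH (p:Person) WHERE p.name ENDS WITH 'Reeves' RETURN p.name;
MATCH (m:Movie) WHERE m.title CONTAINS 'Matrix' RETURN m.title;

// Regular expression with the =~ operator. The expression must match the
// whole string, and (?i) makes the match case-insensitive.
MATCH (m:Movie)
WHERE m.title =~ '(?i).*matrix.*'
RETURN m.title;
```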
Searching for relationships
Nodes are related to other nodes via relationships, creating paths. Relationship-based searches boost the power to fetch data.
In this lecture, the syntax for querying related nodes is covered, including label filtering and relationship direction. The Cmd+Up Arrow keystroke for reusing the previous query is explained. Until now, nodes and relationships in the RETURN clause were wrapped in parentheses and square brackets for educational purposes, although Cypher identifies the type of variable it is dealing with. As the student is by now familiar with Cypher, this contextual aid on variable usage is dropped from this lecture onwards. Finally, the nodes and relationships of the Movie Dataset are coloured.
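As an illustration of relationship patterns, the sketch below assumes the standard Movie Dataset, where Person nodes are linked to Movie nodes through relationship types such as ACTED_IN and DIRECTED.

```cypher
// Run each statement separately.

// Directed pattern filtered by relationship type: who acted in The Matrix?
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie {title: 'The Matrix'})
RETURN p, r, m;

// No arrow head: the pattern matches relationships in either direction,
// and no type is given, so any relationship type qualifies.
MATCH (p:Person {name: 'Tom Hanks'})-[r]-(m:Movie)
RETURN p.name, type(r), m.title;
```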
Listing, sorting and aggregating + filtering (WITH and WHERE NOT)
In some cases, the results are to be retrieved in form of lists.
In this lecture, data retrieval focuses on returning results as lists, taking advantage of aggregation and extraction functions. Output aliases for variables are introduced, as well as sorting on variables. Simple aggregation functions and list collections are applied to entities. The scope definition clause, WITH, is used either to restrict which defined variables remain in scope or to introduce new ones, such as aggregates. Finally, exclusion patterns are used to filter out results that match a given pattern.
The AS, ORDER BY, WITH and WHERE NOT clauses and the collect function are covered.
The functions type, labels and properties are used to retrieve the attributes of nodes and relationships.
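These clauses can be combined as in the sketch below, which again assumes the standard Movie Dataset. The threshold of three movies is an arbitrary example.

```cypher
// Run each statement separately.

// WITH carries p forward together with two new aggregates, which can then
// be filtered (WHERE), aliased (AS) and sorted (ORDER BY).
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) AS movieCount, collect(m.title) AS titles
WHERE movieCount >= 3
RETURN p.name AS actor, movieCount, titles
ORDER BY movieCount DESC, actor;

// Exclusion pattern: people who acted but never directed.
MATCH (p:Person)-[:ACTED_IN]->(:Movie)
WHERE NOT (p)-[:DIRECTED]->(:Movie)
RETURN DISTINCT p.name AS actorOnly
ORDER BY actorOnly;

// Attribute functions on nodes and relationships.
MATCH (p:Person {name: 'Keanu Reeves'})-[r]->(m)
RETURN labels(p), type(r), properties(r), labels(m)
LIMIT 5;
```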
Searching across multiple paths (OPTIONAL and shortestpath)
The previous lectures covered searches defined in a single MATCH pattern. Cypher allows even more complex querying.
In this lecture, the queries focus on multi-path queries (either in a single MATCH using commas or in separate MATCH clauses), including outer joins (OPTIONAL MATCH) and NULL handling. For an arbitrary distance between nodes, the variable-depth query syntax is explained and the shortest path is extracted from the matching paths.
The nodes and relationships involved and the length of the paths are extracted using functions.
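The sketch below illustrates these ideas on the standard Movie Dataset. The two people in the shortest-path query and the upper bound of six hops are arbitrary choices.

```cypher
// Run each statement separately.

// Two paths in a single MATCH, joined through the shared variable m.
MATCH (a:Person)-[:ACTED_IN]->(m:Movie), (d:Person)-[:DIRECTED]->(m)
RETURN a.name AS actor, d.name AS director, m.title
LIMIT 10;

// OPTIONAL MATCH behaves like an outer join: people who directed nothing
// are still returned, with directed set to null.
MATCH (p:Person)
OPTIONAL MATCH (p)-[:DIRECTED]->(d:Movie)
RETURN p.name, d.title AS directed
LIMIT 10;

// Variable-depth pattern ([*..6] means up to six hops of any type) wrapped
// in shortestPath, with functions extracting the parts of the path.
MATCH path = shortestPath(
  (a:Person {name: 'Kevin Bacon'})-[*..6]-(b:Person {name: 'Meg Ryan'})
)
RETURN nodes(path), relationships(path), length(path);
```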
Creating content (CREATE)
On top of the data loaded from the Movie Dataset, some content is to be added. The CREATE clause uses the same pattern syntax as MATCH, though RETURN is optional in a creation query.
In this lecture, the CREATE clause is used to insert data: creating isolated nodes with properties, as well as relationships based on variables defined in a MATCH. Special care has to be taken when creating on top of a MATCH clause so as not to create duplicates.
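A minimal sketch of both cases follows. 'Example Movie' and 'Example Role' are hypothetical values added on top of the Movie Dataset, not part of it.

```cypher
// Run each statement separately.

// Isolated node with properties; no RETURN is needed.
CREATE (m:Movie {title: 'Example Movie', released: 2020});

// Relationship between two nodes bound by a MATCH.
// CREATE runs once per matched row, so running this statement twice, or
// matching several rows, creates duplicate relationships.
MATCH (p:Person {name: 'Keanu Reeves'}), (m:Movie {title: 'Example Movie'})
CREATE (p)-[r:ACTED_IN {roles: ['Example Role']}]->(m)
RETURN p, r, m;
```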
Updating content (SET and MERGE and ON CREATE | MATCH)
Updating the nodes and relationships in a database is day-to-day business. For this there are two main clauses: SET and MERGE.
In this lecture, the SET clause is used to assign properties on nodes and relationships, whether they already exist or are about to be created: it either creates a property on a node or relationship or updates its content. The MERGE clause works differently. MERGE tries to fetch the given entities and, if they are not found, creates them with the provided details. ON CREATE and ON MATCH clauses can follow to define further actions depending on the outcome of the MERGE. Examples of these two clauses are shown, each followed by a SET clause.
There are no restrictions on label or property creation, nor does the database schema have to be modified upfront, which is a huge advantage in terms of flexibility compared to relational databases.
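The behaviour of both clauses can be sketched as follows. 'Example Movie' and 'Example Person' are hypothetical entities, and the property names createdAt and lastSeen are assumptions made for the example.

```cypher
// Run each statement separately.

// SET adds the tagline property if it is missing and overwrites released.
MATCH (m:Movie {title: 'Example Movie'})
SET m.tagline = 'A hypothetical tagline', m.released = 2021
RETURN m;

// MERGE matches the whole pattern or creates it. The first run takes the
// ON CREATE branch; later runs take the ON MATCH branch.
MERGE (p:Person {name: 'Example Person'})
ON CREATE SET p.createdAt = timestamp()
ON MATCH SET p.lastSeen = timestamp()
RETURN p;
```

Because MERGE does not create a second node when the pattern already exists, it is a common way to avoid the duplicates that a plain CREATE can produce.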
Deleting content (DETACH and DELETE)
In a dynamic environment, nodes and relationships sometimes have to be deleted, either because they are no longer needed or as part of data cleansing.
In this lecture, DELETE and DETACH DELETE are compared. The DELETE clause on its own only deletes relationships or nodes that have no relationships. When a node still has relationships, DETACH has to precede DELETE, as otherwise an error is thrown. DETACH removes the relationships of the given nodes so that DELETE can then remove the nodes. A workaround is to pass the relationship variables to the DELETE clause before the node ones for an error-free execution.
Extreme care is required, as there is no undo for the DELETE clause.
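The three variants can be sketched as follows. 'Example Person' and 'Example Movie' are hypothetical entities, assumed to have been created beforehand.

```cypher
// Run each statement separately. Deletions cannot be undone once committed.

// Plain DELETE: succeeds only if the node has no relationships left,
// otherwise an error is returned.
MATCH (p:Person {name: 'Example Person'})
DELETE p;

// DETACH DELETE: removes the node together with all of its relationships.
MATCH (m:Movie {title: 'Example Movie'})
DETACH DELETE m;

// Workaround without DETACH: delete the relationships along with the node.
// This pattern only matches nodes that have at least one relationship.
MATCH (p:Person {name: 'Example Person'})-[r]-()
DELETE r, p;
```

Running the MATCH part with a RETURN first is a cheap way to review what would be deleted.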
Walk your own path
To round off the course, some useful concepts and insights are shared as a taste of possible next steps for continued growth.
Some open questions are raised in this lecture to spark interest in data modeling and complex executions. Some clauses and functions are used in queries, some of them involving procedures from the APOC and ALGO libraries.