-
Notifications
You must be signed in to change notification settings - Fork 6
Expand file tree
/
Copy pathbenchmark-overview.tex
More file actions
116 lines (95 loc) · 7.02 KB
/
benchmark-overview.tex
File metadata and controls
116 lines (95 loc) · 7.02 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
\chapter{Benchmark Overview}
\label{sec:benchmark-overview}
\section{Practice basis}
The task force members design \ldbcfinbench with their rich practical experience in
financial industry based on a comprehensive survey of financial scenarios including
Risk Control, AML (Anti-Money Laundering), KYC (Know Your Customer), Stock Recommendation
and so on.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Design Concepts}
\ldbcfinbench is intended to be a credible, fair and representative benchmark.
It's designed with the following concepts:
\begin{itemize}
\item \textbf{Based on real systems}. The task force members gathering
together from industry and academia intend to design \ldbcfinbench
to express and emphasize the special patterns of data and workload
distinguished from other popular benchmarks. To do that,
\ldbcfinbench is designed based on the rich practical experience of
members and additional surveys.
\item \textbf{Comprehensive and complete.} \ldbcfinbench is intended to
cover most demands encountered in the management of complexly
structured data in financial scenarios.
\item \textbf{Challenging and instructive.} Benchmarks are known to direct
product development in certain directions. \ldbcfinbench is informed
by state-of-the-art in database research and industry practice
to offer optimization challenges.
\item \textbf{Easy to use and extendable.} As a benchmark offering value
to many relevant stakeholders, \ldbcfinbench is designed to be easy
to use. The effort for obtaining test results with it should be
small.
\item \textbf{Modularized.} \ldbcfinbench is broken into parts both in
design and benchmark suite that can be individually addressed to
stimulate innovation without imposing an overly high threshold for
participation.
\item \textbf{Reproducible and documented.} \ldbcfinbench is intended to
specify the auditing rules and provide full disclosure reports of
auditing of benchmark runs in accordance with the LDBC
Bylaws~\cite{ldbc_byelaws}.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{New features in FinBench}
\ldbcsnb, one of the earlier LDBC benchmarks, is modeled around the operation of a real social network site. It defines a data schema that represents a realistic social network including people and their activities during a period of time and also the workloads mimic the different usage scenarios found in operating a real social network site. Compared with \ldbcsnb, \ldbcfinbench is characterized by the special features and patterns of the data schema and queries that represent the characteristics of financial scenarios.
\subsection{Data Schema}
The data schema for \ldbcfinbench is designed to reflect the real data in the financial systems. Frequent
financial entities in real systems include accounts, medium, persons, companies, loans, etc. The
entities are vertices in the data schema while the edges reflect financial activities, \eg fund transferred
from one account to another. In their data schema, financial scenarios have these distinguished characteristics
compared to regular social networks.
\begin{itemize}
\item Multiple edges can exist between two vertices, \eg Many transfer records exist between two accounts
\item Dynamic attribute exists in vertex to mark entities status, \eg an account is marked as blocked
\item Quantity attribute exists in edge, \eg Transfer edge has quantity attribute amount
\end{itemize}
The designed data schema is specified in \autoref{sec:data-definition}.
\subsection{Workloads}
In workloads and queries, financial scenarios have these distinguished characteristics.
\begin{itemize}
\item More tight latency, \eg some queries need to return in less than 100ms.
\item Write operations updating attributes, \eg marking an account as blocked.
\item Recursive Path Filtering. Some queries filter data with backward dependency
in variable-length paths, \eg finding all transfer paths A-[${e_1}$]->..-[${e_k}$]->B
where the timestamp of each transfer edge ${e_i}$ in the path is larger than that of
the previous ${e_{i-1}}$. In this pattern, the variable length path is qualified by
the edge quantity attributes or the aggregation in the path, either along one path
or a set of paths.
\item Read-write Query, which is a query sequence with a mix of reads and writes reflecting the
complexity of financial systems. Read-write query describes a desired pattern that risk control
policies are checked, and corresponding actions are taken before financial activities like
transfers are written down to storage. See \autoref{sec:read-write-query} for details.
\item Truncation. In financial scenarios, the degree of hub vertex may reach million and even
billion scales, especially when traversing on a graph. To handle the discordance between the tight
latency requirements and power-law distribution of data in the system, truncation is introduced
to reduce the complexity of queries. See \autoref{sec:truncation-on-hub-vertices} for details.
\end{itemize}
In \ldbcfinbench, there are two kinds of workloads:
\begin{itemize}
\item Transaction Workload. It includes queries with a tight latency bound, which are usually
queries hopping a few steps from a start vertice. There are complex reads, simple reads, write
operations, and read-write queries in transaction workload. The Transaction Workload is specified
in \autoref{sec:transaction-workload}.
\item Analytics Workload. It is supposed to include more complicated queries, \eg triggers and pre-computed
values in online systems. This part is future work that will be designed and discussed in the
following versions. The Analytics Workload is specified in \autoref{sec:analytics-workload}.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Benchmark Workflow}
See \autoref{sec:auditing-rules} for the execution workflow of \ldbcfinbench.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%