Commit 513c88f

committed
updated design reviews & fixed several broken links
updated design review section, and fixed a number of broken links on several pages. Signed-off-by: Scott McCarthy <scott.mccarthy@opencastsoftware.com>
1 parent 12f1e36 commit 513c88f

14 files changed

Lines changed: 462 additions & 80 deletions

.markdownlint.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ default: true
 blank_lines: false
 bullet: false
 html: false
-indentation: false
+indentation: true
 line_length: false
 spaces: false
 url: false

docs/best-practices.md

Lines changed: 4 additions & 6 deletions
@@ -5,17 +5,15 @@ author: Scott McCarthy
 date: 18/08/2022
 ---

-From a 'Best-Practice' point of view , this section of the site is structured something similar to Microsofts FOUR pillars of DevOps:
+From a 'Best-Practice' point of view, this section of the site is structured similarly to Microsoft's four pillars of DevOps, with additional areas of interest such as Quality, Security, Collaboration & Improvement:

 ![DevOps Practice Overview](img/devops-practices-overview.png)

 Below you can learn more about each particular DevOps best practice.

-- [Continuous Planning](continuous-planning/planning-overview.md)
-
-- [Agile Development](continuous-planning/agile-development/agile-overview.md)
-- Design Reviews - TBC
-
+- [Continuous Planning](best-practices/continuous-planning.md)
+- [Agile Development](best-practices/continuous-planning/agile-development.md)
+- [Design Reviews](best-practices/continuous-planning/design-reviews.md)
 - Continuous Integration
 - Continuous Delivery
 - Continuous Quality

docs/best-practices/continuous-planning/design-reviews.md

Lines changed: 2 additions & 6 deletions
@@ -7,7 +7,7 @@ date: 22/08/2022

 Designing software well is hard.

-CSE has collected a number of practices which we find help in the design process.
+Below are a number of practices which I find help in the design process.
 This covers not only technical design of software, but also architecture design and non-functional requirements gathering for new projects.

 ## Goals
@@ -19,9 +19,5 @@ This covers not only technical design of software, but also architecture design
 ## Sections

 - [Design Patterns](design-reviews/design-patterns.md)
-- [Non-Functional Requirements Guidance](design-reviews/non-functional.md)
+- [Non-Functional Requirements Guidance](design-reviews/non-functional-requirements.md)
 - [Sustainable Software Engineering](design-reviews/sustainability.md)
-
-## Recipes
-
-- XXX [Design Recipes](design-reviews/recipes/README.md)
Lines changed: 17 additions & 12 deletions
@@ -1,12 +1,17 @@
-# Design Patterns
-
-The design patterns section recommends patterns of software and architecture design.
-This section provides a curated list of commonly used patterns from trusted sources.
-Rather than duplicate or replace the cited sources, this section aims to compliment them with suggestions, guidance, and learnings based on firsthand experiences.
-
-## Subsections
-
-* [Data Heavy Design Guidance](data-heavy-design-guidance/README.md)
-* [Object Oriented Design Reference](object-oriented-design-reference/README.md)
-* [Distributed System Design Reference](distributed-system-design-reference/README.md)
-* [REST API Design Guidance](rest-api-design-guidance/README.md)
+---
+title: Design Patterns
+summary: A guide to Design Reviews
+author: Scott McCarthy
+date: 30/08/2022
+---
+
+The design patterns section recommends patterns of software and architecture design.
+This section provides a curated list of commonly used patterns from trusted sources.
+Rather than duplicate or replace the cited sources, this section aims to complement them with suggestions, guidance, and learnings based on firsthand experiences.
+
+## Subsections
+
+- [Data Heavy Design Guidance](design-patterns/data-heavy-design-guidance.md)
+- [Object Oriented Design Reference](design-patterns/object-oriented-design-reference.md)
+- [Distributed System Design Reference](design-patterns/distributed-system-design-reference.md)
+- [REST API Design Guidance](design-patterns/rest-api-design-guidance.md)
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+# Data and DataOps Fundamentals
+
+Most projects involve some type of data storage, data processing and data ops. For these projects, as with all projects, we follow the general guidelines laid out in other sections around security, testing, observability, CI/CD etc.
+
+## Goal
+
+The goal of this section is to briefly describe how to apply the fundamentals to data heavy projects or portions of the project.
+
+## Isolation
+
+Please be cautious of which [isolation levels](https://en.wikipedia.org/wiki/Isolation_(database_systems)) you are using. Even with a database that offers serializability, it is possible that within a transaction or connection you are leveraging a lower isolation level than the database offers. In particular, read uncommitted (or eventual consistency) can have a lot of unpredictable side effects and introduce bugs that are difficult to reason about. Eventually consistent systems should be treated as a last resort for achieving your scalability requirements; batching, sharding, and caching are all recommended solutions to increase your scalability. If none of these options are tenable, consider evaluating "New SQL" databases like CockroachDB or TiDB before leveraging an option that relies on eventual consistency.
+
+There are other levels of isolation beyond those mentioned in the link above. Some of these have nuances that differ from the 4 main levels and can be difficult to compare. Snapshot isolation, strict serializability, "read your own writes", monotonic reads, bounded staleness, causal consistency, and linearizability are all terms you can look into to learn more on the subject.
+
+## Concurrency Control
+
+Your systems should (almost) always leverage some form of concurrency control, to ensure correctness amongst competing requests and to prevent data races. The 2 forms of concurrency control are **pessimistic** and **optimistic**.
+
+A **pessimistic** transaction involves a first request to "lock the data" and a second request to write the data. In between these requests, no other requests touching that data will succeed. See [Two-Phase Locking](https://en.wikipedia.org/wiki/Two-phase_locking) (not to be confused with Two-Phase Commit) for more info.
+
+The (more) recommended approach is **optimistic** concurrency, where a user can read the object at a specific version, and update the object if and only if it hasn't changed. This is typically done via the [ETag header](https://en.wikipedia.org/wiki/HTTP_ETag).
+
+A simple way to accomplish this on the database side is to increment a version number on each update. This can be done in a single executed statement:
+
+> WARNING: the below will not work when using an isolation level at or lower than read uncommitted (eventual consistency).
+
+```sql
+-- Please treat this as pseudo code, and adjust as necessary.
+
+UPDATE <table_name>
+SET field1 = value1, ..., fieldN = valueN, version = $new_version
+WHERE ID = $id AND version = $version
+```
+
+## Data Tiering (Data Quality)
+
+Develop a common understanding of the quality of your datasets so that everyone understands the quality of the data, its expected use cases, and its limitations.
+
+A common data quality model is `Bronze`, `Silver`, `Gold`:
+
+- **Bronze:** This is a landing area for your raw datasets with no or minimal data transformations applied, and therefore optimized for writes / ingestion. Treat these datasets as an immutable, append-only store.
+- **Silver:** These are cleansed, semi-processed datasets. These conform to a known schema and predefined data invariants and might have further data augmentation applied. These are typically used by data scientists.
+- **Gold:** These are highly processed, highly read-optimized datasets primarily for consumption by business users. Typically, these are structured in your standard fact and dimension tables.
+
+Divide your data lake into three major areas containing your Bronze, Silver and Gold datasets.
+
+> Note: Additional storage areas for malformed data, intermediate (sandbox) data, and libraries/packages/binaries are also useful when designing your storage organization.
+
+## Data Validation
+
+Validate data early in your pipeline.
+
+- Add data validation between the Bronze and Silver datasets. By validating early in your pipeline, you can ensure all datasets conform to a specific schema and known data invariants. This can also potentially prevent data pipeline failures in case of unexpected changes to the input data.
+- Data that does not pass this validation stage can be rerouted to a record store dedicated to malformed data for diagnostic purposes.
+- It may be tempting to add validation prior to landing in the Bronze area of your data lake. This is generally not recommended. Bronze datasets are there to ensure you have as close a copy of the source system data as possible. This can be used to replay the data pipeline for both testing (i.e. testing data validation logic) and data recovery purposes (i.e. data corruption is introduced due to a bug in the data transformation code and thus the pipeline needs to be replayed).
+
+## Idempotent Data Pipelines
+
+Make your data pipelines re-playable and idempotent.
+
+- Silver and Gold datasets can get corrupted for a number of reasons, such as unintended bugs, unexpected input data changes, and more. By making data pipelines re-playable and idempotent, you can recover from this state through deployment of code fixes and re-playing the data pipelines.
+- Idempotency also ensures data duplication is mitigated when replaying your data pipelines.
+
+## Testing
+
+Ensure data transformation code is testable.
+
+- Abstracting data transformation code away from data access code is key to ensuring unit tests can be written against data transformation logic. An example of this is moving transformation code from notebooks into packages.
+- While it is possible to run tests against notebooks, extracting the code into packages increases developer productivity by speeding up the feedback cycle.
+
+## CI/CD, Source Control and Code Reviews
+
+- All artifacts needed to build the data pipeline from scratch should be in source control. This includes infrastructure-as-code artifacts, database objects (schema definitions, functions, stored procedures etc.), reference/application data, data pipeline definitions, and data validation and transformation logic.
+- Any new artifacts (code) introduced to the repository should be code reviewed, both automatically (linting, credential scanning etc.) and by peers.
+- There should be a safe, repeatable process (CI/CD) to move changes through dev, test and finally production.
+
+## Security and Configuration
+
+- Maintain a central, secure location for sensitive configuration, such as database connection strings, that can be accessed by the appropriate services within the specific environment.
+- On Azure this is typically solved by securing secrets in a Key Vault per environment, then having the relevant services query Key Vault for the configuration.
+
+## Observability
+
+Monitor infrastructure, pipelines and data.
+
+- A proper monitoring solution should be in place to ensure failures are identified, diagnosed and addressed in a timely manner. Aside from the base infrastructure and pipeline runs, data should also be monitored. A common area that should have data monitoring is the malformed record store.
+
+## End to End and Azure Technology Samples
+
+The [DataOps for the Modern Data Warehouse repo](https://github.com/Azure-Samples/modern-data-warehouse-dataops) contains both end-to-end and technology-specific samples on how to implement DataOps on Azure.
+
+![CI/CD](../../../../img/CI_CD_process.png)
+Image: CI/CD for Data pipelines on Azure - from DataOps for the Modern Data Warehouse repo
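The optimistic-concurrency pattern shown in the pseudo-SQL above can be sketched in runnable form. This is a minimal illustration using Python's built-in `sqlite3` module; the `documents` table, column names, and helper function are invented for the example and are not part of the commit:

```python
import sqlite3

# Set up a throwaway in-memory table with a version column
# (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)"
)
conn.execute("INSERT INTO documents (id, body, version) VALUES (1, 'draft', 1)")

def update_if_unchanged(conn, doc_id, new_body, expected_version):
    """Write new_body only if the row still has the version the caller read."""
    cur = conn.execute(
        "UPDATE documents SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_body, doc_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means a concurrent writer won

ok = update_if_unchanged(conn, 1, "edited", expected_version=1)       # succeeds
stale = update_if_unchanged(conn, 1, "conflict", expected_version=1)  # rejected
print(ok, stale)  # → True False
```

The second call fails because the first call bumped the version to 2, which is exactly the "update if and only if it hasn't changed" behaviour the ETag approach provides over HTTP.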
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+---
+title: Distributed System Design Reference
+summary: A guide to Design Reviews
+author: Scott McCarthy
+date: 30/08/2022
+---
+
+Distributed systems introduce new and interesting problems that need addressing.
+Software engineering as a field has dealt with these problems for years, and there are phenomenal resources available for reference when creating a new distributed system.
+Some that we recommend are as follows:
+
+* [Martin Fowler's Patterns of Distributed Systems](https://martinfowler.com/articles/patterns-of-distributed-systems/)
+* [microservices.io](https://microservices.io/index.html)
+* [Azure's Cloud Design Patterns](https://docs.microsoft.com/en-us/azure/architecture/patterns/)
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Micro-services
+
+The micro-services architecture is a design approach to build a single application as a set of small services. Each service runs in its own process and communicates with other services through a well-defined interface using a lightweight mechanism, typically an HTTP-based application programming interface (API). Micro-services are built around business capabilities; each service is scoped to a single purpose. You can use different frameworks or programming languages to write micro-services and deploy them independently, as a single service, or as a group of services.
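As a minimal illustration of the idea described above (one small service, one well-defined HTTP interface, scoped to a single purpose), here is a sketch using only Python's standard library; the service name, endpoint, and payload are invented for the example:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A single-purpose "greeting" service: one process, one HTTP endpoint,
# communicating via a lightweight JSON-over-HTTP interface.
class GreetingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        payload = json.dumps({"service": "greeting", "message": "hello"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Bind to an ephemeral port and serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), GreetingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A consumer only needs the interface, not the implementation details.
with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    body = json.load(resp)
server.shutdown()
print(body["message"])  # → hello
```

Because the contract is just HTTP and JSON, the same consumer would work unchanged if the service were rewritten in another language or deployed independently.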
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# Object-Oriented Design Reference
+
+When writing software for large projects, the hardest part is often communication and maintenance.
+Following proven design patterns can optimize for maintenance, readability, and ease of extension.
+In particular, object-oriented design patterns are well-established in the industry.
+
+Please refer to the following resources to create strong object-oriented designs:
+
+* [Class Diagram Overview](http://www.agilemodeling.com/artifacts/classDiagram.htm)
+* [Design Patterns Wikipedia](https://en.wikipedia.org/wiki/Design_Patterns)
+* [Object Oriented Design Website](https://www.oodesign.com/)
