Commit dd2c1ea

Merge pull request #128 from pickwicksoft/feature/update-readme-for-better-onboarding
docs: Revise README for improved clarity and installation info
2 parents 8560dd6 + b86b1cd commit dd2c1ea

1 file changed

Lines changed: 53 additions & 242 deletions

README.md

Original file line number | Diff line number | Diff line change
@@ -12,23 +12,7 @@
1212
<a href="https://pypi.org/project/streams-py/"><img alt="PyPI" src="https://img.shields.io/pypi/v/streams.py"></a>
1313
</p>
1414

15-
PyStreamAPI is a Python stream library that draws inspiration from the Java Stream API.
16-
Although it closely mirrors the Java API, PyStreamAPI adds some innovative features to make streams in Python even more
17-
innovative, declarative and easy to use.
18-
19-
PyStreamAPI offers both sequential and parallel streams and utilizes lazy execution.
20-
21-
Now you might be wondering why another library when there are already a few implementations? Well, here are a few advantages of this particular implementation:
22-
23-
* It provides both sequential and parallel streams.
24-
* Lazy execution is supported, enhancing performance.
25-
* It boasts high speed and efficiency.
26-
* The implementation achieves 100% test coverage.
27-
* It follows Pythonic principles, resulting in clean and readable code.
28-
* It adds some cool innovative features such as conditions or error handling and an even more declarative look.
29-
* It provides loaders for various data sources such as CSV, JSON, XML and YAML files.
30-
31-
Let's take a look at a small example:
15+
PyStreamAPI is a Python stream library inspired by the Java Stream API, adding Pythonic features for clean, declarative, and efficient data processing. It supports both sequential and parallel streams with lazy evaluation.
3216

3317
```python
3418
from pystreamapi import Stream
@@ -43,275 +27,102 @@ Stream.of([" ", '3', None, "2", 1, ""]) \
4327
.for_each(print) # Output: 1 2 3
4428
```
4529

46-
And here's the equivalent code in Java:
47-
48-
```java
49-
Object[] words = { " ", '3', null, "2", 1, "" };
50-
Arrays.stream( words )
51-
.filter( Objects::nonNull )
52-
.map( Objects::toString )
53-
.map( String::trim )
54-
.filter( s -> ! s.isEmpty() )
55-
.map( Integer::parseInt )
56-
.sorted()
57-
.forEach( System.out::println ); // Output: 1 2 3
58-
```
59-
60-
## What is a Stream?
61-
62-
A `Stream` is a powerful abstraction for processing sequences of data in a functional and declarative manner. It enables efficient and concise data manipulation and transformation.
63-
64-
Similar to its counterparts in Java and Kotlin, a Stream represents a pipeline of operations that can be applied to a collection or any iterable data source. It allows developers to express complex data processing logic using a combination of high-level operations, promoting code reusability and readability.
65-
66-
With Streams, you can perform a wide range of operations on your data, such as filtering elements, transforming values, aggregating results, sorting, and more. These operations can be seamlessly chained together to form a processing pipeline, where each operation processes the data and passes it on to the next operation.
67-
68-
One of the key benefits of Stream is lazy evaluation. This means that the operations are executed only when the result is actually needed, optimizing resource usage and enabling efficient processing of large or infinite datasets.
69-
70-
Furthermore, Stream supports both sequential and parallel execution. This allows you to leverage parallel processing capabilities when dealing with computationally intensive tasks or large amounts of data, significantly improving performance.
71-
72-
`pystreamapi.Stream` represents a stream that facilitates the execution of one or more operations. Stream operations can be categorized as either intermediate or terminal.
73-
74-
Terminal operations return a result of a specific type, while intermediate operations return the stream itself, enabling method chaining for multi-step operations.
75-
76-
Let's examine an example using Stream:
77-
78-
```python
79-
Stream.of([" ", '3', None, "2", 1, ""]) \
80-
.filter(lambda x: x is not None) \ # Intermediate operation
81-
.map(str) \ # Intermediate operation
82-
.map(lambda x: x.strip()) \ # Intermediate operation
83-
.filter(lambda x: len(x) > 0) \ # Intermediate operation
84-
.map(int) \ # Intermediate operation
85-
.sorted() \ # Intermediate operation
86-
.for_each(print) # Terminal Operation (Output: 1 2 3)
87-
```
88-
89-
Operations can be performed on a stream either in parallel or sequentially. A parallel stream executes operations concurrently, while a sequential stream processes operations in order.
90-
91-
Considering the above characteristics, a stream can be defined as follows:
92-
93-
* It is not a data structure itself but operates on existing data structures.
94-
* It does not provide indexed access like traditional collections.
95-
* It is designed to work seamlessly with lambda functions, enabling concise and expressive code.
96-
* It facilitates easy aggregation of results into lists, tuples, or sets.
97-
* It can be parallelized, allowing for concurrent execution of operations to improve performance.
98-
* It employs lazy evaluation, executing operations only when necessary.
99-
100-
## Use conditions to speed up your workflow!
101-
102-
![Conditions](https://raw.githubusercontent.com/PickwickSoft/pystreamapi/main/assets/conditions.png)
103-
104-
Conditions provide a convenient means for performing logical operations within your Stream, such as using `filter()`, `take_while()`, `drop_while()`, and more. With PyStreamAPI, you have access to a staggering 111 diverse conditions that enable you to process various data types including strings, types, numbers, and dates. Additionally, PyStreamAPI offers a powerful combiner that allows you to effortlessly combine multiple conditions, facilitating the implementation of highly intricate pipelines.
105-
106-
## Error handling: Work with data that you don't know
107-
PyStreamAPI offers a powerful error handling mechanism that allows you to handle errors in a declarative manner. This is especially useful when working with data that you don't know.
108-
109-
PyStreamAPI offers three different error levels:
110-
- `ErrorLevel.RAISE`: This is the default error level. It will raise an exception if an error occurs.
111-
- `ErrorLevel.IGNORE`: This error level will ignore any errors that occur and won't inform you.
112-
- `ErrorLevel.WARN`: This error level will warn you about any errors that occur and logs them as a warning with default logger.
113-
114-
115-
This is how you can use them:
116-
117-
```python
118-
from pystreamapi import Stream, ErrorLevel
119-
120-
Stream.of([" ", '3', None, "2", 1, ""]) \
121-
.error_level(ErrorLevel.IGNORE) \
122-
.map_to_int() \
123-
.error_level(ErrorLevel.RAISE) \
124-
.sorted() \
125-
.for_each(print) # Output: 1 2 3
126-
```
127-
128-
The code above will ignore all errors that occur during mapping to int and will just skip the elements.
129-
130-
For more details on how to use error handling, please refer to the documentation.
131-
132-
## Get started: Installation
133-
134-
To start using PyStreamAPI just install the module with this command:
30+
## Installation
13531

13632
```bash
137-
pip install streams.py
33+
pip install streams.py
13834
```
13935

140-
Afterward, you can import it with:
141-
14236
```python
14337
from pystreamapi import Stream
14438
```
14539

14640
:tada: PyStreamAPI is now ready to process your data
14741

148-
## Build a new Stream
42+
## Why PyStreamAPI?
14943

150-
PyStreamAPI offers two types of Streams, both of which are available in either sequential or parallel versions:
44+
* Sequential and parallel streams out of the box
45+
* Lazy execution for efficient processing of large datasets
46+
* 100% test coverage
47+
* Pythonic API — clean, readable, and expressive
48+
* 111+ built-in [conditions](https://pystreamapi.pickwicksoft.org/reference/conditions) for filtering and matching
49+
* Declarative [error handling](https://pystreamapi.pickwicksoft.org/reference/api-reference/error-handling) with configurable error levels
50+
* Built-in loaders for CSV, JSON, XML, YAML and TOML files
15151

152-
- (Normal) `Stream`: Offers operations that do not depend on the types. The same functionality as Streams in other programming languages.
52+
## Building a Stream
15353

154-
- `NumericStream`: This stream extends the capabilities of the default stream by
155-
introducing numerical operations. It is designed specifically for use
156-
with numerical data sources and can only be applied to such data.
157-
158-
There are a few factory methods that create new Streams:
54+
PyStreamAPI provides two stream types — `Stream` (general-purpose) and `NumericStream` (for numerical data with statistics) — each available in sequential and parallel flavors.
15955

16056
```python
161-
Stream.of([1, 2, 3]) # Can return a sequential or a parallel stream
57+
Stream.of([1, 2, 3]) # auto-selects sequential or numeric
58+
Stream.parallel_of([1, 2, 3]) # parallel stream
59+
Stream.sequential_of([1, 2, 3]) # sequential stream
60+
Stream.of_noneable(None) # returns empty stream when source is None
61+
Stream.iterate(0, lambda n: n + 2) # infinite stream (use .limit())
62+
Stream.concat(Stream.of([1, 2]), Stream.of([3, 4])) # merge streams
16263
```
16364

164-
Using the `of()` method will let the implementation decide which `Stream` to use. If the source is numerical, a `NumericStream` is created.
165-
166-
> **Note**
167-
>
168-
> Currently, it always returns a `SequentialStream` or a `SequentialNumericStream`
65+
For the full API reference see the [docs](https://pystreamapi.pickwicksoft.org/quick-start).
16966

170-
---
67+
## Conditions
17168

172-
```python
173-
Stream.parallel_of([1, 2, 3]) # Returns a parallel stream (Either normal or numeric)
174-
```
175-
176-
---
177-
178-
```python
179-
Stream.sequential_of([1, 2, 3]) # Returns a sequential stream (Either normal or numeric)
180-
```
69+
![Conditions](https://raw.githubusercontent.com/PickwickSoft/pystreamapi/main/assets/conditions.png)
18170

182-
---
71+
Over 111 ready-to-use conditions across strings, numbers, types, and dates — combine them freely with `one_of()`:
18372

18473
```python
185-
# Can return a sequential or a parallel stream (Either normal or numeric)
186-
Stream.of_noneable([1, 2, 3])
74+
from pystreamapi import Stream
75+
from pystreamapi.conditions import prime, even, one_of
18776

188-
# Returns a sequential or a parallel, empty stream (Either normal or numeric)
189-
Stream.of_noneable(None)
77+
Stream.of([1, 2, 3, 4, 5]) \
78+
.filter(one_of(even(), prime())) \
79+
.for_each(print) # 2, 3, 4, 5
19080
```
19181

192-
If the source is `None`, you get an empty `Stream`
82+
## Error Handling
19383

194-
---
84+
Control error behavior per-operation with `error_level()`:
19585

19686
```python
197-
Stream.iterate(0, lambda n: n + 2)
198-
```
199-
200-
Creates a Stream of an infinite Iterator created by iterative application of a
201-
function f to an initial element seed, producing a Stream consisting of seed,
202-
f(seed), f(f(seed)), etc.
203-
204-
> **Note**
205-
> Do not forget to limit the stream with `.limit()`
206-
207-
---
87+
from pystreamapi import Stream, ErrorLevel
20888

209-
```python
210-
Stream.concat(Stream.of([1, 2]), Stream.of([3, 4]))
211-
# Like Stream.of([1, 2, 3, 4])
89+
Stream.of([" ", '3', None, "2", 1, ""]) \
90+
.error_level(ErrorLevel.IGNORE) \
91+
.map_to_int() \
92+
.sorted() \
93+
.for_each(print) # Output: 1 2 3
21294
```
21395

214-
Creates a new Stream from multiple Streams. Order doesn't change.
96+
Available levels: `RAISE` (default), `IGNORE`, `WARN`. See the [error handling docs](https://pystreamapi.pickwicksoft.org/reference/api-reference/error-handling) for details.
21597

216-
## Use loaders: Load data from CSV, JSON, XML and YAML files in just one line
98+
## Data Loaders
21799

218-
PyStreamAPI offers a convenient way to load data from CSV, JSON, XML and YAML files. Like that you can start processing your
219-
files right away without having to worry about reading and parsing the files.
100+
Load data from files directly into a stream — no manual parsing needed:
220101

221-
You can import the loaders with:
102+
| Loader | Extra required | Description |
103+
|--------|----------------|-------------|
104+
| `csv` || CSV files with optional type casting and delimiter |
105+
| `json` | `[json_loader]` | JSON files or strings (streaming via ijson) |
106+
| `xml` | `[xml_loader]` | XML files or strings with node path access |
107+
| `yaml` || YAML files or strings |
108+
| `toml` || TOML files or strings |
222109

223110
```python
224-
from pystreamapi.loaders import csv, json, xml, yaml
225-
```
226-
Now you can use the loaders directly when creating your Stream:
227-
228-
For CSV:
111+
from pystreamapi import Stream
112+
from pystreamapi.loaders import csv
229113

230-
```python
231114
Stream.of(csv("data.csv", delimiter=";")) \
232-
.map(lambda x: x.attr1) \
115+
.map(lambda x: x.name) \
233116
.for_each(print)
234117
```
235118

236-
For JSON:
237-
```python
238-
Stream.of(json("data.json")) \
239-
.map(lambda x: x.attr1) \
240-
.for_each(print)
241-
```
242-
243-
You can access the attributes of the data structures directly like you would do with a normal object.
244-
245-
For XML:
119+
Install all optional extras at once: `pip install 'streams.py[all]'`
246120

247-
In order to use the XML loader, you need to install the optional xml dependency:
248-
249-
```bash
250-
pip install streams.py[xml_loader]
251-
```
252-
253-
Afterward, you can use the XML loader like this:
254-
255-
```python
256-
Stream.of(xml("data.xml")) \
257-
.map(lambda x: x.attr1) \
258-
.for_each(print)
259-
```
260-
261-
The access to the attributes is using a node path syntax. For more details on how to use the node path syntax, please
262-
refer to the [documentation](https://pystreamapi.pickwicksoft.org/reference/data-loaders).
263-
264-
For YAML:
265-
266-
In order to use the YAML loader, you need to install the optional yaml dependency:
267-
268-
```bash
269-
pip install streams.py[yaml_loader]
270-
```
271-
272-
Afterward, you can use the YAML loader like this:
273-
274-
```python
275-
Stream.of(yaml("data.yaml")) \
276-
.map(lambda x: x.attr1) \
277-
.for_each(print)
278-
```
279-
280-
## API Reference
281-
For a more detailed documentation view the docs on GitBook: [PyStreamAPI Docs](https://pystreamapi.pickwicksoft.org/)
282-
283-
## Complex Examples
284-
285-
#### Get all numbers from list of different types. Use parallelization.
286-
287-
```python
288-
Stream.parallel_of([" ", '3', None, "2", 1, ""]) \
289-
.filter(lambda x: x is not None) \
290-
.map(str) \
291-
.map(lambda x: x.strip()) \
292-
.filter(lambda x: len(x) > 0) \
293-
.map(int) \
294-
.sorted()\
295-
.for_each(print) # 1 2 3
296-
```
297-
298-
#### Generate a Stream of 10 Fibonacci numbers
299-
300-
```python
301-
def fib():
302-
a, b = 0, 1
303-
while True:
304-
yield a
305-
a, b = b, a + b
306-
307-
Stream.of(fib()) \
308-
.limit(10) \
309-
.for_each(print) # 0 1 1 2 3 5 8 13 21 34
310-
```
121+
See the [data loaders docs](https://pystreamapi.pickwicksoft.org/reference/data-loaders) for full usage.
311122

312-
## Performance
123+
## Documentation
313124

314-
Note that parallel Streams are not always faster than sequential Streams. Especially when the number of elements is small, we can expect sequential Streams to be faster.
125+
Full documentation: [pystreamapi.pickwicksoft.org](https://pystreamapi.pickwicksoft.org/)
315126

316127
## Bug Reports
317128
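
For readers comparing the two README versions, here is a minimal plain-Python sketch of what the retained examples express: a lazy filter/map pipeline, a mapping step that skips failing elements (loosely analogous to `ErrorLevel.IGNORE`), and an infinite `iterate`-style source that must be limited. It uses only the standard library and is not PyStreamAPI's implementation. The helper names `clean_numbers`, `to_int_ignoring_errors`, and `iterate` are hypothetical and chosen for illustration only.

```python
from itertools import islice


def clean_numbers(items):
    """Plain-Python counterpart of the README's filter/map/sorted pipeline."""
    not_none = (x for x in items if x is not None)  # like .filter(lambda x: x is not None)
    stripped = (str(x).strip() for x in not_none)   # like .map(str).map(lambda x: x.strip())
    non_empty = (s for s in stripped if s)          # like .filter(lambda x: len(x) > 0)
    numbers = (int(s) for s in non_empty)           # like .map(int)
    # Generators are lazy: no element is processed until sorted() consumes the chain.
    return sorted(numbers)


def to_int_ignoring_errors(items):
    """Yield int(x) for each element, silently skipping values that cannot be converted."""
    for x in items:
        try:
            yield int(x)
        except (TypeError, ValueError):
            continue  # assumption: "ignore" means the element is dropped


def iterate(seed, func):
    """Infinite generator yielding seed, func(seed), func(func(seed)), ..."""
    while True:
        yield seed
        seed = func(seed)


data = [" ", "3", None, "2", 1, ""]
print(clean_numbers(data))                           # [1, 2, 3]
print(sorted(to_int_ignoring_errors(data)))          # [1, 2, 3]
print(list(islice(iterate(0, lambda n: n + 2), 5)))  # [0, 2, 4, 6, 8]
```

The `islice` call plays the role that `.limit()` plays in the README: without it, consuming the infinite generator would never terminate.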
