PyStreamAPI is a Python stream library that draws inspiration from the Java Stream API.
16
-
Although it closely mirrors the Java API, PyStreamAPI adds some features of its own to make streams in Python even more
17
-
declarative and easy to use.
18
-
19
-
PyStreamAPI offers both sequential and parallel streams and utilizes lazy execution.
20
-
21
-
Now you might be wondering why another library when there are already a few implementations? Well, here are a few advantages of this particular implementation:
22
-
23
-
* It provides both sequential and parallel streams.
24
-
* Lazy execution is supported, enhancing performance.
25
-
* It boasts high speed and efficiency.
26
-
* The implementation achieves 100% test coverage.
27
-
* It follows Pythonic principles, resulting in clean and readable code.
28
-
* It adds some cool innovative features such as conditions or error handling and an even more declarative look.
29
-
* It provides loaders for various data sources such as CSV, JSON, XML and YAML files.
30
-
31
-
Let's take a look at a small example:
15
+
PyStreamAPI is a Python stream library inspired by the Java Stream API, adding Pythonic features for clean, declarative, and efficient data processing. It supports both sequential and parallel streams with lazy evaluation.
A `Stream` is a powerful abstraction for processing sequences of data in a functional and declarative manner. It enables efficient and concise data manipulation and transformation.
63
-
64
-
Similar to its counterparts in Java and Kotlin, a Stream represents a pipeline of operations that can be applied to a collection or any iterable data source. It allows developers to express complex data processing logic using a combination of high-level operations, promoting code reusability and readability.
65
-
66
-
With Streams, you can perform a wide range of operations on your data, such as filtering elements, transforming values, aggregating results, sorting, and more. These operations can be seamlessly chained together to form a processing pipeline, where each operation processes the data and passes it on to the next operation.
67
-
68
-
One of the key benefits of Stream is lazy evaluation. This means that the operations are executed only when the result is actually needed, optimizing resource usage and enabling efficient processing of large or infinite datasets.
69
-
70
-
Furthermore, Stream supports both sequential and parallel execution. This allows you to leverage parallel processing capabilities when dealing with computationally intensive tasks or large amounts of data, significantly improving performance.
71
-
72
-
`pystreamapi.Stream` represents a stream that facilitates the execution of one or more operations. Stream operations can be categorized as either intermediate or terminal.
73
-
74
-
Terminal operations return a result of a specific type, while intermediate operations return the stream itself, enabling method chaining for multi-step operations.
75
-
76
-
Let's examine an example using Stream:
77
-
78
-
```python
79
-
Stream.of(["", '3', None, "2", 1, ""]) \
80
-
    .filter(lambda x: x is not None) \ # Intermediate operation
Operations can be performed on a stream either in parallel or sequentially. A parallel stream executes operations concurrently, while a sequential stream processes operations in order.
90
-
91
-
Considering the above characteristics, a stream can be defined as follows:
92
-
93
-
* It is not a data structure itself but operates on existing data structures.
94
-
* It does not provide indexed access like traditional collections.
95
-
* It is designed to work seamlessly with lambda functions, enabling concise and expressive code.
96
-
* It facilitates easy aggregation of results into lists, tuples, or sets.
97
-
* It can be parallelized, allowing for concurrent execution of operations to improve performance.
98
-
* It employs lazy evaluation, executing operations only when necessary.
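The lazy-evaluation point in the list above can be illustrated without the library. The following is a minimal plain-Python sketch built on generator expressions; it is not PyStreamAPI's implementation. The `trace` helper and the `calls` list are hypothetical names introduced only to make the evaluation order visible.

```python
calls = []  # records which elements were actually processed


def trace(x):
    """Hypothetical helper: remember that x was touched, then pass it on."""
    calls.append(x)
    return x


data = ["", "3", None, "2", 1, ""]

# Building the pipeline does no per-element work yet: generators are lazy.
not_none = (trace(x) for x in data if x is not None)
as_text = (str(x).strip() for x in not_none)
non_empty = (s for s in as_text if s)
numbers = (int(s) for s in non_empty)

assert calls == []  # nothing has been evaluated so far

# The "terminal operation": consuming the pipeline triggers evaluation.
result = sorted(numbers)
print(result)
```

Only the final `sorted(...)` call pulls elements through the chain, which is the same division of labour the text describes for intermediate and terminal operations.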
Conditions provide a convenient way to express logical checks within your Stream, for example in `filter()`, `take_while()`, and `drop_while()`. PyStreamAPI ships 111 built-in conditions covering strings, types, numbers, and dates. It also offers a combiner that lets you combine multiple conditions to build more intricate pipelines.
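To make the idea of combining conditions concrete, here is a small plain-Python sketch. The names `all_of`, `is_even`, and `greater_than` are hypothetical and chosen for illustration; they are not taken from PyStreamAPI, whose actual condition names and combiner are listed in its documentation.

```python
from typing import Any, Callable

Predicate = Callable[[Any], bool]


def all_of(*conditions: Predicate) -> Predicate:
    """Hypothetical combiner: true only if every condition holds."""
    return lambda x: all(c(x) for c in conditions)


def is_even(x: int) -> bool:
    """Hypothetical condition: x is divisible by two."""
    return x % 2 == 0


def greater_than(limit: int) -> Predicate:
    """Hypothetical condition factory: x is strictly above limit."""
    return lambda x: x > limit


# A combined condition is itself a predicate, so it can be passed
# anywhere a single condition is accepted (here: the built-in filter).
even_and_large = all_of(is_even, greater_than(4))
selected = list(filter(even_and_large, range(10)))
print(selected)
```

Because a combined condition is just another callable, it could in principle be handed to any operation that accepts a predicate.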
105
-
106
-
## Error handling: Work with data that you don't know
107
-
PyStreamAPI offers a powerful error handling mechanism that allows you to handle errors in a declarative manner. This is especially useful when working with data that you don't know.
108
-
109
-
PyStreamAPI offers three different error levels:
110
-
- `ErrorLevel.RAISE`: The default error level. Raises an exception if an error occurs.
111
-
- `ErrorLevel.IGNORE`: Silently ignores any errors that occur.
112
-
- `ErrorLevel.WARN`: Logs any errors that occur as warnings using the default logger.
113
-
114
-
115
-
This is how you can use them:
116
-
117
-
```python
118
-
from pystreamapi import Stream, ErrorLevel
119
-
120
-
Stream.of(["", '3', None, "2", 1, ""]) \
121
-
    .error_level(ErrorLevel.IGNORE) \
122
-
    .map_to_int() \
123
-
    .error_level(ErrorLevel.RAISE) \
124
-
    .sorted() \
125
-
    .for_each(print) # Output: 1 2 3
126
-
```
127
-
128
-
The code above ignores all errors raised while mapping to int and skips the elements that caused them.
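Conceptually, an "ignore" level behaves like a mapping step that drops any element whose conversion raises. The sketch below shows that idea in plain Python under that assumption; `map_ignoring_errors` is a hypothetical helper, not part of PyStreamAPI, and the library's internals may differ.

```python
def map_ignoring_errors(func, items):
    """Hypothetical helper: lazily apply func, skipping elements that raise."""
    for item in items:
        try:
            yield func(item)
        except (ValueError, TypeError):
            continue  # behave like an "ignore" level: drop the element silently


values = ["", "3", None, "2", 1, ""]

# int("") raises ValueError and int(None) raises TypeError; both are skipped.
numbers = sorted(map_ignoring_errors(int, values))
print(numbers)
```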
129
-
130
-
For more details on how to use error handling, please refer to the documentation.
131
-
132
-
## Get started: Installation
133
-
134
-
To start using PyStreamAPI just install the module with this command:
30
+
## Installation
135
31
136
32
```bash
137
-
pip install streams.py
33
+
pip install streams.py
138
34
```
139
35
140
-
Afterward, you can import it with:
141
-
142
36
```python
143
37
from pystreamapi import Stream
144
38
```
145
39
146
40
:tada: PyStreamAPI is now ready to process your data
147
41
148
-
## Build a new Stream
42
+
## Why PyStreamAPI?
149
43
150
-
PyStreamAPI offers two types of Streams, both of which are available in either sequential or parallel versions:
44
+
* Sequential and parallel streams out of the box
45
+
* Lazy execution for efficient processing of large datasets
46
+
* 100% test coverage
47
+
* Pythonic API — clean, readable, and expressive
48
+
* 111+ built-in [conditions](https://pystreamapi.pickwicksoft.org/reference/conditions) for filtering and matching
49
+
* Declarative [error handling](https://pystreamapi.pickwicksoft.org/reference/api-reference/error-handling) with configurable error levels
50
+
* Built-in loaders for CSV, JSON, XML, YAML and TOML files
151
51
152
-
- (Normal) `Stream`: Offers operations that do not depend on the types. The same functionality as Streams in other programming languages.
52
+
## Building a Stream
153
53
154
-
- `NumericStream`: This stream extends the capabilities of the default stream by
155
-
introducing numerical operations. It is designed specifically for use
156
-
with numerical data sources and can only be applied to such data.
157
-
158
-
There are a few factory methods that create new Streams:
54
+
PyStreamAPI provides two stream types — `Stream` (general-purpose) and `NumericStream` (for numerical data with statistics) — each available in sequential and parallel flavors.
159
55
160
56
```python
161
-
Stream.of([1, 2, 3]) # Can return a sequential or a parallel stream
57
+
Stream.of([1, 2, 3]) # auto-selects sequential or numeric
Creates a new Stream from multiple Streams. Order doesn't change.
96
+
Available levels: `RAISE` (default), `IGNORE`, `WARN`. See the [error handling docs](https://pystreamapi.pickwicksoft.org/reference/api-reference/error-handling) for details.
215
97
216
-
## Use loaders: Load data from CSV, JSON, XML and YAML files in just one line
98
+
## Data Loaders
217
99
218
-
PyStreamAPI offers a convenient way to load data from CSV, JSON, XML and YAML files. This way you can start processing your
219
-
files right away without having to worry about reading and parsing them.
100
+
Load data from files directly into a stream — no manual parsing needed:
220
101
221
-
You can import the loaders with:
102
+
| Loader | Extra required | Description |
103
+
|--------|----------------|-------------|
104
+
|`csv`| — | CSV files with optional type casting and delimiter |
105
+
|`json`|`[json_loader]`| JSON files or strings (streaming via ijson) |
106
+
|`xml`|`[xml_loader]`| XML files or strings with node path access |
107
+
|`yaml`| — | YAML files or strings |
108
+
|`toml`| — | TOML files or strings |
222
109
223
110
```python
224
-
from pystreamapi.loaders import csv, json, xml, yaml
225
-
```
226
-
Now you can use the loaders directly when creating your Stream:
227
-
228
-
For CSV:
111
+
from pystreamapi import Stream
112
+
from pystreamapi.loaders import csv
229
113
230
-
```python
231
114
Stream.of(csv("data.csv", delimiter=";")) \
232
-
    .map(lambda x: x.attr1) \
115
+
    .map(lambda x: x.name) \
233
116
    .for_each(print)
234
117
```
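The attribute-style access in the CSV example (`x.name`) can be pictured as each row being turned into an object with one attribute per column. The following standard-library sketch shows one way to get that behaviour with `csv.reader` and `namedtuple`. It is an illustration only: `read_rows` is a hypothetical helper, the sample data is made up, and PyStreamAPI's loader may work differently (this sketch also omits the optional type casting).

```python
import csv
import io
from collections import namedtuple


def read_rows(handle, delimiter=","):
    """Hypothetical helper: lazily yield each CSV row as a namedtuple."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)  # first line supplies the attribute names
    Row = namedtuple("Row", header)
    for fields in reader:
        yield Row(*fields)


# In-memory stand-in for a file such as data.csv
sample = io.StringIO("name;age\nAda;36\nLinus;54\n")

names = [row.name for row in read_rows(sample, delimiter=";")]
print(names)
```

Since `read_rows` is a generator, rows are parsed one at a time as they are consumed, which fits the lazy processing model described earlier.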
235
118
236
-
For JSON:
237
-
```python
238
-
Stream.of(json("data.json")) \
239
-
    .map(lambda x: x.attr1) \
240
-
    .for_each(print)
241
-
```
242
-
243
-
You can access the attributes of the data structures directly like you would do with a normal object.
244
-
245
-
For XML:
119
+
Install all optional extras at once: `pip install 'streams.py[all]'`
246
120
247
-
In order to use the XML loader, you need to install the optional xml dependency:
248
-
249
-
```bash
250
-
pip install streams.py[xml_loader]
251
-
```
252
-
253
-
Afterward, you can use the XML loader like this:
254
-
255
-
```python
256
-
Stream.of(xml("data.xml")) \
257
-
    .map(lambda x: x.attr1) \
258
-
    .for_each(print)
259
-
```
260
-
261
-
Attributes are accessed using a node path syntax. For more details on how to use the node path syntax, please
262
-
refer to the [documentation](https://pystreamapi.pickwicksoft.org/reference/data-loaders).
263
-
264
-
For YAML:
265
-
266
-
In order to use the YAML loader, you need to install the optional yaml dependency:
267
-
268
-
```bash
269
-
pip install streams.py[yaml_loader]
270
-
```
271
-
272
-
Afterward, you can use the YAML loader like this:
273
-
274
-
```python
275
-
Stream.of(yaml("data.yaml")) \
276
-
    .map(lambda x: x.attr1) \
277
-
    .for_each(print)
278
-
```
279
-
280
-
## API Reference
281
-
For more detailed documentation, see the docs on GitBook: [PyStreamAPI Docs](https://pystreamapi.pickwicksoft.org/)
282
-
283
-
## Complex Examples
284
-
285
-
#### Get all numbers from list of different types. Use parallelization.
286
-
287
-
```python
288
-
Stream.parallel_of(["", '3', None, "2", 1, ""]) \
289
-
    .filter(lambda x: x is not None) \
290
-
    .map(str) \
291
-
    .map(lambda x: x.strip()) \
292
-
    .filter(lambda x: len(x) > 0) \
293
-
    .map(int) \
294
-
    .sorted() \
295
-
    .for_each(print) # 1 2 3
296
-
```
297
-
298
-
#### Generate a Stream of 10 Fibonacci numbers
299
-
300
-
```python
301
-
def fib():
302
-
    a, b = 0, 1
303
-
    while True:
304
-
        yield a
305
-
        a, b = b, a + b
306
-
307
-
Stream.of(fib()) \
308
-
    .limit(10) \
309
-
    .for_each(print) # 0 1 1 2 3 5 8 13 21 34
310
-
```
121
+
See the [data loaders docs](https://pystreamapi.pickwicksoft.org/reference/data-loaders) for full usage.
311
122
312
-
## Performance
123
+
## Documentation
313
124
314
-
Note that parallel Streams are not always faster than sequential Streams. Especially when the number of elements is small, we can expect sequential Streams to be faster.
125
+
Full documentation: [pystreamapi.pickwicksoft.org](https://pystreamapi.pickwicksoft.org/)