@@ -64,7 +64,350 @@ static enum task_state scheduler_dp_ll_tick_dummy(void *data)
6464}
6565
6666/*
67- * function called after every LL tick
67+ * DP with Earliest Deadline First scheduling
68+ *
69+ * DP a.k.a. "Data processing" is an async scheduling method of data processing modules.
70+ * Each module works in a separate, preemptible thread with lower priority than LL thread.
71+ * It allows processing with periods longer than 1ms, on-demand processing, etc.
72+ *
73+ * Unlike the LL ("low latency") method, where a module is started every 1ms cycle and all LL
74+ * modules together MUST finish processing within 1ms, DP works asynchronously and gets CPU when
75+ * a module is "ready for processing", which means:
76+ * - each of the module's input buffers holds at least IBS bytes of data and each of its output
77+ * buffers has at least OBS bytes of free space
78+ * OR
79+ * - the module declared readiness by itself through an optional API call "is_ready_to_process"
80+ *
81+ * The critical part is that the module MUST finish processing before its DEADLINE. A deadline is
82+ * the time by which the module MUST provide a data chunk in order to keep the next module(s) in
83+ * the pipeline working.
84+ *
85+ * To ensure that all modules provide data on time - as long as the CPU is not overloaded -
86+ * regardless of the modules' processing times and processing periods, Earliest Deadline First
87+ * (EDF) scheduling is used.
88+ * https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling
89+ *
90+ * DEADLINE CALCULATIONS
91+ * Let's start from the beginning with some DEFINITIONS
92+ *
93+ * def: buffers' Latest Feeding Time (LFT)
94+ * LFT is the latest moment when >>a buffer<< must be fed with a portion of data allowing its
95+ * data consumer to work and finish in its specific time
96+ *
97+ * LFT is a parameter specific to a buffer and can be calculated based on:
98+ * - current amount of data in the buffer
99+ * - data receiver's consumption rate and period
100+ * - data source's production rate and period
101+ * - data receiver module's LST - latest start time
102+ *
103+ * so, at a high level, LFT is built from:
104+ * - Latest start time (LST) of the data consumer
105+ * LST is defined later
106+ *
107+ * - estimated time the consumer needs to drain the current data from the buffer
108+ * (number_of_ms_in_buffer / consumer_period) * consumer_period, division rounded down
109+ * i.e. if there's 5ms of data in the buffer and the period of the consumer is
110+ * 2ms, the calculated time is 4ms
111+ *
112+ * - correction for multiple source cycle
113+ * in case the producer period < consumer period the LFT time needs to be corrected,
114+ * as the producer must process more than once to provide enough data.
115+ * The correction will be calculated as:
116+ * producer_LPT * required_number_of_cycles
117+ * where LPT is longest processing time, explained later
118+ *
119+ * correction = producer_LPT *
120+ * ((consumer_period - number_of_ms_in_buffer) / producer_period)
121+ * if correction is < 0, it should be set to zero
122+ * note that in case producer_period >= consumer_period correction is always 0
123+ *
124+ * finally:
125+ * LFT = LST(consumer) + estimated_drain_time - correction
126+ *
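Note: the LFT formula above can be sketched in C as follows. This is a minimal illustration and not code from the patch: the struct `lft_input`, its field names and the function `buffer_lft_ms` are hypothetical, all times are whole milliseconds relative to "NOW", and rounding a partial producer cycle up to a full one is an assumption the comment does not state.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical input for one buffer; all times are in ms relative to "NOW".
struct lft_input {
	int32_t consumer_lst_ms;    // latest start time of the data consumer
	int32_t consumer_period_ms;
	int32_t producer_period_ms;
	int32_t producer_lpt_ms;    // longest processing time of the producer
	int32_t buffered_ms;        // amount of data currently in the buffer
};

static int32_t buffer_lft_ms(const struct lft_input *in)
{
	assert(in->consumer_period_ms > 0 && in->producer_period_ms > 0);

	// whole consumer periods already covered by the buffered data
	int32_t drain = (in->buffered_ms / in->consumer_period_ms) *
			in->consumer_period_ms;
	int32_t missing = in->consumer_period_ms - in->buffered_ms;
	int32_t correction = 0;

	// correction applies only when the producer runs more often than the consumer
	if (in->producer_period_ms < in->consumer_period_ms && missing > 0) {
		// assumption: a partial producer cycle counts as a full one
		int32_t cycles = (missing + in->producer_period_ms - 1) /
				 in->producer_period_ms;

		correction = in->producer_lpt_ms * cycles;
	}

	return in->consumer_lst_ms + drain - correction;
}
```

With a consumer LST of 3ms, 100ms of buffered data and a 20ms consumer period fed by a slower producer, the function returns 103ms, which matches the second step of EXAMPLE1.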
127+ *
128+ * def: DP module's DEADLINE
129+ * a DEADLINE is the latest moment >>a module<< must finish processing to feed all target
130+ * buffers before their LFTs
131+ * Calculation is simple
132+ * - module's deadline is the nearest LFT of all target buffers
133+ *
134+ * def: DP module's Longest Processing Time (LPT)
135+ * LPT is the longest time the module may take to process a portion of data, assuming it gets
136+ * 100% of CPU time
137+ * >> LPT cannot be measured at runtime << as processing time may change from cycle to cycle, etc.
138+ * It can, however, be estimated based on:
139+ * - declared (by a module vendor) number of CPU cycles required for processing.
140+ * This declaration should be done separately for all combinations of input/output data
141+ * formats, platform, CPU type, use of HiFi etc.
142+ * - If the declaration is not available, we can take "a period" as an approximation of the longest
143+ * possible processing time. "A period" is a value calculated using IBS and the data consumption
144+ * rate of a module. A module cannot possibly process longer than its period, because it
145+ * would never provide data in time (if LPT = period that means a module requires 100% of
146+ * CPU for processing, so it is really the worst possible case)
147+ *
148+ * Example: if a data rate is 48samples/msec and OBS = 480samples, the "worst case" period
149+ * is calculated as 10ms
150+ *
151+ * NOTE: in case of a sampling freq like 44.1kHz a round up should be taken (as if the freq was 45000)
152+ *
153+ * The approximation above, although correct, assumes that a module is a heavy one and
154+ * it requires 100% of CPU time. Using it may lead to unnecessary buffering, see
155+ * "delayed start" section below.
156+ *
157+ * def: DP module's latest start time (LST)
158+ * LST is the latest moment >>a module<< must begin processing in order to finish within
159+ * its deadline.
160+ * Calculation: deadline - LPT
161+ *
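Note: the DEADLINE and LST definitions above reduce to a minimum search and a subtraction. The sketch below only illustrates that; `module_deadline_ms` and `module_lst_ms` are hypothetical names, not functions from the patch, and times are assumed to be whole milliseconds relative to "NOW".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Deadline of a module: the nearest (smallest) LFT among all of its target buffers.
static int32_t module_deadline_ms(const int32_t *target_lft_ms, size_t count)
{
	assert(count > 0);

	int32_t deadline = target_lft_ms[0];

	for (size_t i = 1; i < count; i++) {
		if (target_lft_ms[i] < deadline)
			deadline = target_lft_ms[i];
	}

	return deadline;
}

// Latest start time of a module: deadline minus longest processing time.
static int32_t module_lst_ms(int32_t deadline_ms, int32_t lpt_ms)
{
	return deadline_ms - lpt_ms;
}
```

For a module with a single target buffer whose LFT is 18ms and an LPT of 10ms, this gives a deadline of 18ms and an LST of 8ms, the values used for DP2 in the examples.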
162+ *
163+ * >>>> Based on the above, it is clear that we first need to calculate the deadline of the
164+ * very last module in a chain, then go back and calculate the LFTs and deadline
165+ * of each module separately <<<<
166+ *
167+ * Fortunately, the last module of a pipeline is almost always an LL module (usually DAI).
168+ * TODO: how to proceed if there's no LL at the end of the pipeline (i.e. in case when the last
169+ * module is not producing samples - like speech recognition)?
170+ *
171+ * note that in case of LL1 -> DP1 -> DP2 -> LL2 -> DP3 -> DP4 -> LL3
172+ * there are 2 separate deadline calculation chains: DP4 then DP3, and (independent) DP2 then DP1
173+ *
174+ * also note that deadlines and other parameters may change, so re-calculation of all parameters
175+ * should occur reasonably frequently and include all DP modules, regardless of the core each
176+ * runs on
177+ *
178+ * for an LL module the deadline is always "now", so it is very easy to calculate LFTs for
179+ * its input buffer(s)
180+ * note: in case of data rates like 44.1, which cannot be divided into 1ms chunks, a round up to 45
181+ * should be used
182+ *
183+ * - an LL module always starts in 1ms periods
184+ * - an LL module always consumes a constant number of bytes in a cycle (with an exception for
185+ * frequencies like 44.1, a round up to 45kHz should be taken for calculations)
186+ * so LFT = number of data chunks in buffer * 1ms
187+ *
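Note: the LL case above can be sketched as follows, assuming the buffer fill level is known in bytes. `samples_per_ms` and `ll_buffer_lft_ms` are hypothetical helpers, not part of the patch, and `frame_bytes` (bytes per sample frame across all channels) is an assumed parameter.

```c
#include <assert.h>
#include <stdint.h>

// Samples consumed per 1ms LL cycle, rounded up (44100 Hz is treated like 45000 Hz).
static uint32_t samples_per_ms(uint32_t rate_hz)
{
	return (rate_hz + 999) / 1000;
}

// LFT of a buffer feeding an LL module: number of whole 1ms chunks in the buffer.
static uint32_t ll_buffer_lft_ms(uint32_t buffered_bytes, uint32_t rate_hz,
				 uint32_t frame_bytes)
{
	uint32_t chunk_bytes = samples_per_ms(rate_hz) * frame_bytes;

	assert(chunk_bytes > 0);

	return buffered_bytes / chunk_bytes;
}
```

Rounding the rate up makes the estimate conservative: the buffer is assumed to drain slightly faster than it really does, so the computed LFT is never later than the real one.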
188+ *
189+ * NOTE!!! "NOW" in all of the calculations is "last start of LL scheduler". It makes all
190+ * calculations simpler, as in the examples below (calculating in CPU cycles would require taking
191+ * extra care of 32bit overflows or using slow 64bit operations). Also all modules use the same
192+ * timestamp as "NOW", regardless of the moment in the cycle at which the deadlines are calculated
193+ *
194+ * EXAMPLE1 (data source period is longer than or equal to data receiver period)
195+ * let's take a pipeline:
196+ * assuming
197+ * - the pipeline is in stable state (processing for a while, not in startup)
198+ * - no DP is currently processing
199+ * - whole CPU is dedicated to DP, like if LL is on core 0 and DPs on core 1
200+ *
201+ * LL1 ->(buf1, 100ms data) -> DP1 -> (buf2, 0ms data) -> DP2 -> (buf3, 18ms data) -> LL2
202+ * period 100ms period: 20ms
203+ * LPT: 5ms LPT: 10ms
204+ *
205+ * 1) 0ms time:
206+ * - DP1 is ready for processing
207+ * - DP2 is not ready for processing
208+ * calculate deadlines:
209+ * - buf3 LFT = 18ms ==> DP2 deadline = 18ms
210+ * - DP2 LST = DP2 deadline - DP2 LPT = 8ms
211+ * - buf2 LFT = 8ms ==> DP1 deadline = 8ms
212+ *
213+ * DP1 will be scheduled
214+ *
215+ * 2) 5ms time, DP1 has finished processing
216+ *
217+ * LL1 ->(buf1, 5ms data) -> DP1 -> (buf2, 100ms data) -> DP2 -> (buf3, 13ms data) -> LL2
218+ * period 100ms period: 20ms
219+ * LPT: 5ms LPT: 10ms
220+ *
221+ * - DP1 is not ready for processing
222+ * - DP2 is ready for processing
223+ * calculate deadlines:
224+ * - buf3 LFT = 13ms ==> DP2 deadline = 13ms
225+ * - DP2 LST = 3ms
226+ * - buf2 LFT = 5*20ms + 3 = 103ms ==> DP1 deadline = 103ms
227+ *
228+ * DP2 will be scheduled
229+ *
230+ *
231+ *
232+ *
233+ * EXAMPLE2 (data source period is shorter than data receiver period)
234+ *
235+ * LL1 ->(buf1, 5ms data) -> DP1 -> (buf2, 15ms data) -> DP2 -> (buf3, 18ms data) -> LL2
236+ * period 5ms period: 20ms
237+ * LPT: 2ms LPT: 10ms
238+ *
239+ * 1) 0ms time
240+ * - DP1 is ready for processing
241+ * - DP2 is not ready for processing (buf2 holds less than one 20ms period)
242+ *
243+ * calculate deadlines:
244+ * - buf3 LFT = 18ms ==> DP2 deadline = 18ms
245+ * - DP2 LST = 8ms
246+ * - buf2 LFT = 8ms - 1*2 (1 period * LPT) = 6ms
247+ * ==> DP1 deadline = 6ms
248+ *
249+ * DP1 gets CPU for 2 ms
250+ *
251+ * 2) 2ms time
252+ * LL1 ->(buf1, 2ms data) -> DP1 -> (buf2, 20ms data) -> DP2 -> (buf3, 16ms data) -> LL2
253+ * period 5ms period: 20ms
254+ * LPT: 2ms LPT: 10ms
255+ *
256+ * - DP1 is not ready for processing
257+ * - DP2 is ready for processing
258+ *
259+ * calculate deadlines:
260+ * - buf3 LFT = 16ms ==> DP2 deadline = 16ms
261+ * - DP2 LST = 6ms
262+ * - buf2 LFT = 20ms (1 period) + 6ms = 26ms
263+ * ==> DP1 deadline = 26ms
264+ *
265+ *
266+ * 3) 12ms time, DP2 has finished processing
267+ * LL1 ->(buf1, 12ms data) -> DP1 -> (buf2, 0ms data) -> DP2 -> (buf3, 26ms data) -> LL2
268+ * period 5ms period: 20ms
269+ * LPT: 2ms LPT: 10ms
270+ *
271+ * - DP1 is ready for processing
272+ * - DP2 is not ready for processing
273+ *
274+ * calculate deadlines:
275+ * - buf3 LFT = 26ms ==> DP2 deadline = 26ms
276+ * - DP2 LST = 16ms
277+ * - buf2 LFT = 16ms - 4*2 (4 periods * LPT) = 8ms
278+ * ==> DP1 deadline = 8ms
279+ *
280+ * DP1 gets CPU for 2ms
281+ *
282+ * 4) 14ms time
283+ * LL1 ->(buf1, 9ms data) -> DP1 -> (buf2, 5ms data) -> DP2 -> (buf3, 24ms data) -> LL2
284+ * period 5ms period: 20ms
285+ * LPT: 2ms LPT: 10ms
286+ *
287+ * - DP1 is ready for processing
288+ * - DP2 is not ready for processing
289+ *
290+ * calculate deadlines:
291+ * - buf3 LFT = 24ms ==> DP2 deadline = 24ms
292+ * - DP2 LST = 14ms
293+ * - buf2 LFT = 14ms - correction for multiple source cycle
294+ * correction for multiple source cycle = 3*2 (3 periods * LPT) = 6ms
295+ * - DP1 deadline = 14ms - 6ms = 8ms
296+ *
297+ * DP1 gets CPU for 2ms
298+ *
299+ * 5) 16ms time
300+ * LL1 ->(buf1, 6ms data) -> DP1 -> (buf2, 10ms data) -> DP2 -> (buf3, 22ms data) -> LL2
301+ * period 5ms period: 20ms
302+ * LPT: 2ms LPT: 10ms
303+ *
304+ * - DP1 is ready for processing
305+ * - DP2 is not ready for processing
306+ *
307+ * calculate deadlines:
308+ * - buf3 LFT = 22ms ==> DP2 deadline = 22ms
309+ * - DP2 LST = 12ms
310+ * - buf2 LFT = 12ms - 2*2 (2 periods * LPT) = 8ms
311+ * ==> DP1 deadline = 8ms
312+ *
313+ * DP1 gets CPU for 2ms
314+ *
315+ * 6) 18ms time
316+ * LL1 ->(buf1, 3ms data) -> DP1 -> (buf2, 15ms data) -> DP2 -> (buf3, 20ms data) -> LL2
317+ * period 5ms period: 20ms
318+ * LPT: 2ms LPT: 10ms
319+ *
320+ * - DP1 is not ready for processing
321+ * - DP2 is not ready for processing
322+ *
323+ * calculate deadlines - pointless, however, when no DP is ready:
324+ * - buf3 LFT = 20ms ==> DP2 deadline = 20ms
325+ * - DP2 LST = 10ms
326+ * - buf2 LFT = 10ms - 1*2 (1 period * LPT) = 8ms
327+ * ==> DP1 deadline = 8ms
328+ *
329+ * no DP is processing for 2 ms
330+ *
331+ * 7) 20ms time (same buffer state as at 0ms)
332+ * LL1 ->(buf1, 5ms data) -> DP1 -> (buf2, 15ms data) -> DP2 -> (buf3, 18ms data) -> LL2
333+ * period 5ms period: 20ms
334+ * LPT: 2ms LPT: 10ms
335+ *
336+ * - DP1 is ready for processing
337+ * - DP2 is not ready for processing
338+ *
339+ * calculate deadlines:
340+ * - buf3 LFT = 18ms ==> DP2 deadline = 18ms
341+ * - DP2 LST = 8ms
342+ * - buf2 LFT = 8ms - 1*2 (1 period * LPT) = 6ms
343+ * ==> DP1 deadline = 6ms
344+ *
345+ * DP1 gets CPU for 2ms
346+ *
347+ * 8) 22ms time (same buffer state as at 2ms)
348+ * LL1 ->(buf1, 2ms data) -> DP1 -> (buf2, 20ms data) -> DP2 -> (buf3, 16ms data) -> LL2
349+ * period 5ms period: 20ms
350+ * LPT: 2ms LPT: 10ms
351+ *
352+ * - DP1 is not ready for processing
353+ * - DP2 is ready for processing
354+ *
355+ * calculate deadlines:
356+ * - buf3 LFT = 16ms ==> DP2 deadline = 16ms
357+ * - DP2 LST = 6ms
358+ * - buf2 LFT = 20ms (1 period) + 6ms = 26ms
359+ * ==> DP1 deadline = 26ms
360+ *
361+ * DP2 will be scheduled
362+ *
363+ *
364+ * STARTUP
365+ * A special case is "pipeline startup". When a pipeline is starting, calculations make no sense,
366+ * as all the modules are already late and deadlines are in the past.
367+ * To make startup possible, a DELAYED START mechanism needs to be introduced.
368+ *
369+ * Delayed start means that:
370+ * - when a DP module becomes ready for the first time, its deadline is set to NOW
371+ * - even if the DP module provides data early, the data will be held in the buffer
372+ * until the first LPT has passed since the DP became ready for the first time
373+ *
374+ * The reason is that a DP may finish processing quicker than its longest processing time. It may
375+ * be caused by many events, usually by lower CPU load during pipeline startup. If the next module
376+ * starts processing immediately in such a situation, it may go into underrun when DP processing
377+ * takes longer in the future. Delaying to the declared/estimated Longest Processing Time (LPT)
378+ * prevents this, as long as the processing cycles declaration is accurate.
379+ *
380+ * Delayed start makes EDF scheduling possible and ensures that even when CPU load is close to
381+ * 100% every module has enough processing time to finish within its deadline.
382+ *
383+ *
384+ * DP SCHEDULER
385+ * A list of all DP tasks, regardless of the core a task is on, is to be iterated every time
386+ * the situation of DP readiness or deadline timing may change, that includes
387+ * - finish of processing of LL pipeline (on any core)
388+ * - finish of processing of any DP module (on any core)
389+ * TODO
390+ *
391+ * during the iteration, the following will be checked:
392+ * - Readiness of each DP module
393+ * as mentioned before, a module "is ready" when it declared readiness by itself via an API call
394+ * or when it has at least IBS of data on each input and at least OBS free space on each output
395+ *
396+ * - deadline calculation of each DP module
397+ * LFTs and Deadlines are not constant, they may change when a module consumes/produces
398+ * a portion of data. Therefore all LFTs and Deadlines must be re-calculated
399+ *
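Note: the per-iteration checks described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the patch's implementation: `dp_buf_state`, `dp_task_view`, `dp_task_is_ready` and `edf_pick` are hypothetical names, and `declared_ready` stands in for the result of the optional "is_ready_to_process" call. The explicit selection in `edf_pick` is shown only to make the EDF rule visible; the comment text further down says deadlines are handed to Zephyr, which then orders the threads itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified view of a buffer and a DP task.
struct dp_buf_state {
	uint32_t avail_bytes;   // data available for reading
	uint32_t free_bytes;    // free space available for writing
};

struct dp_task_view {
	const struct dp_buf_state *inputs;
	size_t num_inputs;
	const struct dp_buf_state *outputs;
	size_t num_outputs;
	uint32_t ibs;
	uint32_t obs;
	bool declared_ready;    // stands in for the optional "is_ready_to_process" result
	int32_t deadline_ms;    // recalculated deadline, relative to "NOW"
};

// Ready when self-declared, or when every input has IBS data and every output OBS space.
static bool dp_task_is_ready(const struct dp_task_view *t)
{
	if (t->declared_ready)
		return true;

	for (size_t i = 0; i < t->num_inputs; i++) {
		if (t->inputs[i].avail_bytes < t->ibs)
			return false;
	}

	for (size_t i = 0; i < t->num_outputs; i++) {
		if (t->outputs[i].free_bytes < t->obs)
			return false;
	}

	return true;
}

// Returns the ready task with the earliest deadline, or NULL if no task is ready.
static const struct dp_task_view *edf_pick(const struct dp_task_view *tasks, size_t count)
{
	const struct dp_task_view *best = NULL;

	for (size_t i = 0; i < count; i++) {
		if (!dp_task_is_ready(&tasks[i]))
			continue;
		if (!best || tasks[i].deadline_ms < best->deadline_ms)
			best = &tasks[i];
	}

	return best;
}
```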
400+ * EXAMPLE
401+ *
402+ * DP1 (20ms period) --(BUF1 10ms data)--> DP2 (10ms period) --(BUF2 3ms data)-->LL (1ms period)
403+ * 11ms LPT 1.5ms LPT
404+ *
405+ * current time: 0.1ms before LL cycle
406+ *
407+ * LFT of BUF2: buffer contains 3ms of data, data consumption period is 1ms, so
408+ * LFT(BUF2) = NOW+3.1ms
409+ *
410+ * deadline(DP2) = LFT(BUF2) = NOW + 3.1ms, LST(DP2) = deadline(DP2) - LPT(DP2) = NOW + 1.6ms
68411 *
69412 * This function checks if the queued DP tasks are ready to processing (meaning
70413 * the module run by the task has enough data at all sources and enough free space
@@ -73,10 +416,7 @@ static enum task_state scheduler_dp_ll_tick_dummy(void *data)
73416 * if the task becomes ready, a deadline is set allowing Zephyr to schedule threads
74417 * in right order
75418 *
76- * TODO: currently there's a limitation - DP module must be surrounded by LL modules.
77- * it simplifies algorithm - there's no need to browse through DP chains calculating
78- * deadlines for each module in function of all modules execution status.
79- * Now is simple - modules deadline is its start + tick time.
419+ * EDF scheduling example
80420 *
81421 * example:
82422 * Lets assume we do have a pipeline: