
Commit d2fa68b

DP: description of deadline calculation algorithm
This commit adds a detailed description of how to calculate a deadline for a DP-to-DP connection. Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
1 parent d677107 commit d2fa68b

1 file changed

Lines changed: 345 additions & 5 deletions


src/schedule/zephyr_dp_schedule.c

@@ -64,7 +64,350 @@ static enum task_state scheduler_dp_ll_tick_dummy(void *data)
6464
}
6565

6666
/*
67-
* function called after every LL tick
67+
* DP with Earliest Deadline First scheduling
68+
*
69+
* DP a.k.a. "Data processing" is an async scheduling method of data processing modules.
70+
* Each module works in a separate, preemptible thread with lower priority than LL thread.
71+
* It allows processing with periods longer than 1ms, on-demand processing, etc.
72+
*
73+
* Unlike the LL ("low latency") method, where every module is started in each 1ms cycle and all LL modules
74+
* together MUST finish processing within 1ms, DP works asynchronously and gets CPU time when a module is "ready for
75+
* processing", which means:
76+
* - each of the module's input buffers holds at least IBS bytes of data and each of its output
77+
* buffers has at least OBS bytes of free space
78+
* OR
79+
* - the module declares readiness itself through the optional API call "is_ready_to_process"
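The readiness rule above can be sketched as a small predicate. This is a minimal illustration only: `struct buf_state`, `dp_module_is_ready` and the `self_declared_ready` flag are hypothetical names that stand in for the real buffer API and the optional "is_ready_to_process" call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer snapshot, not an actual SOF structure */
struct buf_state {
	size_t bytes_available; /* data ready to be read */
	size_t bytes_free;      /* free space for writing */
};

/*
 * A module is ready when it declared readiness itself, OR when every input
 * holds at least IBS bytes and every output has at least OBS bytes free.
 */
bool dp_module_is_ready(const struct buf_state *in, size_t n_in, size_t ibs,
			const struct buf_state *out, size_t n_out, size_t obs,
			bool self_declared_ready)
{
	size_t i;

	assert(in != NULL || n_in == 0);
	assert(out != NULL || n_out == 0);

	if (self_declared_ready)
		return true;

	for (i = 0; i < n_in; i++)
		if (in[i].bytes_available < ibs)
			return false;

	for (i = 0; i < n_out; i++)
		if (out[i].bytes_free < obs)
			return false;

	return true;
}
```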
80+
*
81+
* The critical part is that a module MUST finish processing before its DEADLINE. A deadline is
82+
* the time by which the module MUST provide a data chunk in order to keep the next module(s) in the
83+
* pipeline working.
84+
*
85+
* To ensure that all modules provide data on time - as long as CPU is not overloaded - regardless
86+
* of the modules' processing times and processing periods, Earliest Deadline First (EDF) scheduling
87+
* is used.
88+
* https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling
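As a rough illustration of the EDF rule, the sketch below picks the ready task with the nearest deadline. The `struct dp_task` type and `dp_edf_pick` function are hypothetical; according to the comment further down in this file, the actual ordering is left to Zephyr once a deadline has been set on each thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task descriptor, for illustration only */
struct dp_task {
	bool ready;
	int32_t deadline_ms; /* relative to NOW */
};

/* Returns the index of the ready task with the nearest deadline, or -1 if no task is ready */
int dp_edf_pick(const struct dp_task *tasks, size_t n)
{
	int best = -1;
	size_t i;

	assert(tasks != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!tasks[i].ready)
			continue;
		if (best < 0 || tasks[i].deadline_ms < tasks[best].deadline_ms)
			best = (int)i;
	}
	return best;
}
```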
89+
*
90+
* DEADLINE CALCULATIONS
91+
* Let's start from the beginning with some DEFINITIONS
92+
*
93+
* def: buffers' Latest Feeding Time (LFT)
94+
* LFT is the latest moment when >>a buffer<< must be fed with a portion of data allowing its
95+
* data consumer to work and finish in its specific time
96+
*
97+
* LFT is a parameter specific to a buffer and can be calculated based on:
98+
* - current amount of data in the buffer
99+
* - data receiver's consumption rate and period
100+
* - data source production rate and period
101+
* - data receiver module's LST - latest start time
102+
*
103+
* so, at a high level, LFT is composed of:
104+
* - Latest start time (LST) of the data consumer
105+
* LST is defined later
106+
*
107+
* - estimated time the consumer needs to drain the current data from the buffer
108+
* (number_of_ms_in_buffer / consumer_period) * consumer_period, using integer division
109+
* i.e. if there's 5ms of data in the buffer and the period of the consumer is
110+
* 2ms, the calculated time is 4ms (two full periods)
111+
*
112+
* - correction for multiple source cycle
113+
* in case the producer period < consumer period the LFT time needs to be corrected,
114+
* as the producer must process more than once to provide enough data.
115+
* The correction will be calculated as:
116+
* producer_LPT * required_number_of_cycles
117+
* where LPT is longest processing time, explained later
118+
*
119+
* correction = producer_LPT *
120+
* ((consumer_period - number_of_ms_in_buffer) / producer_period)
121+
* if correction is < 0, it should be set to zero
122+
* note that in case producer_period >= consumer_period correction is always 0
123+
*
124+
* finally:
125+
* LFT = LST(consumer) + estimated_drain_time - correction
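A minimal sketch of the LFT formula, with all times in milliseconds relative to NOW. The function name and parameters are illustrative. Two details are assumptions rather than statements from the comment: the drain time counts only whole consumer periods (which matches the worked examples), and a partial producer cycle is rounded up.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: LFT = LST(consumer) + estimated_drain_time - correction */
int32_t dp_buffer_lft_ms(int32_t consumer_lst_ms, int32_t consumer_period_ms,
			 int32_t producer_period_ms, int32_t producer_lpt_ms,
			 int32_t ms_in_buffer)
{
	int32_t drain_ms;
	int32_t missing_ms;
	int32_t correction_ms = 0;

	assert(consumer_period_ms > 0 && producer_period_ms > 0);
	assert(ms_in_buffer >= 0);

	/* whole consumer periods already covered by the buffered data */
	drain_ms = (ms_in_buffer / consumer_period_ms) * consumer_period_ms;

	/* the correction applies only when the producer runs more often than the consumer */
	if (producer_period_ms < consumer_period_ms) {
		missing_ms = consumer_period_ms - ms_in_buffer;
		if (missing_ms > 0) {
			/* assumption: a partial producer cycle counts as a full one */
			int32_t cycles = (missing_ms + producer_period_ms - 1) /
					 producer_period_ms;

			correction_ms = producer_lpt_ms * cycles;
		}
	}

	return consumer_lst_ms + drain_ms - correction_ms;
}
```

With the values used in the examples of this comment, a consumer LST of 14ms, a 20ms consumer period, a 5ms producer period with 2ms LPT and an empty buffer give an LFT of 6ms.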
126+
*
127+
*
128+
* def: DP module's DEADLINE
129+
* a DEADLINE is the latest moment >>a module<< must finish processing to feed all target
130+
* buffers before their LFTs
131+
* Calculation is simple
132+
* - module's deadline is the nearest LFT of all target buffers
133+
*
134+
* def: DP module's Longest Processing Time (LPT)
135+
* LPT is the longest time the module may need to process a portion of data, assuming it gets
136+
* 100% of CPU time
137+
* >> LPT cannot be measured in runtime << as processing may change from cycle to cycle, etc.
138+
* It can, however, be estimated based on:
139+
* - declared (by a module vendor) number of CPU cycles required for processing.
140+
* This declaration should be made separately for each combination of input/output data
141+
* formats, platform, CPU type, use of HiFi, etc.
142+
* - If declaration is not available, we can take "a period" as an approximation of longest
143+
* possible processing time. "A period" is a value calculated using IBS and data consumption
144+
* rate of a module. A module cannot possibly process for longer than its period, because it
145+
* would never provide data in time (LPT = period means the module requires 100% of
146+
* CPU for processing, so it is really the worst possible case)
147+
*
148+
* Example: if the data rate is 48 samples/ms and OBS = 480 samples, the "worst case" period
149+
* is calculated as 10ms
150+
*
151+
* NOTE: for a sampling frequency like 44.1kHz the rate should be rounded up (as if the frequency were 45000Hz)
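The period approximation and the 44.1kHz round-up can be sketched as below. The function name and its parameters are hypothetical, and truncating the final division is an assumption; the comment does not say how a fractional period should be rounded.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative estimate of a module's "worst case" period in ms.
 * rate_hz:       sampling rate, e.g. 48000 or 44100
 * chunk_samples: samples handled per processing cycle
 */
uint32_t dp_estimate_period_ms(uint32_t rate_hz, uint32_t chunk_samples)
{
	uint32_t samples_per_ms;

	assert(rate_hz > 0);

	/* round the rate up to whole samples per ms: 44100 Hz -> 45 samples/ms */
	samples_per_ms = (rate_hz + 999) / 1000;

	/* assumption: integer division, a fractional period is truncated */
	return chunk_samples / samples_per_ms;
}
```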
152+
*
153+
* The approximation above, although correct, assumes that the module is a heavy one that
154+
* requires 100% of CPU time. Using it may lead to unnecessary buffering, see
155+
* "delayed start" section below.
156+
*
157+
* def: DP module's latest start time (LST)
158+
* LST is the latest moment >>a module<< must begin processing in order to finish within
159+
* its deadline.
160+
* Calculation: deadline - LPT
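The two definitions above (deadline as the nearest LFT of all target buffers, LST as deadline minus LPT) can be sketched as follows. Function names are illustrative and times are in milliseconds relative to NOW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Deadline of a module: the nearest (smallest) LFT among its target buffers */
int32_t dp_module_deadline_ms(const int32_t *target_lft_ms, size_t n_targets)
{
	int32_t deadline;
	size_t i;

	assert(target_lft_ms != NULL && n_targets > 0);

	deadline = target_lft_ms[0];
	for (i = 1; i < n_targets; i++)
		if (target_lft_ms[i] < deadline)
			deadline = target_lft_ms[i];

	return deadline;
}

/* Latest start time: the module must start by then to finish within its deadline */
int32_t dp_module_lst_ms(int32_t deadline_ms, int32_t lpt_ms)
{
	return deadline_ms - lpt_ms;
}
```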
161+
*
162+
*
163+
* >>>> Based on the above, it is clear that we first need to calculate the deadline of the
164+
* very last module in a chain, then go back and calculate the LFTs and deadline
165+
* of each module separately <<<<
166+
*
167+
* Fortunately, the last module of a pipeline is almost always an LL module (usually a DAI).
168+
* TODO: how to proceed if there's no LL at the end of the pipeline (i.e. when the last
169+
* module is not producing samples, like speech recognition)?
170+
*
171+
* note that in a case like LL1 -> DP1 -> DP2 -> LL2 -> DP3 -> DP4 -> LL3
172+
* there are 2 separate deadline calculation chains: DP4 then DP3, and (independently) DP2 then DP1
173+
*
174+
* also note that deadlines and other parameters may change, so re-calculation of all parameters
175+
* should occur reasonably frequently and include all DP modules, regardless of the core each is
176+
* run on
177+
178+
* for an LL module the deadline is always "now", so it is very easy to calculate LFTs for
179+
* its input buffer(s)
180+
* note: for data rates like 44.1kHz, which cannot be divided evenly into 1ms chunks, a round-up to 45kHz
181+
* should be used
182+
*
183+
* - an LL module always starts in 1ms periods
184+
* - an LL module always consumes a constant number of bytes in a cycle (with an exception for
185+
* frequencies like 44.1kHz, where a rounded-up 45kHz should be taken for calculations)
186+
* so LFT = number of data chunks in buffer * 1ms
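For a buffer consumed by an LL module, the rule above reduces to counting whole 1ms chunks. The sketch below assumes a hypothetical helper that receives the buffer fill level in bytes, the sampling rate and the frame size; none of these names come from the source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative LFT (in ms, relative to NOW) of a buffer consumed by an LL module.
 * frame_bytes: bytes per sample frame (all channels)
 */
uint32_t ll_buffer_lft_ms(uint32_t bytes_in_buffer, uint32_t rate_hz, uint32_t frame_bytes)
{
	uint32_t chunk_bytes;

	assert(rate_hz > 0 && frame_bytes > 0);

	/* bytes consumed per 1ms LL cycle, rate rounded up (44100 Hz -> 45 frames) */
	chunk_bytes = ((rate_hz + 999) / 1000) * frame_bytes;

	/* each complete chunk in the buffer keeps the LL module fed for 1ms */
	return bytes_in_buffer / chunk_bytes;
}
```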
187+
*
188+
*
189+
* NOTE!!! "NOW" in all of the calculations is "last start of LL scheduler". It makes all
190+
* calculations simpler, as in the examples below (calculating CPU cycles would require taking
191+
* extra care of 32bit overflows or using slow 64bit operations). Also, all modules have the same
192+
* timestamp as "NOW", regardless of the moment in the cycle at which the deadlines are calculated
193+
*
194+
* EXAMPLE1 (data source period is longer than or equal to data receiver period)
195+
* let's take a pipeline:
196+
* assuming
197+
* - the pipeline is in stable state (processing for a while, not in startup)
198+
* - no DP is currently processing
199+
* - whole CPU is dedicated to DP, like if LL is on core 0 and DPs on core 1
200+
*
201+
* LL1 ->(buf1, 100ms data) -> DP1 -> (buf2, 0ms data) -> DP2 -> (buf3, 18ms data) -> LL2
202+
* period 100ms period: 20ms
203+
* LPT: 5ms LPT: 10ms
204+
*
205+
* 1) 0ms time:
206+
* - DP1 is ready for processing
207+
* - DP2 is not ready for processing
208+
* calculate deadlines:
209+
* - buf3 LFT = 18ms ==> DP2 deadline = 18ms
210+
* - DP2 LST = DP2 deadline - DP2 LPT = 8ms
211+
* - buf2 LFT = 8ms ==> DP1 deadline = 8ms
212+
*
213+
* DP1 will be scheduled
214+
*
215+
* 2) 5ms time, DP1 has finished processing
216+
*
217+
* LL1 ->(buf1, 5ms data) -> DP1 -> (buf2, 100ms data) -> DP2 -> (buf3, 13ms data) -> LL2
218+
* period 100ms period: 20ms
219+
* LPT: 5ms LPT: 10ms
220+
*
221+
* - DP1 is not ready for processing
222+
* - DP2 is ready for processing
223+
* calculate deadlines:
224+
* - buf3 LFT = 13ms ==> DP2 deadline = 13ms
225+
* - DP2 LST = 3ms
226+
* - buf2 LFT = 5*20ms + 3 = 103ms ==> DP1 deadline = 103ms
227+
*
228+
* DP2 will be scheduled
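The backward walk used in EXAMPLE1 can be sketched for a linear chain of DP modules that ends in an LL module. Everything here is illustrative: `struct dp_mod`, `chain_buffer_lft_ms` and `dp_chain_calc` are hypothetical names, the chain is assumed to have exactly one output buffer per module, and partial producer cycles are rounded up by assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical module descriptor; times in ms relative to NOW */
struct dp_mod {
	int32_t period_ms;
	int32_t lpt_ms;
	int32_t out_buf_ms;  /* amount of data in the module's output buffer */
	int32_t deadline_ms; /* result */
	int32_t lst_ms;      /* result */
};

/* LFT = LST(consumer) + estimated_drain_time - correction */
int32_t chain_buffer_lft_ms(const struct dp_mod *producer, const struct dp_mod *consumer)
{
	int32_t buf = producer->out_buf_ms;
	int32_t drain = (buf / consumer->period_ms) * consumer->period_ms;
	int32_t missing = consumer->period_ms - buf;
	int32_t correction = 0;

	if (producer->period_ms < consumer->period_ms && missing > 0)
		correction = producer->lpt_ms *
			     ((missing + producer->period_ms - 1) / producer->period_ms);

	return consumer->lst_ms + drain - correction;
}

/* mods[0] is the first DP of the chain, mods[n - 1] feeds the LL module */
void dp_chain_calc(struct dp_mod *mods, size_t n)
{
	size_t i;

	assert(mods != NULL && n > 0);

	for (i = n; i-- > 0;) {
		int32_t lft;

		if (i == n - 1)
			lft = mods[i].out_buf_ms; /* LL consumer: 1ms per chunk, deadline NOW */
		else
			lft = chain_buffer_lft_ms(&mods[i], &mods[i + 1]);

		mods[i].deadline_ms = lft;
		mods[i].lst_ms = lft - mods[i].lpt_ms;
	}
}
```

Feeding it the state from step 1 of EXAMPLE1 (DP1: period 100ms, LPT 5ms, buf2 empty; DP2: period 20ms, LPT 10ms, buf3 18ms) yields a DP2 deadline of 18ms and a DP1 deadline of 8ms.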
229+
*
230+
*
231+
*
232+
*
233+
* EXAMPLE2 (data source period is shorter than data receiver)
234+
*
235+
* LL1 ->(buf1, 5ms data) -> DP1 -> (buf2, 15ms data) -> DP2 -> (buf3, 18ms data) -> LL2
236+
* period 5ms period: 20ms
237+
* LPT: 2ms LPT: 10ms
238+
*
239+
* 1) 0ms time
240+
* - DP1 is ready for processing
241+
* - DP2 is not ready for processing (buf2 holds 15ms, less than one 20ms period)
242+
*
243+
* calculate deadlines:
244+
* - buf3 LFT = 18ms ==> DP2 deadline = 18ms
245+
* - DP2 LST = 8ms
246+
* - buf2 LFT = 8ms - 1*2 (1 period * LPT) = 6ms
247+
* ==> DP1 deadline = 6ms
248+
*
249+
* DP1 gets CPU for 2 ms
250+
*
251+
* 2) 2ms time
252+
* LL1 ->(buf1, 2ms data) -> DP1 -> (buf2, 20ms data) -> DP2 -> (buf3, 16ms data) -> LL2
253+
* period 5ms period: 20ms
254+
* LPT: 2ms LPT: 10ms
255+
*
256+
* - DP1 is not ready for processing
257+
* - DP2 is ready for processing
258+
*
259+
* calculate deadlines:
260+
* - buf3 LFT = 16ms ==> DP2 deadline = 16ms
261+
* - DP2 LST = 6ms
262+
* - buf2 LFT = 20ms (1 period) + 6ms = 26ms
263+
* ==> DP1 deadline = 26ms
264+
*
265+
*
266+
* 3) 12ms time
267+
* LL1 ->(buf1, 12ms data) -> DP1 -> (buf2, 0ms data) -> DP2 -> (buf3, 24ms data) -> LL2
268+
* period 5ms period: 20ms
269+
* LPT: 2ms LPT: 10ms
270+
*
271+
* - DP1 is ready for processing
272+
* - DP2 is not ready for processing
273+
*
274+
* calculate deadlines:
275+
* - buf3 LFT = 24ms ==> DP2 deadline = 24ms
276+
* - DP2 LST = 14ms
277+
* - buf2 LFT = 14ms - 4*2 (4 periods * LPT) = 6ms
278+
* ==> DP1 deadline = 6ms
279+
*
280+
* DP1 gets CPU for 2ms
281+
*
282+
* 4) 14ms time
283+
* LL1 ->(buf1, 9ms data) -> DP1 -> (buf2, 5ms data) -> DP2 -> (buf3, 22ms data) -> LL2
284+
* period 5ms period: 20ms
285+
* LPT: 2ms LPT: 10ms
286+
*
287+
* - DP1 is ready for processing
288+
* - DP2 is not ready for processing
289+
*
290+
* calculate deadlines:
291+
* - buf3 LFT = 22ms ==> DP2 deadline = 22ms
292+
* - DP2 LST = 12ms
293+
* - buf2 LFT = 12ms - correction for multiple source cycle
294+
* correction for multiple source cycle = 3*2 (3 periods * LPT) = 6ms
295+
* - DP1 deadline = 12ms - 6ms = 6ms
296+
*
297+
* DP1 gets CPU for 2ms
298+
*
299+
* 5) 16ms time
300+
* LL1 ->(buf1, 6ms data) -> DP1 -> (buf2, 10ms data) -> DP2 -> (buf3, 20ms data) -> LL2
301+
* period 5ms period: 20ms
302+
* LPT: 2ms LPT: 10ms
303+
*
304+
* - DP1 is ready for processing
305+
* - DP2 is not ready for processing
306+
*
307+
* calculate deadlines:
308+
* - buf3 LFT = 20ms ==> DP2 deadline = 20ms
309+
* - DP2 LST = 10ms
310+
* - buf2 LFT = 10ms - 2*2 (2 periods * LPT) = 6ms
311+
* ==> DP1 deadline = 6ms
312+
*
313+
* DP1 gets CPU for 2ms
314+
*
315+
* 6) 18ms time
316+
* LL1 ->(buf1, 3ms data) -> DP1 -> (buf2, 15ms data) -> DP2 -> (buf3, 18ms data) -> LL2
317+
* period 5ms period: 20ms
318+
* LPT: 2ms LPT: 10ms
319+
*
320+
* - DP1 is not ready for processing
321+
* - DP2 is not ready for processing
322+
*
323+
* calculate deadlines (although pointless when no DP is ready):
324+
* - buf3 LFT = 18ms ==> DP2 deadline = 18ms
325+
* - DP2 LST = 8ms
326+
* - buf2 LFT = 8ms - 1*2 (1 period * LPT) = 6ms
327+
* ==> DP1 deadline = 6ms
328+
*
329+
* no DP is processing for 2 ms
330+
*
331+
* 7) 20ms time
332+
* LL1 ->(buf1, 5ms data) -> DP1 -> (buf2, 15ms data) -> DP2 -> (buf3, 16ms data) -> LL2
333+
* period 5ms period: 20ms
334+
* LPT: 2ms LPT: 10ms
335+
*
336+
* - DP1 is ready for processing
337+
* - DP2 is not ready for processing
338+
*
339+
* calculate deadlines:
340+
* - buf3 LFT = 16ms ==> DP2 deadline = 16ms
341+
* - DP2 LST = 6ms
342+
* - buf2 LFT = 6ms - 1*2 (1 period * LPT) = 4ms
343+
* ==> DP1 deadline = 4ms
344+
*
345+
* DP1 gets CPU for 2ms
346+
*
347+
* 8) 22ms time
348+
* LL1 ->(buf1, 2ms data) -> DP1 -> (buf2, 20ms data) -> DP2 -> (buf3, 14ms data) -> LL2
349+
* period 5ms period: 20ms
350+
* LPT: 2ms LPT: 10ms
351+
*
352+
* - DP1 is not ready for processing (buf1 holds 2ms, less than one 5ms period)
353+
* - DP2 is ready for processing
354+
*
355+
* calculate deadlines:
356+
* - buf3 LFT = 14ms ==> DP2 deadline = 14ms
357+
* - DP2 LST = 4ms
358+
* - buf2 LFT = 20ms (1 period) + 4ms = 24ms
359+
* ==> DP1 deadline = 24ms
360+
*
361+
* DP2 will be scheduled
362+
*
363+
*
364+
* STARTUP
365+
* A special case is "pipeline startup". When a pipeline is starting, the calculations make no sense,
366+
* as all the modules are already late and their deadlines are in the past.
367+
* To make startup possible, a DELAYED START mechanism needs to be introduced.
368+
*
369+
* Delayed start means that:
370+
* - when a DP module becomes ready for the first time, its deadline is set to NOW
371+
* - even if the DP module provides data early, the data will be held in the buffer
372+
* until one LPT has passed since the DP became ready for the first time
373+
*
374+
* The reason is that a DP may finish processing quicker than its longest processing time. This may
375+
* be caused by many events, usually by lower CPU load during pipeline startup. If the next module
376+
* starts processing immediately in such a situation, it may go into underrun when DP processing takes
377+
* longer in the future. Delaying until the declared/estimated Longest Processing Time (LPT) prevents this,
378+
* as long as the processing cycles declaration is accurate.
379+
*
380+
* Delayed start makes EDF scheduling possible and ensures that even when CPU load is close to 100%
381+
* every module has enough processing time to finish within its deadline.
382+
*
383+
*
384+
* DP SCHEDULER
385+
* The list of all DP tasks, regardless of the core each task is on, is to be iterated every time
386+
* DP readiness or deadline timing may change, which includes:
387+
* - finish of processing of LL pipeline (on any core)
388+
* - finish of processing of any DP module (on any core)
389+
* TODO
390+
*
391+
* during the iteration, the following will be checked:
392+
* - Readiness of each DP module
393+
* as mentioned before, a module "is ready" when it declares readiness itself through an API call
394+
* or when it has at least IBS of data on each input and at least OBS of free space on each output
395+
*
396+
* - deadline calculation of each DP module
397+
* LFTs and Deadlines are not constant; they may change when a module consumes/produces
398+
* a portion of data. Therefore all LFTs and Deadlines must be re-calculated
399+
*
400+
* EXAMPLE
401+
*
402+
* DP1 (20ms period) --(BUF1 10ms data)--> DP2 (10ms period) --(BUF2 3ms data)-->LL (1ms period)
403+
* 11ms LPT 1.5ms LPT
404+
*
405+
* current time: 0.1ms before LL cycle
406+
*
407+
* LFT of BUF2: buffer contains 3ms of data, data consumption period is 1ms, so
408+
* LFT(BUF2) = NOW + 3.1ms
409+
*
410+
* deadline(DP2) = LFT(BUF2) = NOW + 3.1ms, LST(DP2) = deadline(DP2) - LPT(DP2) = NOW + 1.6ms
68411
*
69412
* This function checks if the queued DP tasks are ready to processing (meaning
70413
* the module run by the task has enough data at all sources and enough free space
@@ -73,10 +416,7 @@ static enum task_state scheduler_dp_ll_tick_dummy(void *data)
73416
* if the task becomes ready, a deadline is set allowing Zephyr to schedule threads
74417
* in right order
75418
*
76-
* TODO: currently there's a limitation - DP module must be surrounded by LL modules.
77-
* it simplifies algorithm - there's no need to browse through DP chains calculating
78-
* deadlines for each module in function of all modules execution status.
79-
* Now is simple - modules deadline is its start + tick time.
419+
* EDF scheduling example
80420
*
81421
* example:
82422
* Lets assume we do have a pipeline:
