feat(video_composition): support of composing video frames from multiple video segments #129
ekuleshov wants to merge 2 commits into
Conversation
hm21 left a comment
Thanks for working on this huge improvement. I left a few comments, but overall it looks good to me and will improve that package a lot.
What I didn’t have time for yet is testing it extensively, but in my very quick tests everything worked perfectly fine. Btw I also created a prerelease pro_video_editor: 2.0.0-beta.1 for it so other users can start testing it as well.
```swift
outputImage = rotatedImage.transformed(by: translation)
center = CGPoint(x: outputImage.extent.midX, y: outputImage.extent.midY)
}
// 7. Apply global effects (if any)
```
Regression: global crop is no longer applied.
The previous implementation contained a block here that consumed cropX/cropY/cropWidth/cropHeight and called outputImage.cropped(to:). It was removed in this PR, but applyCrop() still populates these fields. As a result, any render call that uses cropping only ends up with a modified finalRenderSize (black bars / squashed output) — the image content is never cropped. Please re-introduce a crop step inside startRequest.
Sorry, I missed that. Could you point me to an example or a test that uses that?
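For reference, a minimal sketch of what a re-introduced crop step inside `startRequest` could look like, assuming the `cropX`/`cropY`/`cropWidth`/`cropHeight` fields keep the names populated by `applyCrop()`; the Y flip is an assumption about the coordinate convention, since Core Image uses a bottom-left origin:

```swift
// Sketch only: re-apply the crop that applyCrop() computed.
if let cropWidth = cropWidth, let cropHeight = cropHeight {
    let cropRect = CGRect(
        x: cropX ?? 0,
        // Flip the top-left-based cropY into Core Image's bottom-left space.
        y: outputImage.extent.height - (cropY ?? 0) - cropHeight,
        width: cropWidth,
        height: cropHeight)
    outputImage = outputImage.cropped(to: cropRect)
}
```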
```kotlin
context: Context,
useHdr: Boolean,
private val effect: VideoCompositionTransformation
) : BaseGlShaderProgram(useHdr, /* texturePoolCapacity= */ 1) {
```
texturePoolCapacity = 1 disables pipelining for this effect. Was this intentional, or just copied from a sample? A comment would help explain the trade-off.
Mostly copied from some samples... I was fighting with transparency on overlapping videos, as well as some time-gaps between video overlays, and this is the only approach that I could get to work.
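If the value stays at 1, a comment along these lines would capture the trade-off (the wording is a suggestion based on the explanation above, not verified behavior):

```kotlin
) : BaseGlShaderProgram(
    useHdr,
    // texturePoolCapacity = 1 intentionally disables pipelining: with a larger
    // pool, overlapping segments showed transparency glitches and time gaps
    // between video overlays; a single pooled texture forces in-order draws.
    /* texturePoolCapacity= */ 1
) {
```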
```dart
this.offset,
this.size,
this.zIndex,
this.opacity,
```
Please add asserts for the new fields, e.g. opacity ∈ [0, 1] and size.width/height >= 0. Bad input here surfaces only as cryptic native errors.
Also: toMap() serializes offset/size as nested maps, while toAsyncMap() flattens them into x/y/width/height. That divergence is confusing — consider unifying or commenting why the platform channel uses a different schema.
If I'm not mistaken, I copied that structure from other models with similar attributes (either image overlays or effects). I can change both methods to use the flat schema (or both to nested maps); these values seem simple enough to keep flat and reduce some complexity.
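A sketch of both suggestions combined, assuming the current field names on `VideoSegment`; the flat schema shown here is one option, and nested maps would work equally well as long as `toMap()` and `toAsyncMap()` agree:

```dart
import 'dart:ui' show Offset, Size;

class VideoSegment {
  VideoSegment({this.offset, this.size, this.zIndex, this.opacity = 1.0})
      // Fail fast in Dart instead of surfacing cryptic native errors.
      : assert(opacity >= 0 && opacity <= 1, 'opacity must be in [0, 1]'),
        assert(size == null || (size.width >= 0 && size.height >= 0),
            'size must be non-negative');

  final Offset? offset;
  final Size? size;
  final int? zIndex;
  final double opacity;

  // One flat schema shared by toMap() and toAsyncMap(), so both sides of
  // the platform channel read the same keys.
  Map<String, dynamic> toMap() => {
        'x': offset?.dx,
        'y': offset?.dy,
        'width': size?.width,
        'height': size?.height,
        'zIndex': zIndex,
        'opacity': opacity,
      };
}
```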
```dart
'scaleX': scaleX,
'scaleY': scaleY,
'renderWidth': qualityConfig?.resolution?.width,
'renderHeight': qualityConfig?.resolution?.height,
```
Behaviour change worth flagging in the CHANGELOG.
Removing the qualityConfig → scaleX/scaleY derivation is cleaner, but users who previously relied on qualityConfig.resolution (without setting explicit scaleX/scaleY) will now get the resolution forwarded as renderWidth/renderHeight instead of an implicit scale. iOS turns that into intendedRenderSize-based scale correction; Android forwards it to VideoSequenceBuilder. The output may differ slightly from previous releases, so please call this out as a (potentially) breaking change.
Is there an example or a test that relied on that scaleX/scaleY derivation?
I find these too confusing and hard to deal with, even for the image overlays. It seems much simpler to be explicit: calculate and specify the sizes/offsets of individual image overlays or video segments based on the video metadata.
In my tests, the implicit scale didn't work for all cases, e.g. when the target video frame needs to be bigger.
When not specified, the first video segment's resolution is used.
```kotlin
// Clear the target framebuffer to transparent before drawing the segment.
// This ensures that segments that don't cover the full canvas don't show garbage.
GLES20.glClearColor(0f, 0f, 0f, 0f)
GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT)
```
glClear inside drawFrame is suspicious. If Media3 reuses the same FBO across multiple segment effects in the same frame, this clear will wipe out previously drawn segments and you'd only ever see the last one. Please verify by rendering a stack/grid composition and checking that all segments are visible. Clearing should typically be done once per output frame, not per segment effect.
I'll check and poke around glClear. This code was also there to make gaps between videos transparent, e.g. when a higher-zIndex video segment starts later than the segments behind it. Media3's default processing wasn't making that easy to deal with.
The new grid example has several overlapping video segments, then a transparent video on top of them, and then a transparent image overlay on top of them all. That seems to be working as expected (you may need to maximize the video player to see the transparency).
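One way to address the per-segment clear, sketched under the assumption that the segment effects can share a guard (if each segment is a separate shader program instance, the guard would need to live in state shared between them):

```kotlin
// Clear the shared FBO only once per output frame, not once per segment
// effect, so earlier segments aren't wiped before later ones draw.
private var lastClearedFrameUs = Long.MIN_VALUE

override fun drawFrame(inputTexId: Int, presentationTimeUs: Long) {
    if (presentationTimeUs != lastClearedFrameUs) {
        GLES20.glClearColor(0f, 0f, 0f, 0f)
        GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT)
        lastClearedFrameUs = presentationTimeUs
    }
    // ... draw this segment on top of whatever was rendered before ...
}
```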
```kotlin
y = (clipMap["y"] as? Number)?.toDouble(),
width = (clipMap["width"] as? Number)?.toDouble(),
height = (clipMap["height"] as? Number)?.toDouble(),
zIndex = (clipMap["zIndex"] as? Number)?.toInt(),
```
Android silently ignores zIndex.
The PR description mentions this is out of scope, but right now a user can pass zIndex and it has no effect on Android with no indication. Please at least Log.w(...) when clip.zIndex != null and document the limitation on the Dart VideoSegment.zIndex field ("Currently not supported on Android — image layers always render on top").
Right. This is probably a leftover from my failed attempts to make zIndex work for image overlays. I don't think this field is propagated from the Dart code, and it isn't used in the rendering pipeline.
I can remove it or add some TODOs to the Android implementation.
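Until then, a one-line warning at the parse site would make the limitation visible (the tag and message wording are suggestions):

```kotlin
zIndex = (clipMap["zIndex"] as? Number)?.toInt()?.also {
    // Surface the unsupported field instead of silently dropping it.
    Log.w("VideoComposition",
        "zIndex=$it is not supported on Android yet; image layers always render on top")
},
```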
```swift
var zIndex: Int {
    switch self {
    case .video(_, let clip, _): return clip.zIndex ?? 0
    case .imageLayer: return Int.max
```
Image-layer z-order is hard-coded to Int.max.
This forces every image layer to render above all video segments, regardless of segment zIndex. Now that segments expose zIndex, image layers should participate in the same ordering (or at least be configurable). Otherwise users cannot put a video segment on top of an image overlay.
Right. The iOS/macOS platform code actually has working zIndex support for image layers, and this switch basically disables it so that image overlays always render on top, matching behavior across platforms.
To enable it again we just have to propagate the value from Dart and replace .max here with the actual zIndex. Perhaps also add a TODO or a comment here?
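Once the value is propagated from Dart, the switch could look roughly like this (the `imageLayer` associated value and its `zIndex` field are assumptions about the surrounding code):

```swift
var zIndex: Int {
    switch self {
    case .video(_, let clip, _): return clip.zIndex ?? 0
    // Fall back to Int.max to preserve today's "always on top" behavior
    // when no explicit zIndex was provided for the image layer.
    case .imageLayer(let layer): return layer.zIndex ?? Int.max
    }
}
```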
Correct, I mentioned this in PR notes. It is working on Apple platforms, but not on Android, so currently disabled to work the same between platforms.
Integration tests for the new composition features are missing.
example/integration_test/video_merge_test.dart only covers the previous VideoSegment fields (startTime/endTime/volume). Please add coverage for the new functionality introduced here, ideally in a new file like example/integration_test/video_composition_test.dart. Suggested cases:
1. `segmentTime` (absolute placement)
   - Two segments with overlapping `segmentTime` → assert output duration and that both rendered without crash.
   - Single segment with `segmentTime: Duration(seconds: 5)` and no preceding segment → expect a black gap before it.
2. `offset` + `size` (PIP)
   - Background segment full-frame + second segment with `offset: Offset(20, 20)`, `size: Size(w/2, h/2)` → sample frame bytes inside vs. outside the PIP rectangle and assert they differ; output resolution must match the background.
3. `zIndex`
   - Two full-frame segments with different `zIndex` → assert the higher-zIndex segment is visible by sampling pixels.
   - On Android, expect the field to be tolerated (no crash); mark with `skip:` or a platform guard until Android support lands.
4. `opacity`
   - Segment with `opacity: 0.5` over a solid-color background → assert sampled pixels lie within the expected blended range.
5. `zIndex` ordering between video segments and image layers: only relevant once the hard-coded `Int.max` issue is fixed; can be a follow-up.
6. Regression tests for the issues raised in this review (once those fixes land):
   - Single-segment render with `cropWidth`/`cropHeight` → frame bytes match the expected crop region.
   - Single-segment render with `rotateTurns: 1` → resolution swap + top-left pixel sample.
   - Multi-segment + `rotateTurns: 1` combination.
7. Platform guards: skip iOS/macOS-only or Android-only cases where the feature isn't supported yet.
At minimum, (1)–(4) should land in this PR as smoke tests for the new fields; (5) and (6) can be follow-ups.
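As a starting point, case (1) could look roughly like this (`renderComposition` and the asset helper are placeholders for illustration, not existing APIs in the package):

```dart
testWidgets('overlapping segmentTime renders without crash', (tester) async {
  final segments = [
    VideoSegment(video: await loadTestClip('a.mp4')),
    VideoSegment(
      video: await loadTestClip('b.mp4'),
      segmentTime: const Duration(seconds: 2),
    ),
  ];
  final output = await renderComposition(segments);
  expect(output, isNotNull);
  // Output duration should extend to the end of the later, offset segment.
});
```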
README needs to be updated for the new VideoSegment fields.
The VideoSegment API reference (around lines 526–550 of README.md on stable) currently only documents video, startTime, endTime. Please extend it with the new fields introduced here — volume, offset, size, zIndex, opacity, segmentTime — including the platform support matrix (e.g. note that zIndex is currently iOS/macOS-only) and ideally a short PIP / stack example mirroring the new _combinedPip demo.
Also worth a paragraph in the "Render" section about composing multiple overlapping segments, since that's the headline feature of this PR.
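A short PIP snippet for the README could look like this (field names follow this PR's `VideoSegment` changes; the render call is a placeholder):

```dart
final background = VideoSegment(video: mainClip);
final pip = VideoSegment(
  video: cameraClip,
  offset: const Offset(20, 20), // top-left corner of the PIP window
  size: const Size(320, 180),   // PIP rectangle in output pixels
  zIndex: 1,                    // currently iOS/macOS only
  opacity: 0.9,
);
// await renderComposition([background, pip]);
```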
Description
This is a first, somewhat rough cut of the video rendering pipeline update with support for composing video frames from multiple video segments. @hm21 please review.
The `VideoSegment` model is updated with a few new attributes.
Related Issue: discussed here
The Android implementation currently does sub-optimal audio re-encoding. I've hit a number of issues with Media3 encoders not liking leading or trailing blank/silent gaps and choking on audio encoding even when the track has no audio. So the Android code currently re-encodes audio when non-sequential segments are present.
I've had to put `zIndex` support for the image layers out of scope. I couldn't get it to work with the Media3 API on Android after a few tries, and apparently I don't really need it for my own use cases. So the image layers are applied at the end, on top of the composed video frames.
The current implementation should make it easy to add per-segment transformations similar to what is defined globally (flip, color mask, rotate, etc.), but I also left that for the future.
Type of Change