[FLINK-39264][docs] Add docs for application management by eemario · Pull Request #27818 · apache/flink

eemario · 2026-03-24T09:51:31Z

What is the purpose of the change

This pull request adds docs for application management.

Brief change log

Add a new page for application
Update outdated descriptions to reflect current functionality

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
The S3 file system connector: (no)

Documentation

Does this pull request introduce a new feature? (no)
If yes, how is the feature documented? (not applicable)

flinkbot · 2026-03-24T10:00:21Z

CI report:

c985561 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

davidradl · 2026-03-25T16:54:22Z

+  Even after all applications are finished, the cluster (and the JobManager) will
  keep running until the session is manually stopped. The lifetime of a Flink
-  Session Cluster is therefore not bound to the lifetime of any Flink Job.
+  Session Cluster is therefore not bound to the lifetime of any Flink Application.


Suggest changing "of any Flink Application." to "of any Flink Application or job."

I've updated the documentation accordingly.

davidradl · 2026-03-25T16:54:59Z

 * **Cluster Lifecycle**: in a Flink Session Cluster, the client connects to a
-  pre-existing, long-running cluster that can accept multiple job submissions.
-  Even after all jobs are finished, the cluster (and the JobManager) will
+  pre-existing, long-running cluster that can accept multiple application submissions.


A hyperlink to the definition of "application", or a quick summary, would be useful here.
Should this still mention jobs as well as applications, or is every job now part of an application?

The definition of "application" is given at the very beginning of this section (Flink Application Execution) to set the context.
Your understanding is correct: a job is now always submitted by an application and is considered part of it. The point to emphasize here is that a Session Cluster accepts multiple application submissions, which is the main difference from Application Mode.

davidradl · 2026-03-25T16:56:23Z


+#### ApplicationResultStore
+
+The ApplicationResultStore is a Flink component that persists the results of terminated


It would be worth going into more detail about what "results" means here. Is this the last checkpoint / savepoint?

The Application Result primarily contains high-level, final information about the application's execution. This includes its application ID, its final status (e.g., FINISHED, FAILED, CANCELED), and its name. It doesn't refer to the last checkpoint or savepoint, but rather the overall outcome.
Essentially, it's the application-level equivalent of a JobResult. I've added a brief note to clarify this. Thanks!
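
For illustration only, the shape of such a result could be sketched as below. The class and field names are hypothetical and are not Flink's actual API; only the three pieces of information (application ID, final status, name) and the status values come from the reply above.

```python
from dataclasses import dataclass
from enum import Enum


class FinalStatus(Enum):
    """Final statuses named in the discussion above."""
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    CANCELED = "CANCELED"


@dataclass(frozen=True)
class ApplicationResultSketch:
    """Hypothetical model (not Flink's class): the high-level outcome only."""
    application_id: str
    final_status: FinalStatus
    name: str


# Made-up values, purely for demonstration.
result = ApplicationResultSketch("app-0001", FinalStatus.FINISHED, "example-app")
print(result.final_status.value)
```

Note that the sketch carries no checkpoint or savepoint reference, which matches the clarification that the result describes the overall outcome rather than any state snapshot.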

davidradl · 2026-03-25T16:57:56Z

 **JobManager**

-The archiving of completed jobs happens on the JobManager, which uploads the archived job information to a file system directory. You can configure the directory to archive completed jobs in [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) by setting a directory via `jobmanager.archive.fs.dir`.
+The archiving of completed jobs and applications happens on the JobManager, which uploads the archived job and application information to a file system directory. You can configure the directory to archive completed jobs and applications in [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) by setting a directory via `jobmanager.archive.fs.dir`.


Is this different from the Application Result Store? These are archives too, so it would be worth contrasting the two if they are different, or referring to them in the same way if they are the same.

The History Server and the Application Result Store are indeed completely different. Here’s the distinction:

The Application Result Store is an internal mechanism. It stores only minimal information (e.g., application ID and final status) to mark an application as "terminated," preventing it from being incorrectly re-submitted or restarted during a failover.

The History Server is a user-facing archival tool. It saves detailed information from completed applications/jobs by caching their REST API responses (like /applications/:appid, etc.). This allows users to query and inspect application/job details long after the cluster has shut down.

I have added a brief comparison of the two to the glossary section. Thanks for the feedback!

davidradl · 2026-03-25T16:59:13Z

+
+  - `/applications/overview`
+  - `/applications/<applicationid>`
+  - `/applications/<applicationid>/jobmanager/config`


Can you see the jobs that were under an application? That seems like the most useful thing to be able to see.

You can indeed see all the jobs that were part of a completed application. The History Server's REST API is designed to mirror the standard JobManager REST API. This means that when you request an application's overview or details, the response naturally includes information about the jobs within it.

To make this clear, I've added a note to the documentation explaining this behavior and have also included a link to the JobManager REST API page for reference on the response format. Thanks!
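
As a rough sketch of how a client might address the three endpoints quoted in the diff above: the helper name, the base URL, and the application ID below are assumptions made for illustration, and the response format is not shown because it is defined by the REST API documentation rather than by this thread.

```python
from urllib.parse import quote


def application_endpoints(base_url: str, application_id: str) -> dict:
    """Build URLs for the application endpoints listed in the diff.

    `base_url` is wherever the History Server is reachable (assumed here);
    `application_id` is substituted for the `<applicationid>` placeholder.
    """
    base = base_url.rstrip("/")
    app = quote(application_id, safe="")
    return {
        "overview": f"{base}/applications/overview",
        "details": f"{base}/applications/{app}",
        "jobmanager_config": f"{base}/applications/{app}/jobmanager/config",
    }


# Hypothetical host, port, and ID.
endpoints = application_endpoints("http://localhost:8082/", "abc123")
for name, url in endpoints.items():
    print(name, url)
```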

davidradl · 2026-03-25T16:59:46Z


 JobManager High Availability (HA) hardens a Flink cluster against JobManager failures.
-This feature ensures that a Flink cluster will always continue executing your submitted jobs.
+This feature ensures that a Flink cluster will always re-execute your submitted applications that were running at the time of a failure.


What about checkpoints?

My understanding is that you're asking about the checkpoints of jobs within a re-executed application. When an application is re-executed in HA mode, what happens to its running jobs is determined by the application's own logic in the main method:

Resumption: If the logic in the main method re-submits the job, it will automatically resume from its latest checkpoint.

Abandonment: If the application's logic does not re-submit the job, it is considered abandoned. The job will be moved to a FAILED state, and its resources, including all checkpoints, are properly cleaned up.

I've added some explanation to the documentation to make the behavior clear.
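
The two cases can be modelled with a small sketch. This is not Flink code: the function, the job names, and the outcome labels are hypothetical, and identifying jobs by name is an assumption made only to keep the example short.

```python
def classify_jobs(previously_running: set, resubmitted: set) -> dict:
    """Map each job that was running before the failure to its outcome.

    `resubmitted` holds the jobs that the application's main method
    submits again when the application is re-executed.
    """
    outcome = {}
    for job in sorted(previously_running):
        if job in resubmitted:
            # Resumption: the job continues from its latest checkpoint.
            outcome[job] = "RESUMED"
        else:
            # Abandonment: the job is moved to FAILED and its checkpoints are cleaned up.
            outcome[job] = "FAILED"
    return outcome


outcomes = classify_jobs({"ingest", "report"}, {"ingest"})
print(outcomes)
```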

davidradl · 2026-03-25T17:01:34Z

-The HA data will be kept until the respective job either succeeds, is cancelled or fails terminally.
-Once this happens, all the HA data, including the metadata stored in the HA services, will be deleted.  
+In order to recover submitted applications, Flink persists metadata for the applications.
+The HA data will be kept until the respective application either succeeds, is cancelled or fails terminally.


I am curious what "fails terminally" means here. Some examples would be useful.

What I intended to express is the concept of an application reaching any terminal state. To make this clear, I've updated the documentation to explicitly list the three terminal states. This should be more precise. Thanks!
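
A minimal sketch of the retention rule, assuming the three terminal states are the final statuses named earlier in this thread (FINISHED, FAILED, CANCELED). The function name and the non-terminal "RUNNING" value are hypothetical and used only for illustration.

```python
# Assumed terminal states, taken from the final statuses mentioned earlier in the thread.
TERMINAL_STATES = frozenset({"FINISHED", "FAILED", "CANCELED"})


def keep_ha_data(application_status: str) -> bool:
    """Return True while the application's HA data should still be kept."""
    return application_status not in TERMINAL_STATES


for status in ("RUNNING", "FINISHED", "FAILED", "CANCELED"):
    print(status, keep_ha_data(status))
```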

RocMarshal

Hi, @eemario Thank you for the contribution.

LGTM on the whole.

As described in [1], it would be great if anchor links could be added to each heading, in line with the documentation requirements.

[1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Translation+Specifications

To keep the current changes simple, it seems preferable to add anchor links only to the headings introduced in this PR. This helps maintain a minimal and focused scope of changes—adhering to the anchor-linking principle without expanding the modified content beyond what is described in the JIRA title.

WDYTA ?

eemario · 2026-04-20T06:10:09Z

Hi @RocMarshal ,
Thanks for the suggestion! Agreed — I've added anchor links to the headings introduced in this PR, following the translation specifications.

RocMarshal

LGTM +1.
Merging...

RocMarshal · 2026-04-20T08:35:47Z

Hi @eemario, could you help create a backport PR (BP-PR) for release-2.3? Thanks.

eemario · 2026-04-20T11:44:45Z

Hi @RocMarshal ,
Thanks for the review and merge! The BP-PR is ready: #27977.

eemario force-pushed the FLIP560-9 branch from 85971fb to dc9aeca Compare March 25, 2026 03:28

eemario changed the title ~~[FLINK-38972][docs] Add docs for application management~~ [FLINK-39264][docs] Add docs for application management Mar 25, 2026

eemario force-pushed the FLIP560-9 branch from dc9aeca to bf8f15a Compare March 25, 2026 03:29

eemario marked this pull request as ready for review March 25, 2026 03:51

davidradl reviewed Mar 25, 2026

eemario force-pushed the FLIP560-9 branch from bf8f15a to ab6cbe2 Compare April 8, 2026 07:01

RocMarshal reviewed Apr 20, 2026

eemario added 3 commits April 20, 2026 13:44

[FLINK-39264][docs] Add docs for application management

0ad7036

refine docs

8d54cf5

fix anchor points

c985561

eemario force-pushed the FLIP560-9 branch from ab6cbe2 to c985561 Compare April 20, 2026 06:10

RocMarshal self-assigned this Apr 20, 2026

RocMarshal approved these changes Apr 20, 2026

RocMarshal merged commit d49eb62 into apache:master Apr 20, 2026

