Skip to content

Latest commit

 

History

History
246 lines (176 loc) · 5.7 KB

File metadata and controls

246 lines (176 loc) · 5.7 KB

WebSocket Cleanup Bug Fix

Date: October 5, 2025
Issue: Server panic when deleting subscriptions with active WebSocket connections
Severity: High (causes server crash)
Status: ✅ FIXED


Problem Description

Symptom

panic: close of closed channel

goroutine 44 [running]:
github.com/frstrtr/mongotron/internal/subscription.(*EventRouter).UnregisterClient(...)
    /home/user0/Github/mongotron/internal/subscription/router.go:255 +0x26e
github.com/frstrtr/mongotron/internal/api/websocket.(*Hub).unregisterClient(...)

Root Cause

The WebSocket channel (send / SendChan) was being closed in two different places:

  1. Hub.unregisterClient() in internal/api/websocket/hub.go:132

    close(client.send)
  2. EventRouter.UnregisterClient() in internal/subscription/router.go:255

    close(client.SendChan)

Both references pointed to the same channel, causing a "close of closed channel" panic when:

  • A WebSocket client disconnects
  • A subscription is deleted with active WebSocket connections
  • Server shutdown with active connections

Solution

Changes Made

1. Added Channel State Tracking (router.go)

File: internal/subscription/router.go

Added a closed flag to WebSocketClient struct to track channel state:

type WebSocketClient struct {
	ID       string
	SendChan chan []byte
	mu       sync.RWMutex
	closed   bool  // Track if channel has been closed
}

2. Safe Channel Closure in EventRouter (router.go)

Updated UnregisterClient() to only close the channel if not already closed:

func (r *EventRouter) UnregisterClient(subscriptionID string, clientID string) {
	r.mu.Lock()
	defer r.mu.Unlock()

	clients := r.wsClients[subscriptionID]
	for i, client := range clients {
		if client.ID == clientID {
			r.wsClients[subscriptionID] = append(clients[:i], clients[i+1:]...)
			
			// Safely close the channel only if not already closed
			client.mu.Lock()
			if !client.closed {
				close(client.SendChan)
				client.closed = true
			}
			client.mu.Unlock()

			// ... rest of cleanup
			break
		}
	}
}

3. Removed Duplicate Close in Hub (hub.go)

File: internal/api/websocket/hub.go

Removed channel close from unregisterClient() since EventRouter handles it:

func (h *Hub) unregisterClient(client *Client) {
	h.mu.Lock()
	defer h.mu.Unlock()

	for subscriptionID, clients := range h.clients {
		if _, ok := clients[client]; ok {
			delete(clients, client)
			
			// Don't close the channel here - let the EventRouter handle it
			// This prevents "close of closed channel" panic

			// Unregister from event router (which will safely close the channel)
			h.eventRouter.UnregisterClient(subscriptionID, client.id)

			// ... rest of cleanup
			break
		}
	}
}

Testing

Before Fix

Total Tests:  15
✅ Passed:    14
❌ Failed:    1

Failed test: Delete Subscription (caused server panic)

After Fix

Total Tests:  15
✅ Passed:    15
❌ Failed:    0

🎉 All tests passed!

Test Coverage

The fix was verified with:

  1. ✅ WebSocket connection and disconnection
  2. ✅ Subscription deletion with active WebSocket
  3. ✅ Multiple concurrent WebSocket clients
  4. ✅ Server remains stable after operations
  5. ✅ No panic in logs

Server Stability

# Before fix: Server crashed during delete test
[1]+  Exit 2  ./bin/api-server

# After fix: Server continues running
user0  77674  26.3  0.0  3243144  28276  Sl  21:41  0:28  ./bin/api-server
✅ Server still running after delete test!

Technical Details

Ownership Model

The fix establishes clear ownership:

  • EventRouter owns the WebSocketClient and its channel
  • Hub manages the high-level client registration
  • Channel closure is only performed by EventRouter

Thread Safety

  • Uses sync.RWMutex to protect the closed flag
  • Lock is held during channel state check and close operation
  • Prevents race conditions between multiple unregister calls

Cleanup Flow

  1. Client disconnects or subscription deleted
  2. Hub calls EventRouter.UnregisterClient()
  3. EventRouter checks if channel is already closed
  4. If not closed, closes channel and sets flag
  5. Client removed from tracking structures

Impact

Before

  • ❌ Server crashes on subscription deletion
  • ❌ Lost all active connections
  • ❌ Required manual restart
  • ❌ Poor user experience

After

  • ✅ Graceful subscription deletion
  • ✅ Server remains stable
  • ✅ Other connections unaffected
  • ✅ Production-ready reliability

Files Modified

  1. internal/subscription/router.go

    • Added closed field to WebSocketClient
    • Added safe channel closure logic in UnregisterClient()
  2. internal/api/websocket/hub.go

    • Removed duplicate channel close
    • Added explanatory comments

Recommendations

Future Enhancements

  1. Add Unit Tests: Create tests specifically for concurrent unregister scenarios
  2. Metrics: Add metrics for channel close operations
  3. Logging: Add debug-level logging for channel state changes
  4. Documentation: Update WebSocket architecture documentation

Best Practices Applied

  • ✅ Single ownership of resources
  • ✅ Thread-safe state management
  • ✅ Defensive programming (state checks)
  • ✅ Clear comments explaining design decisions

Conclusion

The WebSocket cleanup bug has been successfully fixed. The server now handles:

  • ✅ Normal WebSocket disconnections
  • ✅ Subscription deletions with active connections
  • ✅ Graceful shutdown scenarios
  • ✅ Concurrent client operations

Production Status: Ready for deployment

Test Results: 100% pass rate (15/15 tests)

Stability: No panics or crashes observed