F#: Why embed async in async?


I read the following code in the book Expert F#:

  1. Why does the function collectLinks embed let! html = async { .... } in the outer async block? Could it simply be flattened by removing the inner async?

  2. The same question applies to the function waitForUrl in urlCollector, which has do! Async.StartChild (async {....}) |> Async.Ignore in an outer async block. Could that be flattened too?

  3. How does this implementation compare with one built on a blocking queue (https://msdn.microsoft.com/en-us/library/vstudio/hh297096(v=vs.100).aspx)? For example, creating a blocking queue with a capacity of 5 and enqueuing the links to the producer.


open System.Collections.Generic
open System.Net
open System.IO
open System.Threading
open System.Text.RegularExpressions

let limit = 50
let linkPat = "href=\s*\"[^\"h]*(http://[^&\"]*)\""
let getLinks (txt:string) =
    [ for m in Regex.Matches(txt,linkPat)  -> m.Groups.Item(1).Value ]

// A type that helps limit the number of active web requests
type RequestGate(n:int) =
    let semaphore = new Semaphore(initialCount=n, maximumCount=n)
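    member x.AsyncAcquire(?timeout) =
        async { // Wait for the semaphore, with an optional timeout in milliseconds
                let! ok = Async.AwaitWaitHandle(semaphore,
                                                ?millisecondsTimeout=timeout)
                if ok then
                   // Hand back a token that releases the semaphore on Dispose
                   return
                     { new System.IDisposable with
                         member x.Dispose() =
                             semaphore.Release() |> ignore }
                else
                   return! failwith "couldn't acquire a semaphore" }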

// Gate the number of active web requests
let webRequestGate = RequestGate(5)

// Fetch the URL, and post the results to the urlCollector.
let collectLinks (url:string) =
    async { // An Async web request with a global gate
            let! html =
                async { // Acquire an entry in the webRequestGate. Release
                        // it when 'holder' goes out of scope
                        use! holder = webRequestGate.AsyncAcquire()

                        let req = WebRequest.Create(url,Timeout=5)

                        // Wait for the WebResponse
                        use! response = req.AsyncGetResponse()

                        // Get the response stream
                        use reader = new StreamReader(response.GetResponseStream())

                        // Read the response stream (note: a synchronous read)
                        return reader.ReadToEnd()  }

            // Compute the links, synchronously
            let links = getLinks html

            // Report, synchronously
            do printfn "finished reading %s, got %d links" url (List.length links)

            // We're done
            return links }

/// 'urlCollector' is a single agent that receives URLs as messages. It creates new
/// asynchronous tasks that post messages back to this object.
let urlCollector =
    MailboxProcessor.Start(fun self ->

        // This is the main state of the urlCollector
        let rec waitForUrl (visited : Set<string>) =

           async { // Check the limit
                   if visited.Count < limit then

                       // Wait for a URL...
                       let! url = self.Receive()
                       if not (visited.Contains(url)) then
                           // Start off a new task for the new url. Each collects
                           // links and posts them back to the urlCollector.
                           do! Async.StartChild
                                   (async { let! links = collectLinks url
                                            for link in links do
                                               self.Post link }) |> Async.Ignore

                       // Recurse into the waiting state
                       return! waitForUrl(visited.Add(url)) }

        // This is the initial state.
        waitForUrl(Set.empty))


I can think of one reason why async code would call another async block, which is that it lets you dispose of resources earlier - when the nested block completes. To demonstrate this, here is a little helper that prints a message when Dispose is called:

let printOnDispose text = 
  { new System.IDisposable with
      member x.Dispose() = printfn "%s" text }

The following uses nested async to do something in a nested block and then cleanup the local resources used in the nested block. Then it sleeps some more and cleans up resources used in the outer block:

async { 
  use bye = printOnDispose "bye from outer block"
  let! r = async {
    use bye = printOnDispose "bye from nested block"
    do! Async.Sleep(1000)
    return 1 }
  do! Async.Sleep(1000) }
|> Async.Start

Here, the "nested block" resources are disposed of after 1 second and the outer block resources are disposed of after 2 seconds.
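To make the difference with a flattened block concrete, here is a minimal sketch that records the order of events in a list instead of printing. The names logOnDispose, nested, flat and run are made up for this illustration, and the delays are shortened to 100 ms. In the question's collectLinks, the same mechanism means that holder (the webRequestGate entry) is released as soon as the inner block returns, before getLinks and printfn run; if the block were flattened, the gate would be held until the end of the whole workflow.

```fsharp
open System
open System.Collections.Generic

// Hypothetical helper: appends a message to 'log' when disposed
let logOnDispose (log: List<string>) (text: string) =
    { new IDisposable with
        member x.Dispose() = log.Add text }

// Nested: the inner resource is released when the inner block completes
let nested (log: List<string>) =
    async {
        use _outer = logOnDispose log "outer disposed"
        let! r =
            async {
                use _inner = logOnDispose log "inner disposed"
                do! Async.Sleep 100
                return 1 }
        log.Add "after inner work"
        do! Async.Sleep 100
        return r }

// Flattened: both resources stay alive until the single block ends
let flat (log: List<string>) =
    async {
        use _outer = logOnDispose log "outer disposed"
        use _inner = logOnDispose log "inner disposed"
        do! Async.Sleep 100
        let r = 1
        log.Add "after inner work"
        do! Async.Sleep 100
        return r }

// Runs a workflow to completion and returns the recorded events in order
let run (work: List<string> -> Async<int>) =
    let log = List<string>()
    work log |> Async.RunSynchronously |> ignore
    List.ofSeq log
```

Calling run nested should give "inner disposed" before "after inner work", while run flat gives "after inner work" first and only then disposes both resources, the inner one before the outer one.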

There are other cases where nesting async is useful (like returning from an asynchronous block containing try .. with), but I don't think that applies here.
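As a sketch of what the try .. with case can look like (this is my reading of it, and lengthOrZero is a hypothetical name): a let! cannot appear inside an ordinary try .. with expression on the right-hand side of a let, so to get a value out of a handler and keep going in the outer workflow, the try .. with is wrapped in its own async block.

```fsharp
// Hypothetical example: run 'work', falling back to "" if it throws,
// then continue in the outer block with the recovered value.
let lengthOrZero (work: Async<string>) =
    async {
        let! text =
            async {
                try
                    return! work
                with _ ->
                    // Swallowing every exception is only for illustration
                    return "" }
        // The outer block continues with whichever value was returned
        return text.Length }

// Usage: one workflow that succeeds and one that fails
let okLength = lengthOrZero (async { return "abc" }) |> Async.RunSynchronously
let failedLength = lengthOrZero (async { return failwith "boom" }) |> Async.RunSynchronously
```

Here the nested block exists only to scope the exception handler; nothing is disposed early.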


