Recently, we had the opportunity to do some cleanup work on a legacy web app. At the top of our list was refactoring how PDF reports were rendered to the browser. The typical application use case was as follows:

  1. User enters data.  
  2. User submits data. 
  3. User requests PDF report.  
  4. Application queues report.  
  5. Application processes report.  
  6. Application stores the report on the fileshare.
  7. Application notifies user that report is ready.  
  8. User requests report. 

There’s nothing too tricky here, yet we still managed to botch it: the process had no fewer than three security issues. First, we saved each PDF using an auto-incremented identifier as the filename. For example, the 123rd file would be named 123.pdf. Second, the PDFs were requested directly. A request for the 123rd report would display https://www.site.com/reports/123.pdf in the address bar. Finally, we didn’t have any authorization in place, so any user could access any report if they were nosy enough and smart enough (and you didn’t have to be too smart) to crack the code.

We remedied the issue by first encrypting the querystring parameters. ID 123, in the example above, was being passed around all over the place, and we put an end to that. We also put a layer of abstraction between the PDF request and the PDF display, consisting of a single ASPX page and a FileHandler class. The ASPX page validated that the requestor had rights to view the report and then used the FileHandler class to check for the PDF’s existence and return the contents at that URL as a byte array. The ASPX page then wrote those bytes out to the user.
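The post does not say how the querystring parameters were encrypted, so the following is only a minimal sketch of one way it could be done, assuming AES with a server-side key. The `ReportToken` class, its method names, and the key handling are all hypothetical and not taken from the original application. Note that an opaque token only hides the ID; the authorization check in the ASPX page is still what actually protects the report.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (not from the original app): turns a report ID into an
// opaque, URL-safe token and back. Assumes a secret 16, 24, or 32 byte key.
public static class ReportToken
{
    public static string Protect(int reportId, byte[] key)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV(); // fresh IV per token, so the same ID yields different tokens
            byte[] iv = aes.IV;
            byte[] plain = Encoding.UTF8.GetBytes(reportId.ToString(CultureInfo.InvariantCulture));
            using (ICryptoTransform encryptor = aes.CreateEncryptor())
            {
                byte[] cipher = encryptor.TransformFinalBlock(plain, 0, plain.Length);
                // Token layout: IV followed by ciphertext, Base64 encoded for the URL
                byte[] token = new byte[iv.Length + cipher.Length];
                Buffer.BlockCopy(iv, 0, token, 0, iv.Length);
                Buffer.BlockCopy(cipher, 0, token, iv.Length, cipher.Length);
                return Convert.ToBase64String(token)
                    .Replace('+', '-').Replace('/', '_').TrimEnd('=');
            }
        }
    }

    public static bool TryUnprotect(string token, byte[] key, out int reportId)
    {
        reportId = 0;
        if (string.IsNullOrEmpty(token)) return false;
        try
        {
            // Undo the URL-safe substitutions and restore Base64 padding
            string b64 = token.Replace('-', '+').Replace('_', '/');
            b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=');
            byte[] raw = Convert.FromBase64String(b64);
            using (Aes aes = Aes.Create())
            {
                int ivLength = aes.BlockSize / 8;
                if (raw.Length <= ivLength) return false;
                byte[] iv = new byte[ivLength];
                Buffer.BlockCopy(raw, 0, iv, 0, ivLength);
                aes.Key = key;
                aes.IV = iv;
                using (ICryptoTransform decryptor = aes.CreateDecryptor())
                {
                    byte[] plain = decryptor.TransformFinalBlock(raw, ivLength, raw.Length - ivLength);
                    return int.TryParse(Encoding.UTF8.GetString(plain), NumberStyles.None,
                        CultureInfo.InvariantCulture, out reportId);
                }
            }
        }
        catch (FormatException) { return false; }        // not valid Base64
        catch (CryptographicException) { return false; } // tampered or wrong key
    }
}
```

With something like this, a link would carry `ReportToken.Protect(123, key)` instead of `123`, and the ASPX page would call `TryUnprotect` before running its rights check. This sketch does not authenticate the ciphertext, so a production version would likely add a MAC or use an authenticated mode.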

Here’s a block of the code from the ASPX page…

    bool fileFound = FileHandler.FileExists(url);

    if (fileFound)
    {
        Response.Clear();
        Response.ContentType = "application/pdf";
        Response.AddHeader("Content-Disposition", "inline;filename=report.pdf");
        Response.BinaryWrite(FileHandler.GetBinaryFileStream(url));
    }

 

As for the FileHandler class, this code worked great, but have a look at the “Beware” comment in the FileExists() function below. This one threw us for a loop when testing. As you can see, the function uses the HttpWebRequest and HttpWebResponse objects to determine whether a URL exists. Here’s the gotcha: if 123.pdf didn’t exist, for example, and IIS gracefully handled this by redirecting to a custom 404 error page, the web request would follow the redirect to the error page, which does exist, and get a successful response. In this case, FileExists() would return true even though 123.pdf doesn’t exist! Basically, custom error handling for PDF files had to be turned off in order for FileExists() to operate correctly.
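As an alternative to turning off custom error handling, the check itself could be made stricter. The following is a minimal sketch, not what the original application did: it assumes the error page is reached through an HTTP redirect and that real reports are served with an `application/pdf` content type. The `PdfProbe` class and its method names are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical helper (not from the original app).
public static class PdfProbe
{
    // Pure check, kept separate so it can be exercised without a network call.
    public static bool IsPdfResponse(HttpStatusCode status, string contentType)
    {
        return status == HttpStatusCode.OK
            && contentType != null
            && contentType.StartsWith("application/pdf", StringComparison.OrdinalIgnoreCase);
    }

    public static bool PdfExists(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // Do not follow a redirect to the custom error page; a 3xx status
        // is then reported as-is instead of the error page's 200.
        request.AllowAutoRedirect = false;

        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                return IsPdfResponse(response.StatusCode, response.ContentType);
            }
        }
        catch (WebException)
        {
            // 4xx/5xx responses and network failures both land here
            return false;
        }
    }
}
```

The content-type test is a second line of defense for setups where the error page is returned with a 200 status and no redirect; an HTML error page would then still be rejected.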

Here’s the majority of the FileHandler class. And, man, were we lucky to get a second chance to get it in place.

    /// <summary>
    /// The class checks for the existence of files
    /// and streams them back to the caller
    /// </summary>
    public class FileHandler
    {
        /// <summary>
        /// Validates the existence of a file/page by URL
        /// </summary>
        /// <param name="url">File or page by URL</param>
        /// <returns>True if exists else false</returns>
        public static bool FileExists(string url)
        {
            bool pageExists = false;

            try
            {
                // Beware - if the site gracefully handles the missing page
                // by redirecting to a generic 404 error page, the page is
                // still considered to be found.

                // Request the page or PDF
                System.Net.HttpWebRequest myHttpWebRequest =
                    (System.Net.HttpWebRequest)System.Net.WebRequest.Create(url);

                // Get the associated response; the using block closes it
                using (System.Net.HttpWebResponse myHttpWebResponse =
                    (System.Net.HttpWebResponse)myHttpWebRequest.GetResponse())
                {
                    pageExists =
                        myHttpWebResponse.StatusCode == System.Net.HttpStatusCode.OK;
                }
            }
            catch (System.Net.WebException)
            {
                // Protocol errors (such as a real 404) and network failures
                // both mean the file could not be confirmed to exist.
                pageExists = false;
            }

            return pageExists;
        }

        /// <summary>
        /// Returns the binary contents of the web page / file
        /// </summary>
        /// <param name="url">File or page by URL</param>
        /// <returns>Contents as a byte array, or null if not found</returns>
        public static byte[] GetBinaryFileStream(string url)
        {
            byte[] data = null;

            if (FileExists(url))
            {
                // Request the page or PDF
                System.Net.HttpWebRequest myHttpWebRequest =
                    (System.Net.HttpWebRequest)System.Net.WebRequest.Create(url);

                // Get the associated response for the above request.
                using (System.Net.HttpWebResponse myHttpWebResponse =
                    (System.Net.HttpWebResponse)myHttpWebRequest.GetResponse())
                using (System.IO.Stream responseStream =
                    myHttpWebResponse.GetResponseStream())
                using (System.IO.MemoryStream buffer = new System.IO.MemoryStream())
                {
                    // Copy in chunks rather than asking for a fixed 100 MB
                    // read, so the result is sized to the actual file.
                    byte[] chunk = new byte[8192];
                    int bytesRead;
                    while ((bytesRead = responseStream.Read(chunk, 0, chunk.Length)) > 0)
                    {
                        buffer.Write(chunk, 0, bytesRead);
                    }

                    data = buffer.ToArray();
                }
            }

            return data;
        }
    }
 

Comments on “Secure PDF Display – A Second Chance”

  1. Jon Galloway says:

    Minor “by the way” point, but Response.TransmitFile is much more performant than Response.BinaryWrite. I realize that you’re also writing about security concerns which may disqualify TransmitFile, but I’m saying it anyways.

  2. bgriswold says:

    Good point. Comments are always welcome. Thank you, Jon. My next post – Optimizing PDF Display – A Third Chance. :)
