Tuesday, March 4, 2014

YAWS Login Page Tutorial

Introduction

This blog entry is all about creating a YAWS-based login page. This turned out to be a much more challenging task than I first thought! This entry should take you from almost zero to a working login-page stub.
This is going to be a LONG post because there's a lot to learn and understand about web technologies, the Erlang language, and YAWS data structures. If you are looking for a quick guide, just use the YAWS guide; this tutorial is probably better for the beginner.

Prerequisites

This entry assumes you have a working YAWS system in place and are familiar with your system. I'm using CrunchBang Linux, but all of the ideas should work anywhere. You will also need to be able to set environment variables; this just makes our life easier.

Set Up YAWS_INCLUDES

The very first thing that needs to be done is to find out where a file called yaws_api.hrl lives. This file is important because it's where the "Argument" record is defined. The example in Chapter 7 of the YAWS documentation seems to hint at wanting to change the Argument (more on this later). There are other scenarios where you'll want access to yaws_api.hrl too, so let's make it easier to get to.

1) Find yaws_api.hrl on your system. In Windows you can do a fancy search in the explorer for the file. On Linux, be more fancy and amaze your friends:

> find . -name yaws_api.hrl

Run this from a directory high enough in the tree that the file sits somewhere beneath it, since find only searches downward from where you start. Since I built mine from source it's in:

/usr/local/lib/yaws/include/yaws_api.hrl

I searched through the files in that directory and none of them appear to use any other include directives, so we don't have to worry much about include paths.

2) Add an environment variable called YAWS_INCLUDES.

In Windows, a shortcut to the place where you add system environment variables is to hold down the Windows key and press the Break key (get it, break-Windows?). I give credit to Josh for that one. The dialog that pops up should have an "Advanced system settings" link from which you can reach the environment variables.

In Linux, modify your .bashrc file to have this definition:

> vim ~/.bashrc

Add at the bottom:

export YAWS_INCLUDES="/usr/local/lib/yaws/include"

Save the file. Once you're back at the shell, type:

> source ~/.bashrc

That will reload your bashrc without the headaches of logging out and back in.

3) Check to make sure your environment variable was set.

Open up a Terminal in Linux (or a Command Window in Windows). In Windows type:

> echo %YAWS_INCLUDES%

And in Linux:

> env | grep YAWS_INCLUDES
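
Since the Erlang compiler is what eventually reads this variable, it can also be worth checking it from the Erlang side. The following is only a minimal sketch; the module name check_includes is made up for this post and is not part of YAWS. It looks up YAWS_INCLUDES and reports whether yaws_api.hrl is really in that directory.

```erlang
%% check_includes.erl
%% Hypothetical helper for this tutorial; not part of YAWS.
-module(check_includes).
-export([check/0, check/1]).

%% Read YAWS_INCLUDES from the environment and check it.
check() ->
    check(os:getenv("YAWS_INCLUDES")).

%% os:getenv/1 returns the atom false when the variable is not set.
check(false) ->
    {error, not_set};
check(Dir) ->
    Hrl = filename:join(Dir, "yaws_api.hrl"),
    case filelib:is_regular(Hrl) of
        true  -> {ok, Hrl};
        false -> {error, {not_found, Hrl}}
    end.
```

After compiling it, calling check_includes:check() in the Erlang shell should give back {ok, Path} if everything above was done correctly.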

How Blocking Pages Works (Part 1)

The first question that I had about the YAWS webserver (or any webserver) was: How do I make sure that sensitive pages are blocked from users who shouldn't see them? New developers often have no clue how this mechanism works. It all boils down to the "Arg" record.

When the customer of your website types in a URL:

http://www.yourawesomesite.com

or whatever, the browser puts together a request and sends it over a TCP connection to port 80. The YAWS webserver is listening on port 80 and turns that request into an Arg record, which can then be consumed by our Erlang code via the "out" functions (remember, they take an Arg as input). YAWS can also take this request and conditionally rewrite it before it ever gets processed. This means that if a user is seen as "not logged in" we can steer them away from blocked pages to pages that are safe for general viewing (without a login). I'm going to set up an example similar to the one in the YAWS documentation, but the difference is that I'm going to go through it in reverse.

The first step is to set up a list of the pages that a user is allowed to visit without logging in. In the YAWS book the name of the file is "myapp.erl"; we are going to set up "lm_app.erl". Note: I am putting all of my Erlang modules inside of the www directory of my YAWS website. This is probably not the best place for these pieces of code!

%%=======================
%% lm_app.erl
%%
%% @version 0
%%=======================
-module(lm_app).
-include("$YAWS_INCLUDES/yaws_api.hrl").

login_pages() -> 
    [ "/index.yaws", "/lewismanor.jpg" ].

Please note that I'll keep incrementing the @version tag at the top during each iteration of the code in this document. The first line of interest is the -include directive. This is needed because eventually we want to modify the incoming request. The incoming HTTP request is stuffed into a data structure defined by the YAWS source code. Conveniently, in an earlier step we found out where it lives and created the environment variable YAWS_INCLUDES; the Erlang preprocessor substitutes the value of an environment variable for a leading $VAR in an include path.

The login_pages() function simply lists which pages on the system don't require login credentials! In this case, I've made the main page and the title image both accessible without credentials. The next step is to create a main entry point for Argument "rewriting". This function is called "arg_rewrite". You supply your own custom arg_rewrite function by telling YAWS which module to look in. For now let's create a stub for this function (just to prove what it does):

%%=======================
%% lm_app.erl
%%
%% @version 1
%%=======================
-module(lm_app).
-include("$YAWS_INCLUDES/yaws_api.hrl").
-export([arg_rewrite/1]).

login_pages() -> 
    [ "/index.yaws", "/lewismanor.jpg" ].

arg_rewrite(Arg) ->
    io:fwrite("ARGUMENT REWRITING HIT!~n"),
    Arg.

This function alone isn't going to be enough, since we haven't told YAWS which module should be used to do argument rewriting. Also please note that before continuing, lm_app.erl must be compiled to lm_app.beam!

Open up the yaws.conf file and add a line to the server:

<server yourawesomesite.com>
    port = 80
    arg_rewrite_mod = lm_app
    listen = 0.0.0.0
    docroot = ...
    auth_log = ...
    appmods = ...
</server>

The arg_rewrite_mod line is the one you want to add. It tells YAWS that we want the module lm_app to supply arg_rewrite. As an experiment, once you are done, restart your YAWS server (make sure that lm_app.beam is also available!).

> sudo yaws -i

After YAWS starts, navigate to your website in a browser. You should see:

ARGUMENT REWRITING HIT!

That line shows up in the terminal window where YAWS is running. What is happening is that when a customer accesses your website page (index.yaws), YAWS creates an "Argument" that models the HTTP request, and that Argument gets fed to lm_app:arg_rewrite for modification. Whatever Argument is returned from lm_app:arg_rewrite is what is used going forward!

What's in Argument?

I have to interrupt the previous topic about blocking pages because it's a good time to cover what's in Argument. Every member of Argument is documented in Chapter 4 of the YAWS documentation (right now it's on page 14). Here are the contents (as of today):


  • Connection Information (clisock, client_ip_port): Developers have access to the socket that the client is using for the connection, and to the client's IP address and port.
  • Header Information (headers)
  • HTTP Request Information (req): The request can further be broken down into three more pieces of information: method, path, version.
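
To make the list above a bit more concrete, here is a small sketch of pulling those fields out of an Argument. Please treat it as an illustration only: the module name arg_peek is invented, and the two records are trimmed-down stand-ins that I declare locally so the example compiles on its own. The real definitions in yaws_api.hrl have more fields, so in real code you would delete the two -record lines and include yaws_api.hrl instead.

```erlang
%% arg_peek.erl
%% Illustration only. These records are cut-down stand-ins for the
%% ones in yaws_api.hrl; the real records carry more fields.
-module(arg_peek).
-export([summary/1, demo/0]).

-record(http_request, {method, path, version}).
-record(arg, {clisock, client_ip_port, headers, req}).

%% Collect the connection and request details into a simple list.
summary(#arg{client_ip_port = {Ip, Port}, req = Req}) ->
    #http_request{method = Method,
                  path = {abs_path, Path},
                  version = Version} = Req,
    [{ip, Ip}, {port, Port}, {method, Method},
     {path, Path}, {version, Version}].

%% Build a fake Argument by hand, shaped like a GET for /index.yaws.
demo() ->
    Req = #http_request{method = 'GET',
                        path = {abs_path, "/index.yaws"},
                        version = {1, 1}},
    summary(#arg{client_ip_port = {{127, 0, 0, 1}, 54321}, req = Req}).
```

The Arg#arg.req and Req#http_request.path accesses used later in this post are the same idea written with field-access syntax instead of pattern matching.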

Login Information

During a session some information needs to be available: username, password, and possibly other data. To define this information we can use an hrl file. I'm using a record similar to the one in the documentation:

%%=======================
%% lm_session.hrl
%%
%% @version 0
%%=======================

-record(session, {user, passwd, udata = []}).

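Make sure that lm_session.hrl can be found by every piece of code that includes it. For .yaws pages this means the include path used by YAWS has to cover the directory the file is in (I believe the relevant yaws.conf setting is include_dir, but check the documentation for your version).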
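
If records are new to you, the following sketch shows how a session record like this one gets created, read, and updated. The module name session_sketch and its three functions are my own inventions for illustration, and the record is repeated inside the module (instead of included) only so that the example compiles by itself.

```erlang
%% session_sketch.erl
%% Illustration only; in real code the record comes from lm_session.hrl.
-module(session_sketch).
-export([new/2, user/1, put_udata/3]).

-record(session, {user, passwd, udata = []}).

%% Create a session; udata falls back to its default of [].
new(User, Passwd) ->
    #session{user = User, passwd = Passwd}.

%% Read one field by pattern matching on the record.
user(#session{user = User}) ->
    User.

%% Store (or overwrite) a Key/Value pair in the udata list.
put_udata(Key, Value, S = #session{udata = UData}) ->
    S#session{udata = lists:keystore(Key, 1, UData, {Key, Value})}.
```

Under the hood a record is just a tagged tuple, which is why it can be handed to YAWS as an opaque term later on.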


The Login Page

At this point I want to skip to the login page, the actual page that takes the username and password. Inside of index.yaws:

<html>
    <head>
        <title> My Awesome Page </title>
    </head>
    <body>
        <form action="/login_post.yaws" method="post">
            <p> UserName <input name="uname" type="text"> </p>
            <p> Password <input name="passwd" type="password"> </p>
            <input type="submit">
        </form>
    </body>
</html>


Notice I'm not using ehtml; when I tried it, YAWS simply reported an internal error caused by the formatting of the ehtml attributes. It was a pretty cryptic error, so to keep things simple we'll use basic HTML.

The important part of the form is the "login_post.yaws" target. This means we need to add a file called login_post.yaws that will handle the login information. Another thing to notice is that login_post.yaws has to be added to the allowable login pages:

%%=======================
%% lm_app.erl
%%
%% @version 2
%%=======================
-module(lm_app).
-include("$YAWS_INCLUDES/yaws_api.hrl").
-export([arg_rewrite/1]).

login_pages() -> 
    [ "/index.yaws", "/lewismanor.jpg", "/login_post.yaws" ].

arg_rewrite(Arg) ->
    io:fwrite("ARGUMENT REWRITING HIT!~n"),
    Arg.

Now inside of login_post.yaws:


<!--
%%=======================
%% login_post.yaws
%%
%% @version 0
%%=======================
-->
<erl>
-include("lm_session.hrl").
kv(K,L) -> 
    { value, {K, V}} = lists:keysearch(K,1,L),
    V.
out(A) ->
    L = yaws_api:parse_post(A),
    User = kv("uname", L),
    Passwd = kv("passwd", L),
    { html, f("User Name: ~s<br>Password: ~s", [User, Passwd])}.
</erl>

There are some key differences between what is in the YAWS documentation and what is here. First of all, this is just a jumping-off point: this code doesn't do anything other than print your username and password on the webpage. That's not really useful except for debugging and to foreshadow the use of cookies and authentication (next section).
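
One thing to be aware of with kv as written above: if the key is missing from the list, the pattern match fails and the page crashes with a badmatch error. Here is a sketch of a more forgiving lookup, using lists:keyfind and a default value. The module name kv_sketch and the three-argument kv/3 are my own additions, not something from the YAWS documentation.

```erlang
%% kv_sketch.erl
%% Illustration only: a lookup that does not crash on a missing key.
-module(kv_sketch).
-export([kv/2, kv/3]).

%% Return the value stored under K, or undefined if K is absent.
kv(K, L) ->
    kv(K, L, undefined).

%% Same, but with a caller-supplied default.
kv(K, L, Default) ->
    case lists:keyfind(K, 1, L) of
        {K, V} -> V;
        false  -> Default
    end.
```

With this version, looking up a field that was not posted (or looking up the atom uname when the list holds the string "uname") gives back undefined instead of taking the whole page down.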

Cookies

We have the login page (index.yaws) and the page that it goes to after logging in (login_post.yaws). The next thing we need is to set a cookie when the login is processed; the cookie should be based on the username and password, and should only be set when they are valid. Now we set up the real code:

%%=======================
%% lm_app.erl
%%
%% @version 3
%%=======================
-module(lm_app).
-include("$YAWS_INCLUDES/yaws_api.hrl").
-export([arg_rewrite/1, authenticate/2]).

login_pages() -> 
    [ "/index.yaws", "/lewismanor.jpg", "/login_post.yaws" ].

arg_rewrite(Arg) ->
    io:fwrite("ARGUMENT REWRITING HIT!~n"),
    Arg.

authenticate(User, Password) ->
    if 
        User =:= "me" andalso Password =:= "password" -> ok;
        true -> false
    end.

We have now added the authenticate function. Obviously you want to replace this stub with your own authentication logic; for now it is filled with dummy data. The username is "me" and the password is "password". Make sure to compile lm_app.erl into lm_app.beam and load the module back into YAWS (see prior blog posts about how to do this).
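
As a side note, the same stub can also be written with function clauses instead of an if expression, which is the style you will see more often in Erlang code. This is just an alternative sketch with the same dummy credentials; the module name auth_sketch is made up so it doesn't clash with lm_app.

```erlang
%% auth_sketch.erl
%% Illustration only: the same dummy check as lm_app:authenticate/2,
%% expressed with pattern matching in the function head.
-module(auth_sketch).
-export([authenticate/2]).

%% The one hard-coded account.
authenticate("me", "password") -> ok;
%% Everything else is rejected.
authenticate(_User, _Password) -> false.
```

It returns ok or false exactly like the version above, so either one can be called from login_post.yaws.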

Somewhere you have to call this authenticate function. Here is the code from my new and improved login_post.yaws file:

<!--
%%=======================
%% login_post.yaws
%%
%% @version 1
%%=======================
-->

<erl>
    -include("lm_session.hrl").
    kv(K,L) -> { value, {K,V}} = lists:keysearch(K,1,L), V.

    out(A) ->
        L = yaws_api:parse_post(A),
        User = kv("uname", L),
        Passwd = kv("passwd", L),
        
        case lm_app:authenticate(User, Passwd) of
            ok ->
                 { html, f("Login Succeeds!", [])};
            false ->
                 { html, f("Login Fails!", [])}
        end.
</erl>

This one gave me trouble because the YAWS documentation is incorrect about how to use kv; note that the code above looks the fields up by string keys ("uname" and "passwd").

Now let's add cookie creation to the source code:

<!--
%%=======================
%% login_post.yaws
%%
%% @version 2
%%=======================
-->

<erl>
    -include("lm_session.hrl").
    kv(K,L) -> { value, {K,V}} = lists:keysearch(K,1,L), V.

    out(A) ->
        L = yaws_api:parse_post(A),
        User = kv("uname", L),
        Passwd = kv("passwd", L),
        
        case lm_app:authenticate(User, Passwd) of
            ok ->
                 S = #session{ user = User,
                               passwd = Passwd,
                               udata = [] },
                 Cookie = yaws_api:new_cookie_session(S),
                 [ {redirect_local, "/inside.yaws"}
                 , yaws_api:setcookie("lm_sid", Cookie) ];
            false ->
                 { html, f("Login Fails!", [])}
        end.
</erl>

The new code in the ok branch creates a new cookie session, registers it with our server, and redirects the browser to inside.yaws. It doesn't do much past this. My inside.yaws looks like this:

<!--
%%=======================
%% inside.yaws
%%
%% @version 0
%%=======================
-->

<html>
    <body> Made it in! </body>
</html>

This portion of the example just demonstrates how redirect works. The return value is similar in form to {html, ...} or {ehtml, ...}, except in this case it just redirects the web browser. This process isn't finished yet. Before we continue, I have to mention that we require a couple more utility functions. These functions are actually listed in Chapter 7 of the YAWS documentation.

%%=======================
%% lm_app.erl
%%
%% @version 4
%%=======================
-module(lm_app).
-include("$YAWS_INCLUDES/yaws_api.hrl").
-export([arg_rewrite/1, authenticate/2, 
         check_cookie/2, get_cookie_val/2]).

login_pages() -> 
    [ "/index.yaws", "/lewismanor.jpg", "/login_post.yaws" ].

get_cookie_val(CookieName, Arg) ->
    H = Arg#arg.headers,
    yaws_api:find_cookie_val(CookieName, H#headers.cookie).

check_cookie(A, CookieName) ->
    case get_cookie_val(CookieName, A) of
        [] -> {error, "Not Logged In" };
        Cookie -> yaws_api:cookieval_to_opaque(Cookie)
    end.

arg_rewrite(Arg) ->
    io:fwrite("ARGUMENT REWRITING HIT!~n"),
    Arg.

authenticate(User, Password) ->
    if 
        User =:= "me" andalso Password =:= "password" -> ok;
        true -> false
    end.

The functions that are added are for convenience and are used to extract the opaque structure (listed in lm_session.hrl) from the incoming cookie data. These were taken directly from the YAWS documentation. 

How Blocking Pages Works (Part 2)

The bulk of the work is now finished, with the exception that customers can still navigate to hidden pages. To fix this, we need to revisit argument rewriting from above. The logic will be: if the magic cookie has been set, we have logged in and should have access to private files. If no cookie has been set, the website should send the visitor back to the main page (index.yaws).

Let's begin by rewriting arg_rewrite so that it will do this redirect magic for us:

%%=======================
%% lm_app.erl
%%
%% @version 5
%%=======================
-module(lm_app).
-include("$YAWS_INCLUDES/yaws_api.hrl").
-export([arg_rewrite/1, authenticate/2, 
         check_cookie/2, get_cookie_val/2]).

login_pages() -> 
    [ "/index.yaws", "/lewismanor.jpg", "/login_post.yaws" ].

get_cookie_val(CookieName, Arg) ->
    H = Arg#arg.headers,
    yaws_api:find_cookie_val(CookieName, H#headers.cookie).

check_cookie(A, CookieName) ->
    case get_cookie_val(CookieName, A) of
        [] -> {error, "Not Logged In" };
        Cookie -> yaws_api:cookieval_to_opaque(Cookie)
    end.

do_rewrite(Arg) ->
    Req = Arg#arg.req,
    { abs_path, Path } = Req#http_request.path,
    case lists:member(Path, login_pages()) of
        true -> Arg;
        false -> 
            Arg#arg{
                req = Req#http_request{
                    path = { abs_path, "/index.yaws" }
                },
                state = { abs_path, Path }
            }
    end.

arg_rewrite(Arg) ->
    OurCookieName = "lm_sid",
    case check_cookie(Arg, OurCookieName) of 
        {error, _} -> do_rewrite(Arg);
        {ok, _Session} -> Arg
    end.

authenticate(User, Password) ->
    if 
        User =:= "me" andalso Password =:= "password" -> ok;
        true -> false
    end.

That's all. To explain: every request coming in to the webserver goes through arg_rewrite. Our cookie name is "lm_sid", so that's what we look for. If the cookie doesn't exist or doesn't map to a session (meaning we don't have credentials), we rewrite the request. If the cookie does exist, we pass along the request as usual (by returning the Argument unchanged).

Rewriting the request is done in do_rewrite. The original request is obtained from the incoming Argument (Arg). From that request we can determine whether it is asking for a safe page (any page listed in login_pages()). If it is, we pass along the request as usual (returning Arg). If it isn't, we create a new Argument from the existing Arg (Arg#arg{ ... }) in which the path points at index.yaws and the originally requested path is kept in the state field.
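
The decision itself is easier to see with the YAWS records stripped away. Below is a sketch of just that logic as a pure function that can be tried in the Erlang shell. The module name rewrite_sketch and the functions target/2 and strip_query/1 are my own inventions. The strip_query step is also an addition of mine: as far as I can tell the request path can carry a query string (for example "/index.yaws?x=1"), which would not match the plain entries in login_pages(), so the sketch cuts the path off at the first "?" before comparing.

```erlang
%% rewrite_sketch.erl
%% Illustration only: the allow/rewrite decision from do_rewrite,
%% without any YAWS records.
-module(rewrite_sketch).
-export([target/2, strip_query/1]).

login_pages() ->
    ["/index.yaws", "/lewismanor.jpg", "/login_post.yaws"].

%% Drop everything from the first "?" onward (assumption: the
%% incoming path may include a query string).
strip_query(Path) ->
    lists:takewhile(fun(C) -> C =/= $? end, Path).

%% target(LoggedIn, Path) -> the path that should actually be served.
target(true, Path) ->
    Path;
target(false, Path) ->
    case lists:member(strip_query(Path), login_pages()) of
        true  -> Path;
        false -> "/index.yaws"
    end.
```

A logged-in visitor gets whatever was asked for, while an anonymous visitor asking for "/inside.yaws" is served "/index.yaws" instead. Keep in mind that this is a rewrite on the server side, so the address shown in the browser does not change.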

Conclusion

By the time all of this is set up, the website should be accessible. When you try to access a private page (e.g. inside.yaws) without logging in, you are sent back to the main login page. If you log in, a cookie is set and you are granted access to those private pages. Subsequent visits to the site will let you side-step logging in because the cookie was set.

Hopefully this blog entry helps another developer out. The best advice I can give anyone embarking on this same course is to doubt the documentation. A couple of times I got stuck because I typed what was in the examples verbatim; some of the examples are flawed, and they are written for someone at a more intermediate level.
