Sunday, April 27, 2014

Introduction to Web Security

-----

Internet and World Wide Web

The Internet is a worldwide collection of interconnected computer networks. It brings together many kinds of computer hardware and software owned by telecommunications companies, public organizations, and private individuals. Through this combined network, a wide range of data-exchange services can be built, including the World Wide Web (the Web).
One of the core technologies used on the Internet is TCP/IP. TCP stands for Transmission Control Protocol; it covers how data is transferred from one communication device to another. IP stands for Internet Protocol; it covers how the location of every computer on the Internet (or on any local network that uses TCP/IP) is identified. Through TCP/IP, electronic information can be sent from one computer to another efficiently, even across very long distances.
Watch this video clip to understand how TCP/IP works (http://www.youtube.com/watch?v=Ve7_4ot-Dzs).

World Wide Web (Web)

The World Wide Web, or the Web, is a service for sharing electronic documents and applications over the Internet. Its data exchange is based on HTTP (Hypertext Transfer Protocol). When two computers communicate over HTTP, they go through a cycle called the HTTP Request/Response cycle.

HTTP Request/Response Basics

The HTTP Request/Response cycle is the core of the interaction between two machines communicating over HTTP across the Internet. Further reading is available at http://devhub.fm/http-requestresponse-basics/.
In short:
1. The exchange begins when a user visits a website through its URL, e.g. google.com.
2. The user's computer sends an HTTP Request to fetch the web page from the google.com computer.
3. The google.com computer receives the HTTP Request, processes it, and returns the requested data in an HTTP Response.
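The steps above can be sketched in code. Here is a minimal illustration in Python (google.com is just the example host from the steps above; the request and response texts are built by hand, so nothing here touches a live network):

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Compose the raw HTTP Request the client sends (step 2)."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )

def parse_status_line(response: str) -> int:
    """Read the status code out of the server's HTTP Response (step 3)."""
    status_line = response.split("\r\n", 1)[0]   # e.g. "HTTP/1.1 200 OK"
    return int(status_line.split(" ")[1])

request = build_get_request("google.com")
print(request)

# A server replying to that request starts its HTTP Response like this:
print(parse_status_line("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n..."))  # 200
```

The `Host` header and the status line are the two fixed points of the cycle: every request names the site it wants, and every response begins with a verdict.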

Client-Server Architecture

Client-Server Architecture is a model of computer communication in which the two computers interacting over a network (such as the Internet) take distinct roles: the Client (the computer requesting data, i.e. sending the HTTP Request) and the Server (the computer serving the request, i.e. returning the HTTP Response).
A Client computer needs Web Client software, such as a Web Browser, to send HTTP Requests and to display the HTTP Responses it receives from the Server.
A Server computer needs Web Server software, such as Apache, Microsoft IIS, or nginx, to receive HTTP Requests, process them, and return HTTP Responses to the Client.
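To make the two roles concrete, here is a small sketch in Python, using the standard-library http.server module instead of Apache or IIS purely for illustration: a tiny Server answers HTTP Requests, and a Client sends one to it over localhost.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Server role: receive the HTTP Request, return an HTTP Response.
        body = b"<html><body>Hello from the Server</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client role: send an HTTP Request and read the HTTP Response.
url = f"http://127.0.0.1:{server.server_port}/"
with urllib.request.urlopen(url) as response:
    status = response.status
    html = response.read().decode("utf-8")
print(status, html)

server.shutdown()
```

In production the Server would be Apache or IIS and the Client a browser, but the request/response contract between the two roles is exactly this one.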

History of Web Browsers

Many Web Browsers are in use today. Their history begins in 1990, when Tim Berners-Lee introduced the Web. See the following infographic to trace the history of the Web Browser.

History of Web Servers

The world's first Web Server lives at the URL http://info.cern.ch/. If you visit it, you can read its story, including that of the first computer (a NeXT machine) that hosted the service.
The first Web Server software was known as CERN HTTPD (http://en.wikipedia.org/wiki/CERN_httpd), followed by NCSA HTTPD (http://en.wikipedia.org/wiki/NCSA_HTTPd). Development of NCSA HTTPD was continued by the Apache project (http://en.wikipedia.org/wiki/Apache_HTTP_Server), which keeps growing to this day. In 2009 it set a record by passing the 100-million-website mark.
Among the earliest Web Server software besides Apache was Netscape Enterprise Server, released in 1994 (http://en.wikipedia.org/wiki/Netscape_Enterprise_Server) and known today as Oracle iPlanet Web Server (http://en.wikipedia.org/wiki/Oracle_iPlanet_Web_Server). Software giant Microsoft also has its own Web Server, called Internet Information Services (IIS), first distributed with the Windows NT operating system as early as 1995 (http://en.wikipedia.org/wiki/Internet_Information_Services). (See also Microsoft's own flashback feature on its involvement in the Web, http://www.microsoft.com/misc/features/features_flshbk.htm)
Today there are many Web Server products on the market (see http://en.wikipedia.org/wiki/Comparison_of_web_server_software).
Apache remains popular, and many variant products have been derived from it. The tutorials in this document will also use the Apache distribution provided by http://www.usbwebserver.net/en/.

The Web Application Development Environment

Web Application Development refers to building software that is accessed through the Web. Because the Web's native language, HTML, is limited to presenting information, dedicated programming languages are needed for the information processing that runs on the Server machine. Popular languages today include PHP (PHP: Hypertext Preprocessor), ASP (Active Server Pages), and JSP (JavaServer Pages). Each of these three languages has its own history and community.
PHP once held the record of being used by some 20 million Internet domains, as of April 2007. PHP has also been recorded as contributing around 30% of Web Vulnerabilities. The causes identified include programmer negligence as well as technical flaws in the language itself.


Categories of Web Vulnerabilities

1) Broken Authentication, 62% - Failures in the login mechanism that allow an attacker to discover weak passwords, launch brute-force attacks, or bypass the login altogether.
2) Broken Access Control, 71% - Failure to control access to resources, letting an intruder obtain information or gain Administrator (Admin User) privileges.
3) SQL Injection, 32% - The attacker manipulates input so that the back-end system processes it to the attacker's advantage, whether to extract data or to perform unauthorized data operations.
4) Cross-Site Scripting (XSS), 94% - The attacker targets other users, accessing their data, acting under their identity, or attacking them directly.
5) Information Leakage, 78% - The application leaks information because the system fails to handle errors or other incidents properly.
6) Cross-Site Request Forgery (CSRF), 92% - The attacker tricks a user into interacting through a script on the attacker's site, gaining access to the victim's application session.
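Category 3 (SQL Injection) is easy to demonstrate. Below is a sketch in Python with an in-memory SQLite database (the table and account are invented for the example): concatenating input into the SQL text lets crafted input rewrite the query, while a parameterized query treats the same input as plain data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

# Attacker-controlled input, crafted to bypass the password check.
name = "alice"
password = "' OR '1'='1"

# VULNERABLE: input concatenated straight into the SQL text.
query = ("SELECT * FROM users WHERE name = '" + name +
         "' AND password = '" + password + "'")
leaked = conn.execute(query).fetchall()
print(leaked)   # login succeeds without knowing the password!

# SAFE: placeholders make the driver pass the input as data, not SQL.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ? AND password = ?",
    (name, password),
).fetchall()
print(rows)     # [] - the injection attempt fails
```

The concatenated query becomes `... AND password = '' OR '1'='1'`, which is always true; the parameterized one simply looks for a user whose literal password is `' OR '1'='1`.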

Software Developer's Guide to HTTP

copied from: http://odetocode.com/Articles/741.aspx
-----

-----
HTTP is the protocol that enables us to buy microwave ovens from Amazon.com, reunite with an old friend in a Facebook chat, and watch funny cat videos on YouTube. HTTP is the protocol behind the World Wide Web. It allows a web server from a datacenter in the United States to ship information to an Internet Café in Australia, where a young student can read a web page describing the Ming dynasty in China.

In this series of articles, we'll look at HTTP from a software developer's perspective. Having a solid understanding of HTTP can help you write better web applications and web services. It can also help you debug applications and services when things go wrong. We'll be covering all the basics including resources, messages, connections, and security as it relates to HTTP.

1) Resources, http://odetocode.com/Articles/741.aspx

2) Messages, http://odetocode.com/Articles/742.aspx

3) Connections, http://odetocode.com/Articles/743.aspx

4) Web Architecture, http://odetocode.com/Articles/744.aspx

5) State & Security, http://odetocode.com/Articles/745.aspx

Using Postman HTTP client to help test web services

-----

-----
Postman is a powerful HTTP client to help test web services easily and efficiently. Postman lets you craft simple as well as complex HTTP requests quickly. It also saves requests for future use so that you never have to repeat your keystrokes ever again. Postman is designed to save you and your team tons of time. Check out more features below or just install from the Chrome Web Store to get started.
-----

Download HTTP Test Application:

1) http://www.getpostman.com/

What really happens when you navigate to a URL

copied from: http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/
-----

-----
As a software developer, you certainly have a high-level picture of how web apps work and what kinds of technologies are involved: the browser, HTTP, HTML, web server, request handlers, and so on.
In this article, we will take a deeper look at the sequence of events that take place when you visit a URL.

1. You enter a URL into the browser

It all starts here.

2. The browser looks up the IP address for the domain name

The first step in the navigation is to figure out the IP address for the visited domain. The DNS lookup proceeds as follows:
  • Browser cache – The browser caches DNS records for some time. Interestingly, the OS does not tell the browser the time-to-live for each DNS record, and so the browser caches them for a fixed duration (varies between browsers, 2 – 30 minutes).
  • OS cache – If the browser cache does not contain the desired record, the browser makes a system call (gethostbyname in Windows). The OS has its own cache.
  • Router cache – The request continues on to your router, which typically has its own DNS cache.
  • ISP DNS cache – The next place checked is the cache of your ISP's DNS server. With a cache, naturally.
  • Recursive search – Your ISP’s DNS server begins a recursive search, from the root nameserver, through the .com top-level nameserver, to Facebook’s nameserver. Normally, the DNS server will have names of the .com nameservers in cache, and so a hit to the root nameserver will not be necessary.
Here is a diagram of what a recursive DNS search looks like:
[Diagram: an example of theoretical DNS recursion]
One worrying thing about DNS is that the entire domain like wikipedia.org or facebook.com seems to map to a single IP address. Fortunately, there are ways of mitigating the bottleneck:
  • Round-robin DNS is a solution where the DNS lookup returns multiple IP addresses, rather than just one. For example, facebook.com actually maps to four IP addresses.
  • Load-balancer is the piece of hardware that listens on a particular IP address and forwards the requests to other servers. Major sites will typically use expensive high-performance load balancers.
  • Geographic DNS improves scalability by mapping a domain name to different IP addresses, depending on the client’s geographic location. This is great for hosting static content so that different servers don’t have to update shared state.
  • Anycast is a routing technique where a single IP address maps to multiple physical servers. Unfortunately, anycast does not fit well with TCP and is rarely used in that scenario.
Most of the DNS servers themselves use anycast to achieve high availability and low latency of the DNS lookups.
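The cache chain above ends in an ordinary resolver call. In Python, the same system call the article mentions (gethostbyname) is exposed through the socket module; the sketch below resolves localhost so it runs without a network, with the real lookup shown only as a comment:

```python
import socket

# The OS resolver consults the caches described above (the browser cache
# aside, which belongs to the browser, not the OS).
ip = socket.gethostbyname("localhost")
print(ip)  # 127.0.0.1

# Against a real domain, the very same call triggers the full chain,
# ending in a recursive search if no cache holds the record:
#   socket.gethostbyname("facebook.com")

# Round-robin DNS: a name can map to several addresses, and
# getaddrinfo returns all of them.
addresses = {info[4][0] for info in socket.getaddrinfo("localhost", 80)}
print(addresses)
```

With a round-robin domain, the set printed at the end would contain several distinct IP addresses rather than one.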

3. The browser sends an HTTP request to the web server

You can be pretty sure that Facebook’s homepage will not be served from the browser cache because dynamic pages expire either very quickly or immediately (expiry date set to past).
So, the browser will send this request to the Facebook server:
GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]
The GET request names the URL to fetch: “http://facebook.com/”. The browser identifies itself (User-Agent header), and states what types of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for further requests.
The request also contains the cookies that the browser has for this domain. As you probably already know, cookies are key-value pairs that track the state of a web site in between different page requests. And so the cookies store the name of the logged-in user, a secret number that was assigned to the user by the server, some of user’s settings, etc. The cookies will be stored in a text file on the client, and sent to the server with every request.
There is a variety of tools that let you view the raw HTTP requests and corresponding responses. My favorite tool for viewing the raw HTTP traffic is Fiddler, but there are many other tools (e.g., FireBug). These tools are a great help when optimizing a site.
In addition to GET requests, another type of request that you may be familiar with is the POST request, typically used to submit forms. A GET request sends its parameters via the URL (e.g.: http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just under the headers.
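The difference is easy to see if you build both requests by hand. A sketch in Python (the robozzle URL is the example from the paragraph above; urlencode does the parameter formatting in both cases):

```python
from urllib.parse import urlencode

params = {"id": "85"}

# GET: the parameters travel in the URL's query string.
get_url = "http://robozzle.com/puzzle.aspx?" + urlencode(params)
print(get_url)           # http://robozzle.com/puzzle.aspx?id=85

# POST: the same parameters travel in the request body, under the headers.
post_body = urlencode(params).encode("ascii")
post_headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(post_body)),
}
print(post_headers, post_body)   # body: b'id=85'
```

Same key-value pairs, different transport: in the URL for GET, in the body (with a Content-Type header describing the encoding) for POST.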
The trailing slash in the URL “http://facebook.com/” is important. In this case, the browser can safely add the slash. For URLs of the form http://example.com/folderOrFile, the browser cannot automatically add a slash, because it is not clear whether folderOrFile is a folder or a file. In such cases, the browser will visit the URL without the slash, and the server will respond with a redirect, resulting in an unnecessary roundtrip.

4. The facebook server responds with a permanent redirect

This is the response that the Facebook server sent back to the browser request:
HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
      pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT;
      path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 12 Feb 2010 05:09:51 GMT
Content-Length: 0
The server responded with a 301 Moved Permanently response to tell the browser to go to “http://www.facebook.com/” instead of “http://facebook.com/”.
There are interesting reasons why the server insists on the redirect instead of immediately responding with the web page that the user wants to see.
One reason has to do with search engine rankings. See, if there are two URLs for the same page, say http://www.igoro.com/ and http://igoro.com/, a search engine may consider them to be two different sites, each with fewer incoming links and thus a lower ranking. Search engines understand permanent redirects (301), and will combine the incoming links from both sources into a single ranking.
Also, multiple URLs for the same content are not cache-friendly. When a piece of content has multiple names, it will potentially appear multiple times in caches.
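The browser's side of this step can be sketched as a small decision function: given a response status and its headers, decide whether there is a Location header to follow (the header names match the Facebook response shown above; the rest is simplified):

```python
def next_url(status, headers):
    """Return the URL the browser should fetch next, or None to stop."""
    if status in (301, 302, 303, 307, 308):   # the redirect family
        return headers.get("Location")
    return None                               # 200 etc.: render this response

# The exchange from the article: facebook.com answers 301 -> www.facebook.com.
print(next_url(301, {"Location": "http://www.facebook.com/"}))
print(next_url(200, {"Content-Type": "text/html"}))   # None - nothing to follow
```

Real browsers also cap the number of hops so that two pages redirecting to each other cannot trap them in a loop.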

5. The browser follows the redirect

The browser now knows that “http://www.facebook.com/” is the correct URL to go to, and so it sends out another GET request:
GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=XW[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com
The meaning of the headers is the same as for the first request.

6. The server ‘handles’ the request

The server will receive the GET request, process it, and send back a response.
This may seem like a straightforward task, but in fact there is a lot of interesting stuff that happens here – even on a simple site like my blog, let alone on a massively scalable site like facebook.
  • Web server software – The web server software (e.g., IIS or Apache) receives the HTTP request and decides which request handler should be executed to handle this request. A request handler is a program (in ASP.NET, PHP, Ruby, …) that reads the request and generates the HTML for the response.
    In the simplest case, the request handlers can be stored in a file hierarchy whose structure mirrors the URL structure, so that for example the URL http://example.com/folder1/page1.aspx will map to the file /httpdocs/folder1/page1.aspx. The web server software can also be configured so that URLs are manually mapped to request handlers, and so the public URL of page1.aspx could be http://example.com/folder1/page1.
  • Request handler – The request handler reads the request, its parameters, and cookies. It will read and possibly update some data stored on the server. Then, the request handler will generate an HTML response.
One interesting difficulty that every dynamic website faces is how to store data. Smaller sites will often have a single SQL database to store their data, but sites that store a large amount of data and/or have many visitors have to find a way to split the database across multiple machines. Solutions include sharding (splitting up a table across multiple databases based on the primary key), replication, and usage of simplified databases with weakened consistency semantics.
One technique to keep data updates cheap is to defer some of the work to a batch job. For example, Facebook has to update the newsfeed in a timely fashion, but the data backing the “People you may know” feature may only need to be updated nightly (my guess, I don’t actually know how they implement this feature). Batch job updates result in staleness of some less important data, but can make data updates much faster and simpler.
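The sharding idea mentioned above can be sketched in a few lines: hash the primary key and use the hash to pick a database. The shard count and user IDs below are invented for the illustration; a stable hash such as CRC-32 is used so that every app server picks the same shard for the same key (Python's built-in hash() is salted per process and would not work here):

```python
import zlib

SHARD_COUNT = 4  # number of database machines (invented for the example)

def shard_for(primary_key):
    """Map a primary key to one of SHARD_COUNT databases, deterministically."""
    return zlib.crc32(primary_key.encode("utf-8")) % SHARD_COUNT

for user_id in ("user:1001", "user:1002", "user:1003"):
    print(user_id, "-> db", shard_for(user_id))
```

Every read or write for a given user then goes to that user's shard, so no single database has to hold the whole table.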

7. The server sends back an HTML response

Here is the response that the server generated and sent back:
HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
    pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 12 Feb 2010 09:05:55 GMT

2b3
��������T�n�@����[...]
The entire response is 36 kB, the bulk of it in the byte blob at the end that I trimmed.
The Content-Encoding header tells the browser that the response body is compressed using the gzip algorithm. After decompressing the blob, you’ll see the HTML you’d expect:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"   
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" 
      lang="en" id="facebook" class=" no_js">
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-language" content="en" />
...
In addition to compression, headers specify whether and how to cache the page, any cookies to set (none in this response), privacy information, etc.
Notice the header that sets Content-Type to text/html. The header instructs the browser to render the response content as HTML, instead of say downloading it as a file. The browser will use the header to decide how to interpret the response, but will consider other factors as well, such as the extension of the URL.
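The decompression step itself is mechanical. Here is a sketch in Python of what the browser does with a gzip-encoded body (the short HTML snippet stands in for Facebook's real 36 kB response):

```python
import gzip

# What the server does before sending (hence Content-Encoding: gzip).
html = b'<!DOCTYPE html><html id="facebook"><head>...</head></html>'
compressed_body = gzip.compress(html)
print(len(compressed_body), "compressed bytes on the wire")

# What the browser does on receipt, guided by the headers.
headers = {
    "Content-Encoding": "gzip",
    "Content-Type": "text/html; charset=utf-8",
}
body = compressed_body
if headers.get("Content-Encoding") == "gzip":
    body = gzip.decompress(body)
text = body.decode("utf-8")   # charset taken from the Content-Type header
print(text)
```

On a body this small gzip adds overhead, but on a realistic HTML page it typically shrinks the transfer severalfold, which is why servers compress by default when the request advertises `Accept-Encoding: gzip`.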

8. The browser begins rendering the HTML

Even before the browser has received the entire HTML document, it begins rendering the website.

9. The browser sends requests for objects embedded in HTML

As the browser renders the HTML, it will notice tags that require fetching of other URLs. The browser will send a GET request to retrieve each of these files.
Here are a few URLs that my visit to facebook.com retrieved:
  • Images
    http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
    http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
  • CSS style sheets
    http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
    http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
  • JavaScript files
    http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
    http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js
Each of these URLs will go through a process similar to what the HTML page went through. So, the browser will look up the domain name in DNS, send a request to the URL, follow redirects, etc.
However, static files – unlike dynamic pages – allow the browser to cache them. Some of the files may be served up from cache, without contacting the server at all. The browser knows how long to cache a particular file because the response that returned the file contained an Expires header. Additionally, each response may also contain an ETag header that works like a version number – if the browser sees an ETag for a version of the file it already has, it can stop the transfer immediately.
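The ETag handshake can be sketched as cache logic: send the stored ETag back in an If-None-Match header, and treat a 304 answer as "use your copy". The file name and tag values below are invented for the example; the header names and status code are the real ones:

```python
cache = {}  # url -> (etag, body)

def conditional_headers(url):
    """Headers for a revalidation request, if we hold a cached copy."""
    if url in cache:
        return {"If-None-Match": cache[url][0]}
    return {}

def handle_response(url, status, etag, body):
    """304 -> serve from cache; 200 -> store and serve the fresh body."""
    if status == 304:
        return cache[url][1]          # unchanged: reuse the cached copy
    cache[url] = (etag, body)
    return body

url = "http://static.example.net/style.css"   # invented static resource
first = handle_response(url, 200, '"v1"', "body { color: blue }")
print(conditional_headers(url))               # {'If-None-Match': '"v1"'}
second = handle_response(url, 304, None, None)
print(second == first)                        # True - no transfer needed
```

The 304 response carries no body at all, which is exactly the saving: the browser pays only for a round trip of headers, not for the file.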
Can you guess what “fbcdn.net” in the URLs stands for? A safe bet is that it means “Facebook content delivery network”. Facebook uses a content delivery network (CDN) to distribute static content – images, style sheets, and JavaScript files. So, the files will be copied to many machines across the globe.
Static content often represents the bulk of the bandwidth of a site, and can be easily replicated across a CDN. Often, sites will use a third-party CDN provider, instead of operating a CDN themselves. For example, Facebook’s static files are hosted by Akamai, the largest CDN provider.
As a demonstration, when you try to ping static.ak.fbcdn.net, you will get a response from an akamai.net server. Also, interestingly, if you ping the URL a couple of times, you may get responses from different servers, which demonstrates the load-balancing that happens behind the scenes.

10. The browser sends further asynchronous (AJAX) requests

In the spirit of Web 2.0, the client continues to communicate with the server even after the page is rendered.
For example, Facebook chat will continue to update the list of your logged in friends as they come and go. To update the list of your logged-in friends, the JavaScript executing in your browser has to send an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request that goes to a special URL. In the Facebook example, the client sends a POST request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of your friends who are online.
This pattern is sometimes referred to as “AJAX”, which stands for “Asynchronous JavaScript And XML”, even though there is no particular reason why the server has to format the response as XML. For example, Facebook returns snippets of JavaScript code in response to asynchronous requests.
Among other things, the fiddler tool lets you view the asynchronous requests sent by your browser. In fact, not only you can observe the requests passively, but you can also modify and resend them. The fact that it is this easy to “spoof” AJAX requests causes a lot of grief to developers of online games with scoreboards. (Obviously, please don’t cheat that way.)
Facebook chat provides an example of an interesting problem with AJAX: pushing data from server to client. Since HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to see if any new messages arrived.
Long polling is an interesting technique to decrease the load on the server in these types of scenarios. If the server does not have any new messages when polled, it simply does not send a response back. And, if a message for this client is received within the timeout period, the server will find the outstanding request and return the message with the response.
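The polling loop can be simulated without a network. In the sketch below, fake_poll stands in for the chat server's endpoint: it returns a queued message if one exists and None otherwise (where a real long-poll would hold the connection open until the timeout), and the client simply loops:

```python
from collections import deque

pending = deque()          # messages the "server" has queued for this client

def fake_poll():
    """Stand-in for one long-poll request: a message, or None on timeout."""
    return pending.popleft() if pending else None

def receive_messages(max_polls=5):
    """Client loop: issue the next poll as soon as the previous one returns."""
    received = []
    for _ in range(max_polls):
        message = fake_poll()   # a real client would block here, up to ~30 s
        if message is not None:
            received.append(message)
    return received

pending.extend(["hi!", "are you there?"])
print(receive_messages())   # ['hi!', 'are you there?']
```

With real long polling, the "timeout" branch is what makes the technique cheap: an idle client costs the server one parked request rather than a stream of rapid-fire empty responses.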

Conclusion

Hopefully this gives you a better idea of how the different web pieces work together.
----

Web application security

copied from: http://en.wikipedia.org/wiki/Web_application_security

Web application security

From Wikipedia, the free encyclopedia
Web application security is a branch of information security that deals specifically with the security of websites, web applications, and web services.
At a high level, Web application security draws on the principles of application security but applies them specifically to Internet and Web systems. Typically web applications are developed using programming languages such as PHP, Java EE, Java, Python, Ruby, ASP.NET, C#, VB.NET or Classic ASP.

Security threats

With the emergence of Web 2.0, increased information sharing through social networking and increasing business adoption of the Web as a means of doing business and delivering service, websites are often attacked directly. Hackers either seek to compromise the corporate network or the end-users accessing the website by subjecting them to drive-by downloading.[1][2]
As a result, industry[3] is paying increased attention to the security of the web applications[4] themselves in addition to the security of the underlying computer network and operating systems.
The majority of web application attacks occur through cross-site scripting (XSS) and SQL injection attacks[5] which typically result from flawed coding, and failure to sanitize input to and output from the web application. These are ranked in the 2009 CWE/SANS Top 25 Most Dangerous Programming Errors.[6] According to the security vendor Cenzic, the top vulnerabilities in March 2012 include:[7]

Security standards

OWASP is the emerging standards body for Web application security. In particular they have published the OWASP Top 10 which describes in detail the major threats against web applications. The Web Application Security Consortium (WASC) has created the Web Hacking Incident Database[8] and also produced open source best practice documents on Web application security.

Security technology

While security is fundamentally based on people and processes, there are a number of technical solutions to consider when designing, building and testing secure web applications. At a high level, these solutions include:

See also