HTML Primer

What other HTML books & tutorials just skip

In this tutorial you will learn what HTML is and what its underlying principles are. HTML tutorials usually start off with examples of how to create a colourful web page. This can be fun, but that way of teaching falls short in two respects: if you want to understand something, you should first know what it is and what it is not; and second, it helps a lot to know at least something about the principles that make it work.

Lucky you! You found the page that closes this gap. Knowing the underlying concepts will help you avoid mistakes and solve problems rather than fumbling around by trial and error.

The following statements summarize what you should know; all of them will become clear as you read this page.

  1. HTML is not Web design; not at all.
  2. HTML structures documents.
  3. An HTML page consists of HTML elements.
  4. Devices and software for reading HTML (web browsers) have a built-in mechanism for displaying (rendering) HTML elements in a useful way.
  5. In so-called Cascading Style Sheets (CSS, not covered in this tutorial) a web designer can further specify how the HTML elements are displayed. Note that, because of point 4 above, you don't need to worry about CSS now.
  6. Web browsers may render the HTML as they want: they may ignore CSS; they may render a page completely differently from what you expect; or they may allow users to change colors, fonts, etc. at their convenience. They may even read the page content out loud.

Now you will learn what the idea behind HTML is.

What HTML stands for

The acronym HTML stands for Hypertext Markup Language.

Hypertext is text, displayed on a computer, that includes references to other documents. The user can easily follow these references (links). On an end device like a PC, a smartphone, or a tablet you usually just need to click on a link to get to the referenced document. There may be other devices that apply different mechanisms.

A Markup Language is used to mark, or tag, the individual parts (elements) of a document. Tagging the elements of documents is not limited to web pages. Let us leave the Internet/Web topic for a moment and look at an easy-to-understand example: a newspaper publisher.

Where the need for HTML comes from

Chaotic Documents

To keep the example simple and clear we assume a simplified publishing process: The reporters just send their articles to a team that handles the layout and printing of the newspaper. At the beginning we assume the layout and typesetting are done by hand, i.e. without computers and machines.

Let's look at two articles sent in by two different reporters:

Article 1

San Francisco/CA/US, Jan 19, 2013: Famous hairs in Avalon Ballroom
Angela Planck, cleaning lady in the Avalon Ballroom (Sutter Street), was lucky when working in the ladies' restroom this morning. She found a dirty, old comb and put it in her pocket for throwing it away later. Fortunately she forgot it until she arrived at home. Her son, a young gene researcher, thought it would be interesting to find traces of celebrities. The result of his investigations: Janis Joplin and Steve Miller had been using this comb in the late 1960s. Eighteen of Janis' beautiful hairs and fourteen of the young Steve are on auction now. The starting price for each Janis hair is $16.000, Steve's are a bit cheaper ($4.000 each).

Article 2

In a speech held in front of the Porta Nigra in Trier (Germany) Host Schlemmer described alternative ways out of the current economic challenges. He proposed to replace the Euro currency by a new currency to be named Smart. It should use the hash symbol (#). He mentioned that after 12 years the time for a rebranding would be right. Moving to a one-syllable word (Smart) and a new sign (#) that is available on every keyboard has a high cost saving potential: 300.000 working hours per year are currently wasted by people searching for the Euro sign (€) on their keyboard.
Article by D. Getz
2013-01-19 23:11

Individual Formatting

The team responsible for the layout and printing process will make sure the readers get the information they need, presented in a consistent layout. When preparing the newspaper they want to easily find all the information like date, place, author, and the message itself.

So the layout/printing department defines a format for the reporters. They are asked to send all articles in the following format:

Title: ...
Place: ...
Date: ...
Message: ...
Author: ...

Provided the reporters adhere to this format, their colleagues can easily identify the information they need and incorporate it in the layout. The articles above delivered in this format look as follows:

Article 1

Title: Famous hairs in Avalon Ballroom
Place: San Francisco/CA/US
Date: 2013-01-19
Message: Angela Planck, cleaning lady in the Avalon Ballroom (Sutter Street), was lucky when working in the ladies' restroom this morning. She found a dirty, old comb and put it in her pocket for throwing it away later. Fortunately she forgot it until she arrived at home. Her son, a young gene researcher, thought it would be interesting to find traces of celebrities. The result of his investigations: Janis Joplin and Steve Miller had been using this comb in the late 1960s. Eighteen of Janis' beautiful hairs and fourteen of the young Steve are on auction now. The starting price for each Janis hair is $16.000, Steve's are a bit cheaper ($4.000 each).
Author: J. Gurley

Article 2

Title: Schlemmer proposes to replace Euro by Smart
Place: Trier/DE
Date: 2013-01-19
Message: In a speech held in front of the Porta Nigra in Trier (Germany) Host Schlemmer described alternative ways out of the current economic challenges. He proposed to replace the Euro currency by a new currency to be named Smart. It should use the hash symbol (#). He mentioned that after 12 years the time for a rebranding would be right. Moving to a one-syllable word (Smart) and a new sign (#) that is available on every keyboard has a high cost saving potential: 300.000 working hours per year are currently wasted by people searching for the Euro sign (€) on their keyboard.
Author: D. Getz

Receiving the articles in that format, the team no longer has to search for date, place, author, etc. They can easily apply their layout standard, i.e. put the title in bold letters, then the place and the initials at the beginning of the article, etc.

Machine Readable Markup

So far we assumed that only human beings have to digest the information sent in by the reporters. But we all know that the articles will be delivered electronically, and that a computer will extract the information rather than a person.

The format we have defined above is clear for a person, but error-prone when fed into software. Imagine the news article contains the text phrase “Date: ” somewhere in the middle of the Message part. The software might parse this as “here comes the date” and interpret the text after the “Date: ” as the publishing date, although it might be a different date or not a date at all.

To help stupid computers parse our article correctly we have to amend the format. We should clearly indicate the start and end of the different pieces of information. We define the following marks (or, let's call them tags now):
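
The problem can be reproduced with a few lines of Python. This is only a sketch of how a careless program might read the format; the function name parse_naive and the sample article are invented for this illustration and are not part of any real publishing system.

```python
import re

# Assumption: a careless parser treats every "Label: " it finds,
# anywhere in the text, as the start of that field.
LABEL = re.compile(r"(Title|Place|Date|Message|Author): ")


def parse_naive(text):
    """Split the text at every label; later occurrences overwrite earlier ones."""
    fields = {}
    parts = LABEL.split(text)  # ['', label, value, label, value, ...]
    for label, value in zip(parts[1::2], parts[2::2]):
        fields[label] = value.strip()
    return fields


# Invented sample article whose message happens to contain "Date: ".
article = (
    "Title: Concert postponed\n"
    "Place: Trier/DE\n"
    "Date: 2013-01-19\n"
    "Message: The organisers announced a new schedule. Date: to be confirmed.\n"
    "Author: D. Getz\n"
)

fields = parse_naive(article)
print(fields["Date"])     # to be confirmed.
print(fields["Message"])  # The organisers announced a new schedule.
```

The real publishing date is lost and the message is cut short, simply because the text of the message looked like a label.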

Tag         Meaning
<title>     start of the title
</title>    end of the title
<place>     start of the place information
</place>    end of the place information
<date>      start of the date information
</date>     end of the date information
<author>    start of the author's name
</author>   end of the author's name
<message>   start of the actual news message
</message>  end of the actual news message

Now article 1 will be sent like this:

<title>Famous hairs in Avalon Ballroom</title>
<place>San Francisco/CA/US</place>
<date>2013-01-19</date>
<message>Angela Planck, cleaning lady in the Avalon Ballroom (Sutter Street), was lucky when working in the ladies' restroom this morning. She found a dirty, old comb and put it in her pocket for throwing it away later. Fortunately she forgot it until she arrived at home. Her son, a young gene researcher, thought it would be interesting to find traces of celebrities. The result of his investigations: Janis Joplin and Steve Miller had been using this comb in the late 1960s. Eighteen of Janis' beautiful hairs and fourteen of the young Steve are on auction now. The starting price for each Janis hair is $16.000, Steve's are a bit cheaper ($4.000 each).</message>
<author>J. Gurley</author>

This looks like it can easily be parsed electronically. We can imagine software that cuts out the text between <title> and </title> and adds it in large letters to the newspaper. Then it may extract the text between <author> and </author> and place it below the title. In a similar way the software could extract the place, date, and the message itself, and add them to the newspaper.
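
Such software could be sketched in Python as follows. The helper names extract and layout are made up for this example, upper-case letters stand in for "large letters", and the message is a shortened stand-in rather than the full article text.

```python
import re


def extract(element_name, text):
    """Return the text between <element_name> and </element_name>, or None."""
    pattern = re.compile(
        r"<{0}>(.*?)</{0}>".format(re.escape(element_name)), re.DOTALL
    )
    match = pattern.search(text)
    return match.group(1) if match else None


def layout(text):
    """Hypothetical layout: title in capitals, author below, then the rest."""
    return "\n".join([
        extract("title", text).upper(),
        "by " + extract("author", text),
        extract("place", text) + ", " + extract("date", text),
        "",
        extract("message", text),
    ])


article = (
    "<title>Famous hairs in Avalon Ballroom</title>\n"
    "<place>San Francisco/CA/US</place>\n"
    "<date>2013-01-19</date>\n"
    "<message>A shortened stand-in message. Date: this is not the date.</message>\n"
    "<author>J. Gurley</author>\n"
)

print(extract("date", article))  # 2013-01-19
print(layout(article))
```

Because the date is taken only from between <date> and </date>, a stray “Date: ” inside the message no longer confuses the program.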

Is this HTML?

The above looks a bit like the HTML tags you may have seen or heard of. It is not HTML yet. Let's see what we have done:

We have agreed on a format that allows the reporters to mark the different parts (elements) of a news item (title, place, date, message, author). For the marks we introduced start and end tags for each element, applying a notation with angle brackets: <element_name> for start tags and </element_name> for end tags. This should allow electronic parsers to precisely extract the information between the tags.

This style of tagging stems from a language called SGML. In fact, what we did here - defining element names at our convenience - is a typical SGML (or XML) exercise. HTML is different. The beauty of HTML is that we do not need to define any elements ourselves. Read on!

One Format Fits All

The format defined above has one downside: It can be used only for news or similar documents, i.e. for documents that have a title, date, place, message, and author.

If we wanted to publish something else in our newspaper, for example an event calendar, small ads, or advertisements, the format would not fit.

Apart from that we do not want to print a newspaper, but we want to make our information available on the Web. (You are happy to read this, aren't you?) So we need a format that every reader can deal with, i.e. a format that is incorporated in all tools for reading web pages in the world.

And this general markup is HTML.

HTML has a fixed set of elements, marked by start and end tags. If you want to publish a web page, you must use only those tags. This sounds like a limitation, but see the benefit: you can be sure that the reader's browser is aware of your markup and will display your valuable information in a usable way.

The table below shows a short excerpt of the HTML tags and what they are used for.

Tag    Meaning
<h1>   A top-level heading starts (h1 = heading, 1st level)
</h1>  A top-level heading ends
<h2>   A 2nd-level heading starts
</h2>  A 2nd-level heading ends
<p>    Start of a paragraph element
</p>   End of a paragraph element

Note that HTML does not say how these elements get displayed. It is up to the end device to do something useful with them. Usually web browsers display the elements in the order they occur in the document. They use a very large font for the top-level heading (the h1 element), correspondingly smaller fonts for headings at lower levels, and for paragraphs a font size that is still easy to read.

You may wonder: But how do I now specify what the author, the date or place for an article is? The answer: There are ways to incorporate such information into HTML, but this is something you will learn as an advanced HTML author.
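
To see that the display really is the device's business, here is a sketch of a toy "browser" that can show nothing but plain text. It uses HTMLParser from Python's standard library; the class TextRenderer, the function render, and the decision to show headings in capitals or with underlines are all invented for this illustration. A real browser does far more, and the page content is a shortened stand-in.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Hypothetical text-only renderer for h1, h2 and p, in document order."""

    def __init__(self):
        super().__init__()
        self.lines = []      # rendered output lines
        self.current = None  # element we are currently inside, if any
        self.buffer = []     # text collected for that element

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "p"):
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.current:
            return
        text = " ".join("".join(self.buffer).split())
        if tag == "h1":
            # This device has no large font, so it uses capitals instead.
            self.lines += [text.upper(), "=" * len(text)]
        elif tag == "h2":
            self.lines += [text, "-" * len(text)]
        else:
            self.lines.append(text)
        self.current = None


def render(html):
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    return "\n".join(renderer.lines)


page = (
    "<h1>Famous hairs in Avalon Ballroom</h1>"
    "<h2>San Francisco/CA/US, 2013-01-19</h2>"
    "<p>A shortened stand-in for the news message.</p>"
)
print(render(page))
```

The page itself never mentions capitals or underlines. It only says "this is a top-level heading, this is a paragraph", and the renderer decides what to make of that.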

Now you know better

Congrats! If you understood the above you know more about web technology than most of today's web designers and even more than some web application developers.

Now you only need to know what tags are available and what exactly you can do with them. The German version of this tutorial covers a bit more. On the World Wide Web you can find plenty of good, bad, and wrong tutorials. The HTML Tutorial at W3Schools looks good.

If anything on this page is wrong or unclear, or if you are just happy to have found this page, let me know via email.

© Hermann Faß, 2013