Wednesday, October 17, 2012

SVN trunk, branches and tags


In this post, I provide details about how I personally handle SVN trunk, branches and tags. This approach is also known as “branch always”, with minor differences. It might not be the best approach, but it should give beginners some explanation of what trunk, branches and tags are, and how to handle them.

Of course, feel free to leave comments on this post if some clarifications are needed, if some mistakes were made, or if you disagree with my statements.

An easy comparison

Working with SVN is somewhat like growing a tree:
  • a tree has a trunk and some branches
  • branches grow from the trunk, and thinner branches grow from thicker branches
  • a tree can grow with a trunk and no branch (but not for long)
  • a tree with branches but no trunk looks more like a bundle of twigs fallen on the floor
  • if the trunk is sick, so are the branches and eventually, the whole tree can die
  • if a branch is sick, you can cut it, and another one may grow
  • if a branch grows too much, it may become too heavy for the trunk, and the tree will fall down
  • when you feel your tree, your trunk, or a branch is nice looking, you can take a picture of it to remember how nice it was that day

Trunk

With the “branch always” approach, the trunk is the main place where stable code can be found. This is like the assembly line of a car factory, putting finished car parts together.
Here is how you should deal with an SVN trunk:
  • NEVER work directly on the trunk, unless you have to deal with a bug which is quick and easy to fix (a few characters), or if you have to ADD a few files which hold no logic (like media files: images, videos, CSS, etc.)
  • do not make too many exceptions to the previous rule: those are really special cases, and every other situation calls for the creation of a branch (see below)
  • do not commit changes (from a branch merge) that may break the trunk
  • if at some point you happen to break the trunk, bring some cake the next day (“with great power comes… huge cake”)

Branches

A branch is a “cheap copy” of a subtree (i.e., the trunk or a branch) of an SVN repository. It works a little bit like symbolic links on UNIX systems, except that once you modify files within an SVN branch, these files evolve independently from the original files which were “copied”. When a branch is completed and considered stable, it must be merged back into the place it was copied from: back into the trunk if it was copied from the trunk, or back into its parent branch if it was copied from a branch.
Here is how you should deal with SVN branches:
  • if you need to alter your application or develop a new feature for it, create a new branch from the trunk, and make your development on that branch
  • new branches must be created from the trunk, except for new sub-branches (if needed) which must be created from a branch
  • when you create a new branch, you should switch to it immediately; if you don’t, why did you create the branch in the first place?

Tags

Internally, SVN branches and SVN tags are the same thing, but conceptually, they differ a lot.
Remember the “take a picture of the tree” thing written earlier? Well, this is exactly what an SVN tag is: a named snapshot of a specific revision of the trunk, or of a branch.
Here is how you should deal with SVN tags:
  • as a developer, never switch to, check out from, or commit to an SVN tag: a tag is some sort of “picture”, not the real thing; tags are to be read, never written
  • on specific/critical environments (production, staging, testing, etc.), check out and update from a fixed tag, but never commit to a tag
  • for the aforementioned environments, create tags with names like “production”, “staging”, “testing”, etc. You can also tag by software version and/or by project maturity: “1.0.3”, “stable”, “latest”, etc.
  • when the trunk is stable and ready to be released publicly, re-create tags accordingly, then update the concerned environments (production, staging, etc)
  • do not tag a tag

Example workflow

Say you have to add a feature to a project under version control. Here are the steps to follow:
  • get a new working copy of the project (through an SVN checkout or an SVN switch) from the trunk
  • create a new SVN branch and give it a name that makes its purpose clear (say, “feature-faq-development”)
  • SVN switch to the new branch (“/branches/feature-faq-development”)
  • do the development needed to complete the new feature (and of course, test a lot, even before you start coding), committing sub-parts of your work as you go
  • once the feature is complete and stable (and committed), merge the trunk into the branch and resolve conflicts if there are some, then commit your changes
  • with the approval of your peers, switch to the trunk
  • merge your branch into your working copy (trunk), and resolve conflicts if there are some
  • re-check your development with the merged code
  • if possible, ask one of your peers to do a code review of your changes with you
  • commit your merged working copy to the trunk
  • if a deployment must be performed on specific environments (production, etc.), update the related tag to the revision you just committed in the trunk
  • deploy on the concerned environments with an SVN update
  • rename the branch so that it’s made clear it won’t be used anymore (“/branches/obsolete-feature-faq-development”)
  • eventually, delete the branch after a while
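
The steps above can be sketched as a sequence of svn commands. This is only a sketch under assumptions: the helper name branch_always_commands and the commit messages are made up for illustration, the repository is assumed to use the usual /trunk and /branches layout, and the "^/" relative URLs need a Subversion 1.6 or later client run from inside a working copy. The code builds the command lines and prints them; it does not execute anything.

```python
import shlex


def branch_always_commands(feature):
    """Return the svn invocations for the workflow, in order (nothing is run)."""
    branch = f"^/branches/{feature}"
    obsolete = f"^/branches/obsolete-{feature}"
    return [
        # Create the branch from the trunk and switch to it right away.
        ["svn", "copy", "^/trunk", branch, "-m", f"Create branch {feature}"],
        ["svn", "switch", branch],
        # ... develop, test and commit on the branch here ...
        # Bring the branch up to date with the trunk before merging back.
        ["svn", "merge", "^/trunk"],
        ["svn", "commit", "-m", f"Sync {feature} with trunk"],
        # Merge the finished branch into a trunk working copy.
        # Subversion 1.5 to 1.7 clients need "--reintegrate" on this merge.
        ["svn", "switch", "^/trunk"],
        ["svn", "merge", branch],
        ["svn", "commit", "-m", f"Merge {feature} into trunk"],
        # Mark the branch as no longer in use.
        ["svn", "move", branch, obsolete, "-m", f"Retire {feature}"],
    ]


for command in branch_always_commands("feature-faq-development"):
    print(" ".join(shlex.quote(part) for part in command))
```

Peer review, conflict resolution and re-testing happen between these commands, so the list is a checklist to follow by hand rather than a script to run unattended.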

Reference:
http://blog.jmfeurprier.com/2010/02/08/svn-trunk-branches-and-tags/

Tuesday, October 16, 2012

PHPMailer is a PHP email transport class

PHPMailer is a PHP email transport class featuring file attachments, SMTP servers, CCs, BCCs, HTML messages, word wrap, and more. It sends email via sendmail, PHP mail(), QMail, or directly with SMTP. Support for additional transports, such as SMS and MMS, will be forthcoming.

Reference:
http://code.google.com/a/apache-extras.org/p/phpmailer/

Tuesday, October 2, 2012

Web Performance Tuning Tips Solutions for Drupal Sites

[] HAST + CARP + ZFS - High Availability HA cluster solution for FreeBSD.

HAST (Highly Available Storage) - allows data to be stored transparently on two physically separated machines connected over a TCP/IP network.

CARP (Common Address Redundancy Protocol) - allows multiple hosts to share the same IP address. In some configurations, this may be used for availability or load balancing. Hosts may use separate IP addresses as well, as in the example provided here.

http://www.freebsd.org/doc/handbook/carp.html
http://forums.freebsd.org/showthread.php?t=17133
http://blather.michaelwlucas.com/archives/241

[] MySQL Innodb storage engine.

[] ZFS file system - ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include data integrity (protection against bit rot, etc), support for high storage capacities, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.

[] NFS (Network File System) - is a network file system protocol originally developed by Sun Microsystems in 1984, allowing a user on a client computer to access files over a network in a manner similar to how local storage is accessed. NFS, like many other protocols, builds on the Open Network Computing Remote Procedure Call (ONC RPC) system. The Network File System is an open standard defined in RFCs, allowing anyone to implement the protocol.

[] Apache MPM Worker (instead of prefork) + fcgid + APC + memcached
Go for a threaded server, and run PHP through fcgid.

Note: APC does not currently share its cache between multiple php-cgi workers running under fastcgi or fcgid. See this feature request for details: "this behaviour is the intended one as of now".

Note: If you want to go for a threaded server, you must use Apache MPM Worker instead of Prefork; otherwise threading will not be enabled. Use httpd -V to check which MPM the server is currently using.

Note: Keep in mind that it's impossible to share APC cache between mod_fcgid PHP-CGI instances. You have to use PHP-FPM for that. mmap_file_mask doesn't help when it comes to mod_fcgid. You can check by lsof -p PHP-PID and see that different /tmp/apc.* are memory mapped in different PHP processes. So you may be served by one PHP process and see empty APC cache by checking another PHP process - they have separate caches. Try FcgidMaxProcesses 1 and see if APC cache is still empty.
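
To make the "separate caches" point concrete, here is a small Python simulation. It is not PHP or APC code: PhpWorker is a made-up stand-in for one php-cgi process holding its own private cache, and dispatching requests in turn is an assumption made for the illustration, not a description of how mod_fcgid picks a process.

```python
from itertools import cycle


class PhpWorker:
    """Stand-in for one php-cgi process with its own private opcode cache."""

    def __init__(self, name):
        self.name = name
        self.cache = {}
        self.compiles = 0

    def handle(self, script):
        # A cache miss means this process has to compile the script itself.
        if script not in self.cache:
            self.compiles += 1
            self.cache[script] = f"opcodes({script})"
        return self.cache[script]


def serve(workers, requests):
    """Hand requests to the workers in turn; return the total compile count."""
    dispatcher = cycle(workers)
    for script in requests:
        next(dispatcher).handle(script)
    return sum(worker.compiles for worker in workers)


requests = ["index.php"] * 6
three_workers = [PhpWorker(f"php-cgi-{i}") for i in range(3)]
single_worker = [PhpWorker("php-cgi-0")]

print(serve(three_workers, requests))  # 3: each process compiles the script once
print(serve(single_worker, requests))  # 1: a single process reuses its cache
```

The single-worker case mirrors the FcgidMaxProcesses 1 experiment suggested in the note: with only one process there is only one cache, so it no longer looks empty.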

How to share APC cache between several PHP processes when running under FastCGI?

FastCGI with a PHP APC Opcode Cache

There are articles on 2bits.com:
http://groups.drupal.org/node/27174
http://2bits.com/articles/apache-fcgid-acceptable-performance-and-better-resource-utilization.html

[] Nginx - is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004. Nginx now hosts nearly 6.55% (13.5M) of all domains worldwide.

Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.

[] php-fpm (FastCGI Process Manager) - is an alternative PHP FastCGI implementation with some additional features useful for sites of any size.

[] Nginx + php-fpm + apc = Awesome

[] Disable the Apache modules that you don't need.

[] APC (Alternative PHP Cache) - a free, open source framework that optimizes PHP intermediate code and caches data and compiled code (opcodes) from the PHP bytecode compiler in shared memory. APC was widely expected to become the de facto standard PHP caching mechanism, as it was planned for inclusion in the PHP core starting with PHP 6.

[] Memcached - Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.
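
The usual way to put Memcached in front of a database is the cache-aside pattern: look in the cache first, and only run the query on a miss. Below is a minimal Python sketch of that pattern. DictCache is a hypothetical in-process stand-in with only get and set, not a real Memcached client, and cached_query and load_front_page are names invented for the example.

```python
class DictCache:
    """In-process stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like a cache miss, an unknown key yields None.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached_query(cache, key, run_query):
    """Cache-aside lookup: try the cache, fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = run_query()
        cache.set(key, value)
    return value


database_calls = []


def load_front_page():
    # Pretend this is an expensive database call.
    database_calls.append("SELECT ...")
    return ["node 1", "node 2"]


cache = DictCache()
cached_query(cache, "front_page", load_front_page)
cached_query(cache, "front_page", load_front_page)
print(len(database_calls))  # 1: the second call is served from the cache
```

A real deployment would also set an expiry time and invalidate keys when the underlying rows change; both are left out here to keep the sketch short.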

[] Using MySQL with memcached
Chapter 14. High Availability and Scalability:

14.1. Using MySQL with DRBD
14.2. Using Linux HA Heartbeat
14.3. MySQL and Virtualization
14.4. Using ZFS Replication
14.5. Using MySQL with memcached
14.6. MySQL Proxy

[] HAProxy - High Performance TCP/HTTP Load Balancer

[] Varnish - an HTTP accelerator designed for content-heavy dynamic web sites. In contrast to other HTTP accelerators, many of which began life as client-side proxies or origin servers, Varnish was designed from the ground up as an HTTP accelerator.

[] MySQL MEMORY tables - SELECT statements against a MEMORY table are fast because the data is held in memory.

[] MySQL Proxy can be used to separate insert and select statements. MySQL Proxy is a simple program that sits between your client and MySQL server(s) and can monitor, analyze or transform their communication. Its flexibility allows for unlimited uses; common ones include load balancing, failover, query analysis, query filtering and modification, and read/write (select and insert) splitting.
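
The idea behind read/write splitting can be sketched in a few lines of Python. This only illustrates the routing decision and is not MySQL Proxy's own scripting interface; QueryRouter, the server names and the list of read-only keywords are assumptions made for the example.

```python
from itertools import cycle

# Statements starting with these keywords are treated as reads (assumption).
READ_KEYWORDS = ("select", "show")


class QueryRouter:
    """Send reads to replicas in turn and everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = cycle(replicas)

    def route(self, sql):
        words = sql.split()
        first = words[0].lower() if words else ""
        if first in READ_KEYWORDS:
            return next(self._replicas)
        return self.master


router = QueryRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM node"))
print(router.route("INSERT INTO node (title) VALUES ('x')"))
```

A real splitter also has to keep a transaction on one server and cope with replication lag; this sketch ignores both.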

[] Nginx - an HTTP and reverse proxy server, as well as a mail proxy server, written by Igor Sysoev. It has been running for more than five years on many heavily loaded Russian sites including Rambler (RamblerMedia.com). According to Netcraft, nginx served or proxied 4.70% of the busiest sites in April 2010.

[] LVS (Linux Virtual Server) - The Linux Virtual Server is a highly scalable and highly available server built on a cluster of real servers, with the load balancer running on the Linux operating system. The architecture of the server cluster is fully transparent to end users, and the users interact as if it were a single high-performance virtual server.

[] Apache Solr - Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

[] Drupal - APC module - This module is only available for Drupal 7 because it has a new cache implementation. It allows using different backends for the types of caches. For example you could cache 'cache' and 'cache_bootstrap' in APC; 'cache_field' and 'cache_menu' in Memcached and store 'cache_filter' in the database.

[] Drupal - memcache module - An API for using Memcached and the PECL Memcache library with Drupal.

[] Drupal - boost module - Boost provides static page caching for Drupal, enabling a very significant performance and scalability boost for sites that receive mostly anonymous traffic. Boost is very easy to install and has been thoroughly tested on shared, VPS and dedicated hosting. Apache is fully supported, with Nginx, Lighttpd and IIS 7 semi-supported. Boost will cache and gzip-compress HTML, XML, Ajax, CSS and JavaScript. Boost's cache expiration logic is very advanced; it's fairly simple to have different cache lifetimes for different parts of your site. The built-in crawler makes sure expired content is quickly regenerated for fast page loading. For shared hosting this is your best option in terms of improving performance.

[] Drupal - varnish module - This module provides integration between your Drupal site and the Varnish HTTP Accelerator, an advanced and very fast reverse-proxy system. Basically, Varnish handles serving static files and anonymous page-views for your site much faster and at higher volumes than Apache, in the neighborhood of 3000 requests per second.

[] Drupal - Cache Router module - CacheRouter is a caching system for Drupal allowing you to assign individual cache tables to specific cache technology. CacheRouter has an option to utilize the page_fast_cache part of Drupal in order to reduce the amount of resources needed for serving pages to anonymous users.

[] Drupal - Authcache - The Authcache module offers page caching for both anonymous users and logged-in authenticated users. This allows Drupal/PHP to only spend 1-2 milliseconds serving pages, greatly reducing server resources.

=============== [START] Database ================
[] MySQL - is a relational database management system (RDBMS) that runs as a server providing multi-user access to a number of databases. MySQL is officially pronounced "My S-Q-L", but is often also pronounced "My Sequel". It is named after developer Michael Widenius' daughter, My. The SQL phrase stands for Structured Query Language.

The MySQL development project has made its source code available under the terms of the GNU General Public License, as well as under a variety of proprietary agreements. MySQL was owned and sponsored by a single for-profit firm, the Swedish company MySQL AB, now owned by Oracle Corporation.

Members of the MySQL community have created several forks (variations) such as Drizzle, OurDelta, Percona Server, and MariaDB. All of these forks were in progress before the Oracle acquisition; Drizzle was announced eight months before the Sun acquisition.

[] PostgreSQL - often simply Postgres, is an object-relational database management system (ORDBMS). It is released under an MIT-style license and is thus free and open source software. As with many other open-source programs, PostgreSQL is not controlled by any single company; a global community of developers and companies develops the system.

[] HBase - is an open source, non-relational, distributed database modeled after Google's BigTable and is written in Java. It is developed as part of Apache Software Foundation's Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.

HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original BigTable paper. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API but also through REST or Thrift gateway APIs.

HBase is not a direct replacement for a classic SQL database, although recently its performance has improved, and it is now serving several data-driven websites, including Facebook's Messaging Platform.

[] Apache Hadoop

[] MongoDB - (from "humongous") is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language.

MongoDB combines the functionality of key-value stores - which are fast and highly scalable - and traditional RDBMS systems - which provide rich queries and deep functionality. It is designed for problems that are difficult to be solved by traditional RDBMSs, for example databases spanning many servers.

The database is document-oriented so it manages collections of JSON-like documents. Many applications can, thus, model data in a more natural way, as data can be nested in complex hierarchies and still be query-able and indexable.

[] NoSQL - In computing, NoSQL is a term used to designate database management systems that differ from classic relational database management systems in some way. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally. Academics and papers typically refer to these databases as structured storage, a term that would include classic relational databases as a subset.

Notable production implementations include Google's BigTable, Amazon's Dynamo and Apache Cassandra.

[] Cassandra - is an open source distributed database management system. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.

[] Apache CouchDB - Apache CouchDB, commonly referred to as CouchDB, is a free web scale open source document-oriented database written in the Erlang programming language. It is a NoSQL product designed for local replication and to scale vertically along a wide range of devices. CouchDB is supported by commercial enterprises CouchOne and Cloudant.

=============== [END] Database ================

Reference:
http://gala4th.blogspot.com/2010/11/nginx-vs-haproxy-vs-lvs.html

scaling drupal - an open-source infrastructure for high-traffic drupal sites

scaling drupal step four - database segmentation using mysql proxy

http://cruncht.com/89/drupal-lamp-server-tuning

http://groups.drupal.org/node/146864

High Availability HA cluster solution for FreeBSD

HAST + CARP + ZFS

HAST (Highly Available Storage) - allows data to be stored transparently on two physically separated machines connected over a TCP/IP network.

CARP (Common Address Redundancy Protocol) - allows multiple hosts to share the same IP address. In some configurations, this may be used for availability or load balancing. Hosts may use separate IP addresses as well, as in the example provided here.

http://www.freebsd.org/doc/handbook/carp.html
http://forums.freebsd.org/showthread.php?t=17133
http://blather.michaelwlucas.com/archives/241

Web Server Load Balancing

Load balancing becomes important if your traffic volume becomes too great for either your server or network connection or both. Multiple options are available for load balancing.
  • DNS round-robin: this uses DNS to point each user to one server from a list of appropriate servers, in rotation. This spreads the load among the servers in the list.
  • Use a Linux Virtual Server to Create a Load Balance Cluster. See next section below.
  • Run a reverse proxy. See nginx ("engine X"). From a single external internet network connection, route http, smtp, imap or pop3 traffic to various servers on an internal network. Results are pushed back to the nginx proxy for routing to the internet (no caching).
  • Run the Apache httpd web server module "mod_proxy" to offload processing of dynamic content to another web server. This acts as a reverse proxy, routing external traffic to various servers on an internal network.
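
As a small illustration of the first option, DNS round-robin, here is a Python sketch of a resolver that returns the same address list on every query but rotates its order, so successive clients start with a different server. RoundRobinDNS is a made-up class, not a real DNS library, and the addresses come from the reserved documentation range.

```python
from collections import deque


class RoundRobinDNS:
    """Toy resolver that rotates its address records after every answer."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def resolve(self):
        answer = list(self._records)
        # Move the first record to the end for the next query.
        self._records.rotate(-1)
        return answer


dns = RoundRobinDNS(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for _ in range(4):
    # Most clients connect to the first address in the answer.
    print(dns.resolve()[0])
```

Note that this spreads requests without knowing whether a server is overloaded or down, which is one reason the other options in the list exist.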

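Load balancer pros and cons compared (Nginx vs HAProxy vs LVS)

Of these three load balancers, the one I (丫忠) have worked with most is Nginx. I only recently started using HAProxy, and I have no experience with LVS at all. Still, since I found some comparison material on load balancers, I am taking notes here so that I can study them later, and perhaps even write up some installation notes!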

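Nginx advantages

1. Good performance; it can handle a concurrent load of around 10,000.

2. Fairly complete feature set: besides acting as a load balancer, it can also serve as a web server like Apache, and it can distribute traffic using Geo mode (note 1).

3. Supports a larger number of modules.

4. Supports gzip proxying.

Nginx disadvantages

1. Does not support session keep-alive.

2. Limited support for checking the status of backend servers: it can only check by port, not by URL.

3. Large request headers are not handled well: if client_header_buffer_size is set too small, a 400 Bad Request page is returned.

You may also be interested in the Nginx V1.5 Chinese technical manual.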

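HAProxy advantages

1. Supports session keep-alive.

2. Can check the status of backend servers through a specified URL.

3. Supports load balancing at the TCP level; for example, it can load-balance MySQL servers or mail servers.

4. Supports virtual hosts.

HAProxy disadvantages

1. Currently has no support for network monitoring with Nagios (note 2) or Cacti (note 3).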

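LVS advantages

1. Good performance, close to that of hardware load balancers in both throughput and connection load.

2. The DR mode of LVS supports load balancing across a wide area network. This is a significant feature, because the other two load balancers do not have it.

LVS disadvantages

1. More complex, and its module support is not as good as that of Nginx.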
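
Note 1: Geo mode is a global load distribution mode that assigns clients to different servers according to the client's IP address. For example, the IPs of specific clients are sent to specific servers, while ordinary users are sent to the general web servers.

Note 2: Nagios is network monitoring software aimed at improving performance and accuracy.
Note 3: Cacti is an SNMP traffic and system information monitoring tool similar to MRTG. Cacti can also use plug-in scripts and templates to produce all kinds of monitoring graphs; interested readers can refer to the official Cacti site.

Fortunately, Cacti was developed precisely to make RRDTool more convenient for its users.

Besides basic SNMP traffic and system information monitoring, Cacti can also use plug-in scripts and templates to produce all kinds of monitoring graphs.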
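
The Geo mode described in note 1 comes down to a lookup from client IP to a group of servers. Here is a minimal Python sketch of that lookup. It is not Nginx configuration: the networks, the pool names and the pick_pool function are all assumptions made for the illustration.

```python
import ipaddress

# Hypothetical rules: client network -> name of the server pool to use.
GEO_RULES = [
    (ipaddress.ip_network("10.0.0.0/8"), "internal-pool"),
    (ipaddress.ip_network("203.0.113.0/24"), "partner-pool"),
]
DEFAULT_POOL = "web-pool"


def pick_pool(client_ip, rules=GEO_RULES, default=DEFAULT_POOL):
    """Return the pool for a client IP; the first matching network wins."""
    address = ipaddress.ip_address(client_ip)
    for network, pool in rules:
        if address in network:
            return pool
    return default


print(pick_pool("10.1.2.3"))      # internal-pool
print(pick_pool("203.0.113.50"))  # partner-pool
print(pick_pool("198.51.100.7"))  # web-pool
```

Specific clients land on their dedicated servers, and everyone else falls through to the general web pool, as in the example from the note.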

Reference: http://homeserver.com.tw/proxy-server/%E8%B2%A0%E8%BC%89%E5%B9%B3%E8%A1%A1%E5%99%A8%E5%84%AA%E7%BC%BA%E9%BB%9E%E6%AF%94%E8%BC%83nginx-vs-haproxy-vs-lvs/