Tuesday, February 23, 2010

Difference Between getcwd() and dirname(__FILE__)? Which should I use?


__FILE__ is a magic constant containing the full path to the file currently being executed. If you are inside an included file, __FILE__ contains the path of that included file, not of the script that included it.

So with this setup:

./foo.php
<?php
echo getcwd() ,"\n";
echo dirname(__FILE__),"\n" ;
echo '-------',"\n";
include 'bar/bar.php';
?>

./bar/bar.php
<?php
echo getcwd(),"\n";
echo dirname(__FILE__),"\n";
?>

You get this output (Windows / Zend Studio Debugger):

C:\Users\me\Zend\workspaces\DefaultWorkspace7\random
C:\Users\me\Zend\workspaces\DefaultWorkspace7\random
-------
C:\Users\me\Zend\workspaces\DefaultWorkspace7\random
C:\Users\me\Zend\workspaces\DefaultWorkspace7\random\bar

On FreeBSD:

# php foo.php
/tmp/tmp
/tmp/tmp
-------
/tmp/tmp
/tmp/tmp/bar

Apparently, getcwd() returns the current working directory of the process (by default, the directory from which you started the script), while dirname(__FILE__) depends on the file that contains the line.

Note: for getcwd(), if you put your script in /tmp/test.php but start executing it from /, the root directory, getcwd() will return / instead of /tmp.

Example:
<?php
echo getcwd();
?>

Running the script from /tmp returns /tmp:
root@host [/tmp] # php test.php
/tmp

While running the script from / returns /:
root@host [/] # php /tmp/test.php
/

So, please be aware of the difference.
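A common way to put this to use is to anchor include paths to the file instead of to the working directory. Here is a minimal sketch; config.php is just a hypothetical file assumed to sit next to the script:

<?php
// Resolved relative to this file, so it works no matter which
// directory the script was launched from.
require_once dirname(__FILE__) . '/config.php';

// By contrast, a bare relative include is resolved against getcwd()
// and the include_path, and breaks when the script is started elsewhere:
// require_once 'config.php';
?>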

InnoDB takes over an hour to import 600MB file, MyISAM in a few minutes

I'm currently working on creating an environment to test the performance of an app; I'm testing with MyISAM and InnoDB to find out which can serve us best. Within this environment, we'll automatically prepare the database (load existing dumps) and instrument our test tools.

I'm preparing to test the same data dump with MyISAM and InnoDB, but I'm already failing to bring the initial import to a usable speed for the InnoDB part. The initial dump took longer, but that didn't concern me yet:

$ for i in testdb_myisam testdb_innodb; do time mysqldump --extended-insert $i > $i.sql; done

real 0m38.152s
user 0m8.381s
sys 0m2.612s

real 1m16.665s
user 0m6.600s
sys 0m2.552s
However, the import times were quite different:

$ for i in testdb_myisam testdb_innodb; do time mysql $i < $i.sql; done

real 2m52.821s
user 0m10.505s
sys 0m1.252s

real 87m36.586s
user 0m10.637s
sys 0m1.208s
After some research I came across http://stackoverflow.com/questions/457060/changing-tables-from-myisam-to-innodb-make-the-system-slow and then used SET GLOBAL innodb_flush_log_at_trx_commit=2:

$ time mysql testdb_innodb < testdb_innodb.sql

real 64m8.348s
user 0m10.533s
sys 0m1.152s

IMHO still shockingly slow. I've also disabled log_bin for these tests and here's a list of all mysql variables.

Do I have to accept these long InnoDB times, or can they be improved? I have full control over this MySQL server, as it's purely for this test environment.

I can apply special configurations for the initial import only and change them back for the application tests, so that they better match the production environment.

Update:

Given the feedback, I've disabled autocommit and the various checks:

$ time ( echo "SET autocommit=0; SET unique_checks=0; SET foreign_key_checks=0;" \
; cat testdb_innodb.sql ; echo "COMMIT;" ) | mysql testdb_innodb;date

real 47m59.019s
user 0m10.665s
sys 0m2.896s

The speed improved, but not that much. Is my test flawed?

Update 2:

I was able to gain access to a different machine where imports only took about 8 minutes. I compared the configurations and applied the following settings to my MySQL installation:

innodb_additional_mem_pool_size = 20971520
innodb_buffer_pool_size = 536870912
innodb_file_per_table
innodb_log_buffer_size = 8388608
join_buffer_size = 67104768
max_allowed_packet = 5241856
max_binlog_size = 1073741824
max_heap_table_size = 41943040
query_cache_limit = 10485760
query_cache_size = 157286400
read_buffer_size = 20967424
sort_buffer_size = 67108856
table_cache = 256
thread_cache_size = 128
thread_stack = 327680
tmp_table_size = 41943040

With these settings I'm now down to about 25 minutes. Still far away from the few minutes MyISAM takes, but it's getting more usable for me.

================================

Did you try the Bulk Data Loading Tips from the InnoDB Performance Tuning Tips (especially the first one)?

When importing data into InnoDB, make sure that MySQL does not have autocommit mode enabled because that requires a log flush to disk for every insert. To disable autocommit during your import operation, surround it with SET autocommit and COMMIT statements:

SET autocommit=0;
... SQL import statements ...
COMMIT;

If you use the mysqldump option --opt, you get dump files that are fast to import into an InnoDB table, even without wrapping them with the SET autocommit and COMMIT statements.

If you have UNIQUE constraints on secondary keys, you can speed up table imports by temporarily turning off the uniqueness checks during the import session:

SET unique_checks=0;
... SQL import statements ...
SET unique_checks=1;

For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. Be certain that the data contains no duplicate keys.

If you have FOREIGN KEY constraints in your tables, you can speed up table imports by turning the foreign key checks off for the duration of the import session:

SET foreign_key_checks=0;
... SQL import statements ...
SET foreign_key_checks=1;

For big tables, this can save a lot of disk I/O.

IMO, the whole chapter is worth the read.
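The same idea can be applied from application code. Here is a minimal PHP/PDO sketch of a bulk load; the credentials, table, and sample data are assumptions for illustration, not taken from the setup above:

<?php
// Hypothetical connection details and table.
$pdo = new PDO('mysql:host=localhost;dbname=testdb_innodb', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Relax per-statement work during the bulk load.
$pdo->exec('SET unique_checks = 0');
$pdo->exec('SET foreign_key_checks = 0');

// Sample data to import.
$rows = array(array(1, 'first'), array(2, 'second'));

$pdo->beginTransaction();
$stmt = $pdo->prepare('INSERT INTO items (id, name) VALUES (?, ?)');
foreach ($rows as $row) {
    $stmt->execute($row);
}
$pdo->commit(); // one log flush for the whole batch instead of one per INSERT

// Restore the normal settings.
$pdo->exec('SET unique_checks = 1');
$pdo->exec('SET foreign_key_checks = 1');
?>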

Backing up And Restoring MySQL Innodb Database


MySQL backups are performed using the common mysqldump tool. This is a command-line utility that ships with MySQL, and you use it as follows:
% mysqldump --user=user --password=pass --opt DBNAME > dumpfile.sql

You may also need to specify the --host= parameter to force the hostname you are connecting to. This depends largely on how you've set up your user security. This will produce a text file with a series of INSERT/DROP/CREATE SQL statements that will recreate the database.
The --opt flag is very important. It is shorthand for passing in many flags at once: --add-drop-table --add-locks --create-options --disable-keys --extended-insert --lock-tables --quick --set-charset. This ensures that your database is in a good state while the backup is performed, including restricting all write access while the backup is in operation. Any locks placed will be automatically removed when this utility finishes.


Restoring a backup
Restoring a backup, or importing from an existing dump file can take a long time depending on the number of indexes and primary keys you have on each table. You can speed this process up dramatically by modifying your original dump file by surrounding it with the following:

SET AUTOCOMMIT = 0;

SET FOREIGN_KEY_CHECKS=0;

.. your dump file ..

SET FOREIGN_KEY_CHECKS = 1;

COMMIT;

SET AUTOCOMMIT = 1;

This turns off all the checks and auto-commits. This is a safe operation if you are fully restoring a database, since the previous dump has already been validated for legal keys. If, however, you are importing into or adding to an existing database, this step is not advisable.
You can then easily import the SQL file into MySQL using:

% mysql --user=user --password=pass DBNAME < dumpfile.sql

or using

mysql> USE db_name

mysql> SET AUTOCOMMIT = 0;

mysql> SET UNIQUE_CHECKS = 0;

mysql> SET FOREIGN_KEY_CHECKS = 0;

mysql> SOURCE dump_file_name

mysql> SET UNIQUE_CHECKS = 1;

mysql> SET FOREIGN_KEY_CHECKS = 1;

mysql> COMMIT;

mysql> SET AUTOCOMMIT = 1;

Note: after doing some testing, I found that the SET AUTOCOMMIT = 0 command will not work if your SQL dump file contains LOCK TABLES ... WRITE statements, because LOCK TABLES ... WRITE appears to perform an implicit COMMIT.

According to the MySQL manual: locks may be used to emulate transactions or to get more speed when updating tables.

http://dev.mysql.com/doc/refman/4.1/en/lock-tables.html

Note: my testing method was to export and import an SQL dump file, open up the dump file, and observe the binary log.
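One possible workaround (a sketch under my own assumptions, not something quoted from the manual) is to strip the LOCK TABLES ... WRITE and UNLOCK TABLES statements out of the dump before importing, so the SET AUTOCOMMIT = 0 wrapper is not undone by implicit commits; creating the dump with mysqldump's --skip-add-locks option should achieve the same effect. A minimal PHP sketch with hypothetical file names:

<?php
// Copy dumpfile.sql to dumpfile-nolocks.sql, dropping the lock statements.
$in  = fopen('dumpfile.sql', 'r');
$out = fopen('dumpfile-nolocks.sql', 'w');
while (($line = fgets($in)) !== false) {
    // Skip statements that would trigger an implicit COMMIT.
    if (preg_match('/^\s*(LOCK TABLES .* WRITE|UNLOCK TABLES)\s*;/i', $line)) {
        continue;
    }
    fwrite($out, $line);
}
fclose($in);
fclose($out);
?>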

Monday, February 22, 2010

What does "size" in int(size) of MySQL mean?

What does "size" in int(size) of MySQL mean?
Friday, August 24th, 2007 at 21:40 +0000 (UTC) by Alexander Kirk 
I have always wondered what the size of numeric columns in MySQL means. Forgive me if this is obvious to someone else, but to me the MySQL manual is lacking a great deal in this area.

Usually you see something like int(11) in CREATE TABLE statements, but you can also change it to int(4).

So what does this size mean? Can you store higher values in a int(11) than in an int(4)?

Let's see what the MySQL manual says:

INT[(M)] [UNSIGNED] [ZEROFILL]
A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.

Not a word about M. The entry about BOOL suggests that the size is not there just for fun, as BOOL is a synonym for TINYINT(1) (with the specific size of 1).

TINYINT[(M)] [UNSIGNED] [ZEROFILL]
A very small integer. The signed range is -128 to 127. The unsigned range is 0 to 255.

BOOL, BOOLEAN
These types are synonyms for TINYINT(1). A value of zero is considered false. Non-zero values are considered true: [...]

So TINYINT(1) must be different in some way from TINYINT(4), which is assumed by default when you leave the size out [1]. Still, you can store, for example, 100 in a TINYINT(1).

Finally, let's come to the place of the manual where there is the biggest hint to what the number means:

Several of the data type descriptions use these conventions:

M indicates the maximum display width for integer types. For floating-point and fixed-point types, M is the total number of digits that can be stored. For string types, M is the maximum length. The maximum allowable value of M depends on the data type.

It's about the display width. The weird thing is, though [2], that if you have, for example, a value of 5 digits in a field with a display width of 4 digits, the display width will not cut any digits off.

If the value has fewer digits than the display width, nothing happens either. So it seems like the display width doesn't have any effect in real life.

Now ZEROFILL comes into play [2]. It is a neat feature that pads values that are (here it comes) less than the specified display width with zeros, so that you will always receive a value of the specified length. This is useful, for example, for invoice IDs.

So, concluding: the size is neither bits nor bytes. It's just the display width, which is used when the field has ZEROFILL specified.

If you see any other uses for the size value, please tell me. I am curious to know.

[1] See this example:
mysql> create table a ( a tinyint );
Query OK, 0 rows affected (0.29 sec)
mysql> show columns from a;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| a     | tinyint(4) | YES  |     | NULL    |       |
+-------+------------+------+-----+---------+-------+
1 row in set (0.26 sec)

mysql> alter table a change a a tinyint(1);
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into a values (100);
Query OK, 1 row affected (0.00 sec)

mysql> select * from a;
+-----+
| a   |
+-----+
| 100 |
+-----+
1 row in set (0.00 sec)


[2] Some code to better explain what I described so clumsily.
mysql> create table b ( b int (4));
Query OK, 0 rows affected (0.25 sec)

mysql> insert into b values (10000);
Query OK, 1 row affected (0.00 sec)

mysql> select * from b;
+-------+
| b     |
+-------+
| 10000 |
+-------+
1 row in set (0.00 sec)

mysql> alter table b change b b int(11);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from b;
+-------+
| b     |
+-------+
| 10000 |
+-------+
1 row in set (0.00 sec)

mysql> alter table b change b b int(11) zerofill;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from b;
+-------------+
| b           |
+-------------+
| 00000010000 |
+-------------+
1 row in set (0.00 sec)

mysql> alter table b change b b int(4) zerofill;
Query OK, 1 row affected (0.08 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from b;
+-------+
| b     |
+-------+
| 10000 |
+-------+
1 row in set (0.00 sec)

mysql> alter table b change b b int(6) zerofill;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from b;
+--------+
| b      |
+--------+
| 010000 |
+--------+
1 row in set (0.00 sec)

Thursday, February 18, 2010

Diagnosing and Fixing Memory Issues

[Mon Feb 08 17:10:52 2010] [notice] child pid 39479 exit signal Illegal instruction (4)
[Mon Feb 08 17:10:53 2010] [notice] child pid 39510 exit signal Illegal instruction (4)
[Sun Feb 14 05:43:24 2010] [error] server reached MaxClients setting, consider raising the MaxClients setting
[Sun Feb 14 12:38:28 2010] [notice] child pid 93038 exit signal Segmentation fault (11)
[Sun Feb 14 12:40:27 2010] [notice] child pid 92895 exit signal Segmentation fault (11)
[Sun Feb 14 12:50:16 2010] [notice] child pid 92968 exit signal Segmentation fault (11)
Feb 14 20:00:42 web020 kernel: swap_pager_getswapspace(8): failed
Feb 14 20:00:42 web020 kernel: swap_pager_getswapspace(16): failed
Feb 14 20:00:42 web020 kernel: swap_pager_getswapspace(12): failed
Feb 14 20:00:42 web020 kernel: swap_pager_getswapspace(16): failed
Feb 14 20:00:42 web020 kernel: swap_pager_getswapspace(12): failed
Feb 14 20:00:42 web020 kernel: swap_pager_getswapspace(16): failed
===========================
Diagnosing and Fixing Memory Issues

When your VPS is running low on physical memory, it may start to "swap thrash." This means it's attempting to use your swap partition heavily instead of real RAM. We recommend you limit your swap partition size to 256 MB; heavy use of swap in a virtualized environment will cause major performance problems.

We strongly recommend running the 32-bit version of your distribution of choice, unless you have a specific reason to run the 64-bit version. The 32-bit version of your distro will use significantly less memory.
===========================
5. Separate Out Virtual Swap Files Onto Separate Virtual Disks

Another way to increase performance on virtual machines is to separate the swap files onto a separate virtual disk backed by a separate physical disk on the host. In other words, currently your swap file goes to the same virtual disk as all your other disk files. To increase performance, you would create a new virtual disk on a different host physical disk and then relocate your swap file to that new virtual disk. It won't really be beneficial if you don't place the new virtual disk on a new physical disk. This is a more advanced topic and a complex process, so I won't go into all the details in this article. However, to change the location of your swap file, you go to Control Panel -> System -> Advanced Tab -> Performance Settings -> Advanced Tab -> Virtual Memory Change. Of course, you hope that you don't have to do much swapping. To prevent swapping, you should increase the amount of memory allocated to the virtual machine (within the size of your physical RAM). Note, see "Upgrade your RAM", below.
=======================

Reference:
http://www.petri.co.il/virtual_increase_vmware_performance.htm
http://library.linode.com/troubleshooting/memory-networking

Wednesday, February 17, 2010

connect to mysql

C:\perl\bin\ppm.bat


Install:
- DBI
- DBD-MYSQL

Check (no error means the module is installed):
perl -MDBD::mysql -e 1
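Once the modules are installed, a minimal connection test might look like the following sketch; the host, database name, and credentials are placeholders, not values from this setup:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder connection values; replace with your own.
my $dbh = DBI->connect('DBI:mysql:database=testdb;host=localhost',
                       'user', 'password',
                       { RaiseError => 1, AutoCommit => 1 });

my ($version) = $dbh->selectrow_array('SELECT VERSION()');
print "Connected to MySQL $version\n";

$dbh->disconnect;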

Tuesday, February 16, 2010

SyncToy - synchronization and backup software

I tried quite a few synchronization/backup tools, and in the end found that Microsoft's SyncToy is simple, easy to use, and very fast.
It supports three modes:
Synchronize (two-way sync)
Any change on either side (add, delete, rename, modify) is synchronized to the other side.
Echo (one-way sync)
Any change on the left side (add, delete, rename, modify) is synchronized to the right side;
changes on the right side are not synchronized back to the left.
Contribute (one-way sync, no deletions)
Same as Echo, but nothing is deleted on the right side.
The drawback is that it does not support FTP.

Other backup tools I have tried:
Cobian, freeware
Pros: Chinese interface, supports UTF-8 FTP
Cons: single-tasking and slow; on the first backup it does not first compare the file lists of the source and destination directories, so it blindly does a full backup even when the destination directory already has the files.
SyncBack, the basic version is freeware; the SE and Pro editions are paid
Pros: supports FTP
Cons: no UTF-8 FTP support; the full-featured Pro edition costs money
Second Copy, paid software
Pros: powerful, supports FTP, multitasking, fast
Cons: no UTF-8 FTP support
AceBackup, freeware
Pros: supports FTP
Cons: the interface is not very easy to pick up; no UTF-8 FTP support

Monday, February 15, 2010

Programmatic CCK content type creation

Posted April 23, 2009 - 9:25pm by chris
Do you want your Drupal module to create a new content type using the power and flexibility of CCK, perhaps including the built-in integration with Views? Here's one way, using Drupal 6.

Step 1: Design your CCK content type using the Drupal content type editor

The menu path is: Administer -> Content management -> Content types -> Add content type, URL http://example.com/admin/content/types/add. For more information about getting started with building a content type with CCK, visit this Drupal handbook page.

Step 2: Export your new CCK content type

After you save your new content type in step 1 above, then navigate to menu path Administer -> Content management -> Content types -> Export, URL http://example.com/admin/content/types/export.

Select the content type you just created from the radio buttons displayed, and then click the Export button.

The next screen displays the fields for the selected content type.

Take note of the types of fields you used. If any of them are not included with the basic CCK module (content.module), you need to make sure to include them in your module's dependencies below, or it will be unable to create the content type.

In general, you will want to use the default of exporting all of the fields. Make sure they are all selected via the checkboxes, then click the Export button.

The next page produces a large textarea with the necessary PHP code to define your CCK content type. When ready to use this code, select all of the text in the textarea and copy it to your copy buffer to paste into your definition function, described below.

Step 3: create a PHP function to hold your CCK definition

The template of the function will look like this, prior to pasting the contents of your copy buffer from above (containing the exported CCK content type definition code). We use a separate function just for the definition because it must be updated with a new export from CCK if we decide to change or tweak our CCK content type later. This makes editing easier and less error-prone.

function _modulename_cck_export() {
// paste code after this line.

// paste code before this line.
return $content;
}
Step 4: Carefully paste the exported CCK content type code into the function

The end result should look like this:

function _modulename_cck_export() {
// paste code after this line.
$content[type] = array (
'name' => 'computed moodle',
'type' => 'computemoodle',
'description' => 'Example CCK code.',
'title_label' => 'Title',
'body_label' => 'Body',
'min_word_count' => '0',
'help' => '',
'node_options' =>
array (
'status' => true,
'promote' => true,
'sticky' => false,
'revision' => false,
),
'old_type' => '',
'orig_type' => '',
'module' => 'node',
'custom' => '1',
'modified' => '1',
'locked' => '0',
'content_profile' => false,
'comment' => '2',
'comment_default_mode' => '4',
'comment_default_order' => '1',
'comment_default_per_page' => '50',
'comment_controls' => '3',
'comment_anonymous' => 0,
'comment_subject_field' => '1',
'comment_preview' => '1',
'comment_form_location' => '0',
);
$content[fields] = array (
0 =>
array (
'label' => 'Example field',
'field_name' => 'field_example',
'type' => 'number_integer',
'widget_type' => 'number',
'change' => 'Change basic information',
'weight' => '-1',
'description' => '',
'default_value' =>
array (
0 =>
array (
'value' => '',
'_error_element' => 'default_value_widget][field_example][0][value',
),
),
'default_value_php' => '',
'default_value_widget' => NULL,
'group' => false,
'required' => 0,
'multiple' => '0',
'min' => '',
'max' => '',
'prefix' => '',
'suffix' => '',
'allowed_values' => '',
'allowed_values_php' => '',
'op' => 'Save field settings',
'module' => 'number',
'widget_module' => 'number',
'columns' =>
array (
'value' =>
array (
'type' => 'int',
'not null' => false,
'sortable' => true,
),
),
'display_settings' =>
array (
'label' =>
array (
'format' => 'above',
'exclude' => 0,
),
'teaser' =>
array (
'format' => 'default',
'exclude' => 0,
),
'full' =>
array (
'format' => 'default',
'exclude' => 0,
),
4 =>
array (
'format' => 'default',
'exclude' => 0,
),
),
),
);
$content[extra] = array (
'title' => '-5',
'body_field' => '-3',
'menu' => '-2',
);
// paste code before this line.
return $content;
}
Step 5: Optionally clean up the pasted code

You may want to edit any string array indexes to be enclosed in single quote marks to avoid PHP warnings. I'm not sure why CCK exports faulty code like this. For example, as exported, the last array element is given as

$content[extra] = array (

but really should be expressed as

$content['extra'] = array (

Step 6: Write PHP code to actually create the content type

Write the install code to initiate creation of the new content type. This is where we make sure we have the necessary CCK pieces, populate a form using the now hard-coded CCK export, and then use drupal_execute() to submit the form to create the new content type. Here's an example:

function _example_install_cck_node() {
/* get the CCK node types to be created. This is where you load the
* file containing your function from above, if necessary, and then call
* that function.
*/
module_load_include('inc', 'modulename', 'modulename.ccknodedef');
$content = _modulename_cck_export(); // in modulename.ccknodedef.inc

// CCK content_copy.module may not be enabled, so make sure it is included
require_once './' . drupal_get_path('module', 'content')
. '/modules/content_copy/content_copy.module';

$form_state['values']['type_name'] = '';
$form_state['values']['macro'] = '$content = ' . var_export($content, TRUE) . ';';

// form provided by content_copy.module
drupal_execute('content_copy_import_form', $form_state);
content_clear_type_cache();
}
Step 7: Call the create function from your module

Figure out where you want to call the above install function from. I created an administrator button for creating it, because hook_install() uses the Batch API which is incompatible with the Forms API used to create the content type. Thus, it's not possible to have the type easily created at install time when the module is enabled. See the bug at issue http://drupal.org/node/297972. This bug is actually now fixed in Drupal CVS, but is not yet in a release as of Drupal 6.10.
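One possible shape for such an administrator trigger, sketched here with hypothetical path, permission, and function names (only _example_install_cck_node() comes from the step above), is a simple Drupal 6 menu callback that runs the creation on demand:

function modulename_menu() {
  $items['admin/build/modulename-create-type'] = array(
    'title' => 'Create CCK content type',
    'page callback' => 'modulename_create_type_page',
    'access arguments' => array('administer content types'),
    'type' => MENU_CALLBACK,
  );
  return $items;
}

function modulename_create_type_page() {
  // Calls the install function from Step 6.
  _example_install_cck_node();
  drupal_set_message(t('Content type created.'));
  return t('Done.');
}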

Step 8: Edit your *.info file

Be sure to add the appropriate dependencies[] clauses in your modulename.info file if your CCK content type is not optional for your module. At a minimum, you will need to specify the CCK module, which is named content.
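For instance, a minimal modulename.info for Drupal 6 might look like the sketch below; the name and description are placeholders, and you would add further dependencies[] lines for any extra CCK field modules your exported type uses:

; modulename.info (sketch)
name = Modulename
description = Programmatically creates a CCK content type.
core = 6.x
dependencies[] = content
; if the exported type uses number fields, also:
; dependencies[] = number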

All done.

That's it. Now your module can create a CCK content type that it can use, and which will benefit from all of the other functionality, such as Views, that works so well with CCK.

Let me know if you have questions, corrections or improvements.

Thanks to Angie Byron (webchick) for pointing me at some existing code which helped me finish figuring out how to do this cleanly.

Tuesday, February 2, 2010

Why MySQL’s binlog-do-db option is dangerous

Posted by Baron Schwartz
I see a lot of people filtering replication with binlog-do-db, binlog-ignore-db, replicate-do-db, and replicate-ignore-db. Although there are uses for these, they are dangerous and, in my opinion, overused. In many cases, there's a safer alternative.


The danger is simple: they don't work the way you think they do. Consider the following scenario: you set binlog-ignore-db to "garbage" so data in the garbage database (which doesn't exist on the slave) isn't replicated. (I'll come back to this in a second, so if you already see the problem, don't rush to the comment form.)

Now you do the following:

$ mysql
mysql> delete from garbage.junk;
mysql> use garbage;
mysql> update production.users set disabled = 1 where user = "root";

You just broke replication, twice. Once, because your slave is going to execute the first query and there's no such table "garbage.junk" on the slave. The second time, silently, because the update to production.users isn't replicated, so now the root user isn't disabled on the slave.

Why? Because binlog-ignore-db doesn't do what you think. The phrase I used earlier, "data in the garbage database isn't replicated," is a fallacy. That's not what it does. In fact, it filters out binary logging for statements issued from connections whose default database is "garbage." In other words, filtering is not based on the contents of the query -- it is based on what database you USE.

The other configuration options I mentioned work similarly. The binlog-do-db and binlog-ignore-db statements are particularly dangerous because they keep statements from ever being written to the binary log, which means you can't use the binary log for point-in-time recovery of your data from a backup.

In a carefully controlled environment, these options can have benefits, but I won't talk about that here. (We covered that in our book.)

The safer alternative is to configure filters on the slave, with options that actually operate on the tables mentioned in the query itself. These are replicate-wild-* options. For example, the safer way to avoid replicating data in the garbage database is to configure replicate-wild-ignore-table=garbage.%. There are still edge cases where that won't work, but it works in more cases and has fewer gotchas.
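For example, a minimal sketch of that slave-side filter in my.cnf (the garbage database name comes from the scenario above; the option goes in the usual [mysqld] group):

[mysqld]
replicate-wild-ignore-table=garbage.%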

If you are confused, you should read the replication rules section of the manual until you know it by heart.

Posted by Baron Schwartz @ 6:01 am :: replication, tips

10 Comments »
1. John Swindells
This is good to know. Is it true, therefore, that binlog-do-db behaves itself if you have always selected your database beforehand? When you say ‘default database’, does that include a database selected by USE DATABASE?

Comment :: May 14, 2009 @ 7:07 am

2. Sheeri K. Cabral

The other important issue to consider is that binary logs are not only used for replication. They are incremental backups, and if you choose to use the “do” statements, you are effectively erasing history; should you have a disaster and need the incremental backups, there is no way to retrieve the ignored information.

Comment :: May 14, 2009 @ 7:24 am

3. Baron Schwartz

John, yes that’s true.

Comment :: May 14, 2009 @ 7:26 am

4. peter

Sheeri,

Indeed – so any binlog filtering is evil if you care about your data.
With row-level replication it is probably safe to exclude tables you do not care about, such as temporary tables, from the logging, but that is about it.

Comment :: May 14, 2009 @ 8:58 am

5. Robert Hodges

Master side filtering has a host of pitfalls, but even so it is sometimes necessary. Applications can turn off the binlog for selected statements using SET SQL_LOG_BIN=0. That’s better because at least you presumably know exactly what you are doing at the application level. We use this feature for Tungsten Replicator catalogs–replicating them would break our application. We have very flexible filters both on the master as well as the slave but our experience has been exactly what Baron found, namely that filtering on the slave is best.

Comment :: May 14, 2009 @ 5:40 pm

6. Morgan Christiansson
Which is why this behaviour is very clearly documented in the MySQL manual for this setting.

http://dev.mysql.com/doc/refman/5.0/en/replication-options-binary-log.html

Comment :: May 14, 2009 @ 7:04 pm

7. Baron Schwartz

Morgan, I have slowly come to realize that most people never read the manual. They read wikihow.com or some other garbage and think they know things. Sad but true.

Comment :: May 14, 2009 @ 8:31 pm

8. Shantanu Oak

>> Because binlog-ignore-db doesn’t do what you think.
I wish it did what it says. It would have brought down the network traffic when I have to update the slaves located in other cities.

Comment :: May 14, 2009 @ 9:56 pm

9. Morgan Christiansson
Shantanu, there is also –replicate-do-table and –replicate-wild-do-table which have different behaviour than –binlog-do-db

See
http://dev.mysql.com/doc/refman/5.0/en/replication-options-slave.html#option_mysqld_replicate-do-table
http://dev.mysql.com/doc/refman/5.0/en/replication-options-slave.html#option_mysqld_replicate-wild-do-table

The documentation even says “This works for cross-database updates, in contrast to –replicate-ignore-db.”

But you still need to be aware of its quirks; queries that JOIN the wrong tables in updates could leave your replication out of sync.

Comment :: May 14, 2009 @ 10:35 pm

10. Simon Mudd

Row based replication in 5.1 makes this sort of thing clearer and that’s almost certainly one of the reasons it was added. For certain SQL statements it can also be much quicker.

The whole set of replicate-wild* and replicate*ignore options would be so much better handled if they were done differently, for example as done by Sybase, where you can easily configure replication on a per-table basis.

Even having a few mysql.XXXX tables which define the replication rules would be better than using the my.cnf only options. I’ve only looked in detail at Sybase replication, so am not sure what is offered by Oracle or other commercial RDBMS vendors. MySQL’s replication facilities for simple stuff is great, but the moment you want to do things in a slightly more complex fashion opens the way for potential headaches if you are not really aware of how replication works in MySQL.

Which is why I stand by the claim in my blog posting a while back that it would be so much better if replication were pulled out of mysqld and moved to a separate process dedicated to the task.

Then as mentioned the binlog could be used for what it’s supposed to be: point in time recovery and the replication process could be improved without having to worry so much about what goes on in the database.