ganeti-github.git
9 years agoFix bug related to log opening failures
Iustin Pop [Thu, 10 Mar 2011 11:19:17 +0000 (12:19 +0100)]
Fix bug related to log opening failures

If opening the log file fails, then we shouldn't attempt to use that
variable.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
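
The failure mode described in this commit can be sketched as follows. This is a minimal illustration in Python, not the actual Ganeti code; `open_log` and its argument are hypothetical names. The point is that the file variable is only used on the path where the open succeeded.

```python
import sys


def open_log(path):
    """Return an open log file, or None if it cannot be opened."""
    try:
        logfile = open(path, "a")
    except OSError as err:
        # Report the failure and stop here: 'logfile' was never bound,
        # so nothing after this point may refer to it.
        sys.stderr.write("Cannot open log file %s: %s\n" % (path, err))
        return None
    return logfile
```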

9 years agoBump version for 2.4.1 release v2.4.1
Iustin Pop [Wed, 9 Mar 2011 12:05:16 +0000 (13:05 +0100)]
Bump version for 2.4.1 release

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agocfgupgrade: Fix critical bug overwriting RAPI users file
Michael Hanselmann [Tue, 8 Mar 2011 16:20:07 +0000 (17:20 +0100)]
cfgupgrade: Fix critical bug overwriting RAPI users file

The cfgupgrade tool was designed to be idempotent, that means it could
be run several times and still produce the correct result. Ganeti
2.4 moved the file containing the RAPI users to a separate directory
(…/lib/ganeti/rapi/users). If it exists, cfgupgrade would automatically
move an existing file from …/lib/ganeti/rapi_users and replace it with a
symlink.

Unfortunately one of the checks for this was incorrect and, when run
multiple times, cfgupgrade replaced the users file at the new location
with a symlink created during a previous run.

In addition the “--dry-run” parameter to cfgupgrade was not respected.
Unittests are updated for all these cases.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
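
An idempotent move-and-symlink step with dry-run support could look roughly like the sketch below. It is not the cfgupgrade implementation: the function name, the return value and the choice to raise an error when both files exist are assumptions made for illustration. The check that matters is the first one, which refuses to treat a symlink left by an earlier run as a file to be moved.

```python
import os
import shutil


def migrate_users_file(old_path, new_path, dry_run=False):
    """Move old_path to new_path and leave a symlink; safe to re-run.

    Returns True if a migration was (or, in dry-run mode, would be)
    performed.
    """
    if os.path.islink(old_path):
        # A previous run already migrated the file. Moving the symlink
        # would overwrite the real users file at the new location.
        return False
    if not os.path.isfile(old_path):
        return False  # nothing to migrate
    if os.path.exists(new_path):
        raise RuntimeError("Both %s and %s exist" % (old_path, new_path))
    if dry_run:
        return True  # report only, change nothing
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    shutil.move(old_path, new_path)
    os.symlink(new_path, old_path)
    return True
```

Running the function a second time is a no-op, and a dry run leaves both paths untouched.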

9 years agoRelease 2.4.0 v2.4.0
Iustin Pop [Mon, 7 Mar 2011 11:00:51 +0000 (12:00 +0100)]
Release 2.4.0

NEWS update and version bump.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoMerge branch 'devel-2.3' into devel-2.4
Iustin Pop [Mon, 7 Mar 2011 09:50:27 +0000 (10:50 +0100)]
Merge branch 'devel-2.3' into devel-2.4

* devel-2.3:
  Fix LUClusterRepairDiskSizes and rpc result usage
  Fix RPC mismatch in blockdev_getsize[s]
  RAPI: fix evacuate node resource

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoSmall improvement to the ganeti man page
Iustin Pop [Thu, 3 Mar 2011 10:16:39 +0000 (11:16 +0100)]
Small improvement to the ganeti man page

Also specifies the comma-escaping feature.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoMerge branch 'devel-2.2' into devel-2.3 devel-2.3 github/devel-2.3
Iustin Pop [Fri, 4 Mar 2011 11:36:15 +0000 (12:36 +0100)]
Merge branch 'devel-2.2' into devel-2.3

* devel-2.2:
  Fix LUClusterRepairDiskSizes and rpc result usage
  Fix RPC mismatch in blockdev_getsize[s]

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoFix LUClusterRepairDiskSizes and rpc result usage devel-2.2 github/devel-2.2
Iustin Pop [Tue, 15 Feb 2011 13:39:44 +0000 (14:39 +0100)]
Fix LUClusterRepairDiskSizes and rpc result usage

This LU was introduced before the RPC result conversion from .data to
.payload, and it has managed to keep the old-style usage (how? it's
the only LU that does so). Fix by changing to payload, and add some
extra logging for easier diagnosis.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit 043beb38f4e10b75d0820c361c668c441c7a6980)

9 years agoFix RPC mismatch in blockdev_getsize[s]
Iustin Pop [Tue, 15 Feb 2011 13:29:08 +0000 (14:29 +0100)]
Fix RPC mismatch in blockdev_getsize[s]

Commit 92fd2250 added consistency checks in the RPC layer, which broke
the call_blockdev_getsizes RPC call (declared with 's' at the end in
rpc.py, without 's' in the node daemon).

The immediate fix is to correct the rpc function name; the long-term
one will be to remove this duplication.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Stephen Shirley <diamond@google.com>
(cherry picked from commit ccfbbd2d1546b4f57d5bfeb115573967f7fb558b)

9 years agoRAPI: fix evacuate node resource
Iustin Pop [Fri, 4 Mar 2011 10:04:10 +0000 (11:04 +0100)]
RAPI: fix evacuate node resource

PollJob returns the whole op_results, hence a list of opcode results.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoMerge remote branch 'stable-2.4' into devel-2.4
Guido Trotter [Wed, 2 Mar 2011 21:36:01 +0000 (13:36 -0800)]
Merge remote branch 'stable-2.4' into devel-2.4

* origin/stable-2.4:
  Fix typo in kvm-ifup script
  NEWS: Replace smartquotes, start lines with uppercase
  Update NEWS and release 2.4.0 rc3
  Fix potential data-loss bug in disk wipe routines

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoFix typo in kvm-ifup script
Michael Hanselmann [Tue, 1 Mar 2011 17:32:40 +0000 (18:32 +0100)]
Fix typo in kvm-ifup script

Reported-by: Bas Tichelaar <bas@30loops.net>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

9 years agoNEWS: Replace smartquotes, start lines with uppercase
Michael Hanselmann [Mon, 28 Feb 2011 15:26:00 +0000 (16:26 +0100)]
NEWS: Replace smartquotes, start lines with uppercase

- Sphinx converts ASCII quotes ("") to smartquotes (“”) automatically
- Sentences or list items start with an uppercase letter
- Changed description of non-verbose “gnt-* list” output slightly

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoFix LU processor's GetECId
Michael Hanselmann [Mon, 28 Feb 2011 17:01:43 +0000 (18:01 +0100)]
Fix LU processor's GetECId

The exception was never actually raised.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Adeodato Simo <dato@google.com>

9 years agoUpdate NEWS and release 2.4.0 rc3 v2.4.0rc3
Iustin Pop [Mon, 28 Feb 2011 14:12:14 +0000 (15:12 +0100)]
Update NEWS and release 2.4.0 rc3

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoMerge branch 'devel-2.4' into stable-2.4
Iustin Pop [Mon, 28 Feb 2011 13:30:45 +0000 (14:30 +0100)]
Merge branch 'devel-2.4' into stable-2.4

* devel-2.4:
  1-char comment typo fix
  Expand some acronyms, add to glossary
  query_unittest: Fix argument to set()

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoFix potential data-loss bug in disk wipe routines
Iustin Pop [Mon, 28 Feb 2011 10:06:14 +0000 (11:06 +0100)]
Fix potential data-loss bug in disk wipe routines

For the 2.4 release, we only add the missing RPC calls. However, this
needs to be fixed properly, by preventing usage of mis-configured
disks.

Also add a bit more logging so that it's directly clear on which node
the wipe is being done.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years ago1-char comment typo fix
Stephen Shirley [Fri, 25 Feb 2011 15:02:14 +0000 (16:02 +0100)]
1-char comment typo fix

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoExpand some acronyms, add to glossary
Stephen Shirley [Thu, 24 Feb 2011 15:19:07 +0000 (16:19 +0100)]
Expand some acronyms, add to glossary

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoquery_unittest: Fix argument to set()
René Nussbaumer [Wed, 23 Feb 2011 13:16:12 +0000 (14:16 +0100)]
query_unittest: Fix argument to set()

Commit e431074f introduced a bug that went uncaught. This patch fixes
it. set() expects a list or other iterable, so it split the provided
instance name into a set of characters. As a result, exp_status was
never set, and the assertion further below, which checks that every
status was tested, did not catch the problem.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
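
The set() behaviour behind this bug is easy to reproduce in isolation. The instance name below is made up for the example.

```python
instance_name = "inst1"

# Passing a bare string: set() iterates over it character by character.
as_chars = set(instance_name)

# Passing a one-element list: the set contains the whole name.
as_name = set([instance_name])

# Membership tests against the first set silently fail.
found_in_chars = instance_name in as_chars
found_in_name = instance_name in as_name
```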

9 years agoFix title of query field containing instance name
Michael Hanselmann [Tue, 22 Feb 2011 17:17:57 +0000 (18:17 +0100)]
Fix title of query field containing instance name

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoUpdate news and bump version for 2.4.0 rc2 v2.4.0rc2
Iustin Pop [Mon, 21 Feb 2011 10:28:00 +0000 (11:28 +0100)]
Update news and bump version for 2.4.0 rc2

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoMerge branch 'devel-2.4' into stable-2.4
Iustin Pop [Mon, 21 Feb 2011 09:36:10 +0000 (10:36 +0100)]
Merge branch 'devel-2.4' into stable-2.4

* devel-2.4: (23 commits)
  Fix pylint warnings
  Change the list formatting to a 'special' chars
  Add support for merging node groups
  Add option to rename groups on conflict
  Fix minor docstring typo
  Fix HV/OS parameter validation on non-vm nodes
  NodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes
  NodeQuery: don't query non-vm_capable nodes
  Remove superfluous redundant requirement
  Don't remove master_candidate flag from merged nodes
  Use a consistent ECID base
  listrunner: convert from getopt to optparse
  listrunner: fix agent usage
  Revert "Disable the cluster-merge tool for the moment"
  Fix cluster-merging by not stopping noded
  Fix error msg for instances on offline nodes
  Minor reordering to match param order
  cluster verify and instance disks on offline nodes
  Cluster verify and N+1 warnings for offline nodes
  Handle gnt-instance shutdown --all for empty clusters
  Use gnt-node add --force-join to add foreign nodes
  Add --force-join option to gnt-node add
  Fix iterating over node groups

Of the above commits present in the devel-2.4 branch, only the “Add
--force-join option to gnt-node add” is a potential issue, but this
has been QA-ed successfully. The other fixes are split in three
groups:

- non-core changes (cluster-merge, listrunner)
- trivial fixes (docstrings, etc.)
- bugs that we want fixed

As such, instead of cherry-picking only individual patches, I propose
that we unify stable and devel 2.4 and make a new RC out of the
result.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoFix pylint warnings
Stephen Shirley [Fri, 18 Feb 2011 15:25:59 +0000 (16:25 +0100)]
Fix pylint warnings

- 1 80-char line infraction
- 4 changes in how arguments are passed to logging functions
- 3 pylint disable-msg's because cluster-merge needs to access ganeti
  config internals

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
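
The commit does not say which way the logging calls were changed. Presumably it is the usual pylint complaint about formatting the message before calling the logging function. A sketch of both styles, with a hypothetical logger name and message:

```python
import logging

logger = logging.getLogger("example")


def report_eager(node):
    # Formats the string even if the message is later discarded.
    logger.debug("Merging node %s" % node)


def report_lazy(node):
    # Passes the argument separately; the logging module interpolates
    # only when the record is actually formatted.
    logger.debug("Merging node %s", node)
```

Both calls produce the same message text; only the second keeps the argument separate in the log record.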

9 years agoTestRapiInstanceRename use instance name
Guido Trotter [Fri, 18 Feb 2011 12:52:58 +0000 (12:52 +0000)]
TestRapiInstanceRename use instance name

Currently the QA rename job wrongly passes the whole info dict to the
client.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoChange the list formatting to a 'special' chars
Iustin Pop [Fri, 18 Feb 2011 12:51:03 +0000 (13:51 +0100)]
Change the list formatting to a 'special' chars

And also enable verbose display via the, well, verbose option. Man
page and tests are updated, and the formatting is moved from 4 if
statements to a data structure.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
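
Replacing a chain of if statements with a lookup table, as this commit describes, might look like the sketch below. The status names and the exact display strings are assumptions for illustration, not the values used in the Ganeti source.

```python
# Hypothetical status names; each maps to (verbose text, terse text).
STATUS_FORMAT = {
    "unknown": ("(unknown)", "??"),
    "nodata": ("(nodata)", "?"),
    "unavail": ("(unavail)", "-"),
    "offline": ("(offline)", "*"),
}


def format_field(status, value, verbose=False):
    """Render one result field; 'normal' fields show their value."""
    if status == "normal":
        return str(value)
    verbose_text, terse_text = STATUS_FORMAT[status]
    return verbose_text if verbose else terse_text
```

Adding a new status then means adding one table entry rather than another branch.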

9 years agoAdd support for merging node groups
Stephen Shirley [Fri, 18 Feb 2011 12:59:46 +0000 (13:59 +0100)]
Add support for merging node groups

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoAdd option to rename groups on conflict
Stephen Shirley [Fri, 18 Feb 2011 12:30:37 +0000 (13:30 +0100)]
Add option to rename groups on conflict

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoFix minor docstring typo
Stephen Shirley [Thu, 17 Feb 2011 16:00:24 +0000 (17:00 +0100)]
Fix minor docstring typo

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoAdd QA rapi test for instance reinstall
Guido Trotter [Fri, 18 Feb 2011 11:33:09 +0000 (11:33 +0000)]
Add QA rapi test for instance reinstall

This tests at least the basic case. Unfortunately, there is no way to
check all possibilities using the provided rapi client, as that will use
the new method unless the cluster doesn't support it.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoRAPI: remove required parameters for reinstall
Guido Trotter [Fri, 18 Feb 2011 11:20:01 +0000 (11:20 +0000)]
RAPI: remove required parameters for reinstall

Before c744425f354f1bef2d0d7d306e2d00c494d67d2b instance reinstall
accepted the "os" and "nostartup" optional query parameters. With that
commit it was changed to allow "os", "start" and "osparams" via body
rather than encoded in the URL. Unfortunately that commit introduced a
bug, which required the "os" parameter to be passed for body requests,
and at least one of "os" or "nostartup" for query requests.

This fix makes sure all parameters are optional again.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoFix HV/OS parameter validation on non-vm nodes
Iustin Pop [Thu, 17 Feb 2011 16:06:59 +0000 (17:06 +0100)]
Fix HV/OS parameter validation on non-vm nodes

Currently, there is at least one LU that does wrong validation of HV
parameters (against all nodes, LUClusterSetParams). It's possible to
fix this case, but I went and modified the base functions to filter
out non-vm_capable nodes so all callers are protected.

Note: the _CheckOSParams function is never called with the list of all
nodes, so modifying it shouldn't be needed. However, I think it's safe to do
so (and it shouldn't hurt as an instance's node shouldn't ever lack
the vm_capable bit).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoNodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes
Iustin Pop [Thu, 17 Feb 2011 13:42:57 +0000 (14:42 +0100)]
NodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes

Since we don't have the data per design, UNAVAIL is appropriate here,
while NODATA is not.

The patch also adds a comment: if we extend the live fields list to
contain other data in the future, we need to reevaluate this solution.

This should fix issue 143. The listing now shows (node2==offline,
node3==not vm_capable):

  Node     DTotal     DFree    MTotal     MNode     MFree Pinst Sinst
  node1    698.6G    630.5G     32.0G      1.0G     30.0G     8     7
  node2 (offline) (offline) (offline) (offline) (offline)     9     4
  node3 (unavail) (unavail) (unavail) (unavail) (unavail)     0     0

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
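
The distinction between the statuses can be sketched as a small decision function. The dictionary keys and status strings are hypothetical and only mirror the listing above: offline nodes are reported as offline, nodes that are not vm_capable have no live data by design, and "nodata" is left for nodes that should have answered but did not.

```python
def live_field_status(node):
    """Pick the status shown for a node's live fields (sketch)."""
    if node["offline"]:
        return "offline"
    if not node["vm_capable"]:
        # The data does not exist by design, so it is unavailable,
        # not merely missing.
        return "unavail"
    if node.get("live_data") is None:
        return "nodata"  # data was expected, but none was returned
    return "normal"
```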

9 years agoNodeQuery: don't query non-vm_capable nodes
Iustin Pop [Thu, 17 Feb 2011 13:41:29 +0000 (14:41 +0100)]
NodeQuery: don't query non-vm_capable nodes

Non-vm_capable nodes most likely don't have a hypervisor and/or storage
configured, so the call would fail anyway.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoFix LUClusterRepairDiskSizes and rpc result usage
Iustin Pop [Tue, 15 Feb 2011 13:39:44 +0000 (14:39 +0100)]
Fix LUClusterRepairDiskSizes and rpc result usage

This LU was introduced before the RPC result conversion from .data to
.payload, and it has managed to keep the old-style usage (how? it's
the only LU that does so). Fix by changing to payload, and add some
extra logging for easier diagnosis.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoFix RPC mismatch in blockdev_getsize[s]
Iustin Pop [Tue, 15 Feb 2011 13:29:08 +0000 (14:29 +0100)]
Fix RPC mismatch in blockdev_getsize[s]

Commit 92fd2250 added consistency checks in the RPC layer, which broke
the call_blockdev_getsizes RPC call (declared with 's' at the end in
rpc.py, without 's' in the node daemon).

The immediate fix is to correct the rpc function name; the long-term
one will be to remove this duplication.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Stephen Shirley <diamond@google.com>

9 years agoRemove superfluous redundant requirement
Stephen Shirley [Tue, 15 Feb 2011 16:40:54 +0000 (17:40 +0100)]
Remove superfluous redundant requirement

The condition is already covered by the previous requirement.

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoDon't remove master_candidate flag from merged nodes
Stephen Shirley [Tue, 15 Feb 2011 14:29:03 +0000 (15:29 +0100)]
Don't remove master_candidate flag from merged nodes

Prevents lots of spurious warnings like:
2011-02-10 17:00:22,776: CRITICAL Configuration data is not consistent:
Not enough master candidates: actual 3, target 4

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoUse a consistent ECID base
Stephen Shirley [Tue, 15 Feb 2011 14:06:03 +0000 (15:06 +0100)]
Use a consistent ECID base

ECID was being calculated completely differently in
__MergeNodeGroups() and _MergeConfig()

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agolistrunner: convert from getopt to optparse
Iustin Pop [Wed, 16 Feb 2011 16:21:03 +0000 (17:21 +0100)]
listrunner: convert from getopt to optparse

The “-A” (use agent) was not documented, and instead of adding manual
listing, I converted it to optparse like the other CLI tools.

Note that I also cleaned up the usage and help texts a bit.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
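
With optparse, every option carries its own help text, so an option such as “-A” shows up in the generated help without a hand-written listing. A minimal sketch follows; only “-A” comes from the commit message, while the function name, usage string and the “-f” option are hypothetical.

```python
import optparse


def parse_args(argv):
    """Parse listrunner-style options (hypothetical option set)."""
    parser = optparse.OptionParser(usage="%prog [options] <command>")
    parser.add_option("-A", dest="use_agent", action="store_true",
                      default=False, help="Use agent")
    # Hypothetical extra option, only to show a value-taking flag.
    parser.add_option("-f", dest="host_file", default=None,
                      help="File with one host name per line")
    return parser.parse_args(argv)
```

For example, `parse_args(["-A", "-f", "hosts", "uptime"])` yields the parsed options plus the remaining positional arguments.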

9 years agolistrunner: fix agent usage
Iustin Pop [Wed, 16 Feb 2011 12:32:29 +0000 (13:32 +0100)]
listrunner: fix agent usage

By delaying the agent key query until after the fork, we prevent the
problem of simultaneous access to the agent.

Tested that it works against 80 hosts in parallel without error; the
current version breaks already at 20 hosts.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoRevert "Disable the cluster-merge tool for the moment"
Stephen Shirley [Thu, 10 Feb 2011 16:32:26 +0000 (17:32 +0100)]
Revert "Disable the cluster-merge tool for the moment"

This reverts commit c0711f2cb989facd60430ab18c5b0e59a1f279ac.

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoFix cluster-merging by not stopping noded
Stephen Shirley [Thu, 10 Feb 2011 10:52:13 +0000 (11:52 +0100)]
Fix cluster-merging by not stopping noded

cli.RunWhileClusterStopped() stops noded on all of the nodes in the
original cluster. This prevents /etc/hosts updates on the master, and
config redistribution doesn't reach the other nodes in the original
cluster. As all we want to do is merge while the master is stopped,
simply stop it and start it again after.

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoFix bug in iallocator data structures build
Iustin Pop [Thu, 10 Feb 2011 13:55:08 +0000 (14:55 +0100)]
Fix bug in iallocator data structures build

Commit a1cef11c fixed non-vm_capable nodes export, but inadvertently
broke offline nodes. The update of the dict only needs to
happen for online nodes, in the 'if' block.

Without this patch, offline nodes keep the data from the last node
that was not offline; end result is that all nodes are considered
online (unless the first node is offline, in which case an error will
be raised).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
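
The shape of this bug can be sketched with a simplified loop. The function and field names are hypothetical; the real iallocator code builds much larger structures. What matters is the indentation of the update call.

```python
def build_node_export(node_cfgs, live_info):
    """Build per-node data for export (fixed variant of the loop)."""
    result = {}
    for name, cfg in node_cfgs:
        static = {"offline": cfg["offline"]}
        if not cfg["offline"]:
            dynamic = {"offline": False,
                       "total_memory": live_info[name]["total_memory"]}
            # The update belongs inside the 'if' block. One level
            # further out, an offline node would reuse 'dynamic' from
            # the previous online node (or hit a NameError if the
            # first node is offline).
            static.update(dynamic)
        result[name] = static
    return result
```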

9 years agoFix error msg for instances on offline nodes
Iustin Pop [Wed, 9 Feb 2011 09:04:39 +0000 (10:04 +0100)]
Fix error msg for instances on offline nodes

Currently, for both primary and secondary offline nodes, we give the
same message:
- ERROR: instance instance14: instance lives on offline node(s) node3
- ERROR: instance instance15: instance lives on offline node(s) node3
- ERROR: instance instance16: instance lives on offline node(s) node3
- ERROR: instance instance17: instance lives on offline node(s) node3

This is confusing, as an offline primary is in a different category
than a secondary. The patch changes the warnings to have different
error messages:
- ERROR: instance instance14: instance has offline secondary node(s) node3
- ERROR: instance instance15: instance has offline secondary node(s) node3
- ERROR: instance instance16: instance lives on offline node node3
- ERROR: instance instance17: instance lives on offline node node3

Thanks to Alexander Schreiber <als@google.com> for reporting this
issue.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Alexander Schreiber <als@google.com>

9 years agoMinor reordering to match param order
Stephen Shirley [Tue, 8 Feb 2011 16:42:18 +0000 (17:42 +0100)]
Minor reordering to match param order

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agocluster verify and instance disks on offline nodes
Iustin Pop [Tue, 8 Feb 2011 16:07:13 +0000 (17:07 +0100)]
cluster verify and instance disks on offline nodes

Currently, cluster-verify says:

- ERROR: instance instance14: couldn't retrieve status for disk/0 on node3: node offline
- ERROR: instance instance14: instance lives on offline node(s) node3
- ERROR: instance instance15: couldn't retrieve status for disk/0 on node3: node offline
- ERROR: instance instance15: instance lives on offline node(s) node3

This is redundant as the “lives on offline node” message should be all
we need to understand the cluster situation.

The patch fixes this and also corrects a very old idiom.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Stephen Shirley <diamond@google.com>

9 years agoCluster verify and N+1 warnings for offline nodes
Iustin Pop [Tue, 8 Feb 2011 15:56:23 +0000 (16:56 +0100)]
Cluster verify and N+1 warnings for offline nodes

Currently, cluster verify shows N+1 warnings for offline
nodes having any redundant instances since the memory data that we
have for those nodes is zero, so any instance will trigger the
warning.

As the comment says, we already list secondary instances on offline
nodes, so that warning is enough, and we skip the N+1 one.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Stephen Shirley <diamond@google.com>

9 years agoHandle gnt-instance shutdown --all for empty clusters
Stephen Shirley [Mon, 7 Feb 2011 15:35:34 +0000 (16:35 +0100)]
Handle gnt-instance shutdown --all for empty clusters

The current code gives:
Failure: prerequisites not met for this operation:
error type: wrong_input, error details:
Selection filter does not match any instances

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoUse gnt-node add --force-join to add foreign nodes
Stephen Shirley [Tue, 1 Feb 2011 15:59:46 +0000 (16:59 +0100)]
Use gnt-node add --force-join to add foreign nodes

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoAdd --force-join option to gnt-node add
Stephen Shirley [Tue, 1 Feb 2011 15:59:45 +0000 (16:59 +0100)]
Add --force-join option to gnt-node add

This is needed so cluster-merge can add nodes from other clusters.

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoFix iterating over node groups
Stephen Shirley [Tue, 1 Feb 2011 16:14:18 +0000 (17:14 +0100)]
Fix iterating over node groups

Current line tries to unpack dict incorrectly

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoUpdate NEWS file for the 2.4.0 rc1 release v2.4.0rc1
Iustin Pop [Fri, 4 Feb 2011 09:54:05 +0000 (10:54 +0100)]
Update NEWS file for the 2.4.0 rc1 release

Also bump up the version.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoDisable the cluster-merge tool for the moment
Iustin Pop [Fri, 4 Feb 2011 09:58:45 +0000 (10:58 +0100)]
Disable the cluster-merge tool for the moment

Hopefully this can be fixed before the final 2.4 release…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Stephen Shirley <diamond@google.com>

9 years agoBump up intra-cluster import connect timeout
Iustin Pop [Thu, 3 Feb 2011 15:19:52 +0000 (16:19 +0100)]
Bump up intra-cluster import connect timeout

Currently, the export timeout is 10 times 20 seconds, but the import
is only 30 seconds. I'm raising this to 60 seconds with two goals in
mind:

- when debugging manually, this allows for easier synchronisation of
  the processes
- 60 seconds equals 3 full 20-second intervals, which I think is better
  than just one and a half

This change shouldn't make a big difference either way (at most, it
will possibly delay the job in case of failures by half a minute).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoImport-export: fix logging of daemon output
Iustin Pop [Thu, 3 Feb 2011 13:17:58 +0000 (14:17 +0100)]
Import-export: fix logging of daemon output

In case of failures, the recent daemon output is logged as %r on a
list of unicode strings, which results in the (ugly):

Thu Feb  3 05:13:34 2011 snapshot/0 failed to send data: Exited with status 1 (recent output: [u'  DUMP: Date of this level 0 dump: Thu Feb  3 05:13:18 2011', u'  DUMP: Dumping /dev/mapper/6369a5f7-1e67-4d0d-a4f0-956b3649c6d7.disk0_data.snap-1 (an unlisted file system) to standard output', u'  DUMP: Label: none', u'  DUMP: Writing 10 Kilobyte records', u'  DUMP: mapping (Pass I) [regular files]', u'  DUMP: mapping (Pass II) [directories]', u'  DUMP: estimated 54301 blocks.', u'  DUMP: Volume 1 started with block 1 at: Thu Feb  3 05:13:19 2011', u'  DUMP: dumping (Pass III) [directories]', u'  DUMP: dumping (Pass IV) [regular files]', u'socat: E SSL_write(): Connection reset by peer', u"dd: dd: writing `standard output': Broken pipe", u'  DUMP: Broken pipe', u'  DUMP: The ENTIRE dump is aborted.'])

This patch joins this list and makes it a non-unicode string, thus
resulting in the more readable (and ~10% shorter):

Thu Feb  3 05:16:04 2011 snapshot/0 failed to send data: Exited with status 1 (recent output:   DUMP: Date of this level 0 dump: Thu Feb  3 05:15:58 2011\n  DUMP: Dumping /dev/mapper/6369a5f7-1e67-4d0d-a4f0-956b3649c6d7.disk0_data.snap-1 (an unlisted file system) to standard output\n  DUMP: Label: none\n  DUMP: Writing 10 Kilobyte records\n  DUMP: mapping (Pass I) [regular files]\n  DUMP: mapping (Pass II) [directories]\n  DUMP: estimated 54350 blocks.\n  DUMP: Volume 1 started with block 1 at: Thu Feb  3 05:15:59 2011\n  DUMP: dumping (Pass III) [directories]\nsocat: E SSL_write(): Connection reset by peer\ndd: dd: writing `standard output': Broken pipe\n  DUMP: Broken pipe\n  DUMP: The ENTIRE dump is aborted.)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
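
The difference between the two log formats comes down to formatting the list itself with %r versus joining its elements first. A sketch in Python 3, where the u'' prefixes no longer appear but the brackets and per-item quoting still do; the helper name and the sample lines are made up for the example.

```python
def format_recent_output(lines):
    """Join recent daemon output into one string for a log message."""
    return "\n".join(lines)


recent = ["  DUMP: Label: none", "  DUMP: Broken pipe"]

# Formatting the list directly keeps brackets and quotes per item.
as_repr = "recent output: %r" % (recent,)

# Joining first gives plain text, which is also shorter.
as_text = "recent output: %s" % format_recent_output(recent)
```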

9 years agoFix handling of ^C in the CLI scripts
Iustin Pop [Thu, 3 Feb 2011 10:02:20 +0000 (11:02 +0100)]
Fix handling of ^C in the CLI scripts

This adds a message and nice handling of ^C, especially useful for
``gnt-job watch``.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
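
In Python, ^C arrives as a KeyboardInterrupt, so a CLI wrapper can catch it and exit with a message instead of a traceback. A minimal sketch; the wrapper name, the message text and the exit code are assumptions, not what the Ganeti CLI prints.

```python
import sys


def run_cli(main, argv):
    """Run a CLI entry point and exit cleanly on ^C."""
    try:
        return main(argv)
    except KeyboardInterrupt:
        # Hypothetical message and exit code.
        sys.stderr.write("Aborted by user.\n")
        return 1
```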

9 years agoMerge branch 'devel-2.3' into devel-2.4
Michael Hanselmann [Thu, 3 Feb 2011 11:38:25 +0000 (12:38 +0100)]
Merge branch 'devel-2.3' into devel-2.4

* devel-2.3:
  backend: Disable compression in export info file

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agobackend: Disable compression in export info file
Michael Hanselmann [Thu, 3 Feb 2011 11:25:04 +0000 (12:25 +0100)]
backend: Disable compression in export info file

The new import/export infrastructure in Ganeti 2.2 and up handles
compression differently. It no longer writes compressed files to the
destination. Unfortunately changing this behaviour would be non-trivial,
so in the meantime setting “compression = none” will hopefully avoid
some confusion.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoReopen log files upon SIGHUP in daemons
Michael Hanselmann [Tue, 1 Feb 2011 15:31:15 +0000 (16:31 +0100)]
Reopen log files upon SIGHUP in daemons

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoutils.SetupLogging: Return function to reopen log file
Michael Hanselmann [Mon, 31 Jan 2011 16:26:55 +0000 (17:26 +0100)]
utils.SetupLogging: Return function to reopen log file

This function can be used from a SIGHUP handler to reopen log files.
Initial, simple unittests are included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
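
A setup function that returns a "reopen" callable could be sketched as below, using a handler derived from the standard library's BaseRotatingHandler as the “Introduce re-openable log record handler” commit describes. The class and function names, the log format and the flag-based approach are assumptions for illustration, not the Ganeti implementation.

```python
import logging
import logging.handlers
import signal


class ReopenableFileHandler(logging.handlers.BaseRotatingHandler):
    """File handler that re-opens its file on request (sketch)."""

    def __init__(self, filename):
        super().__init__(filename, "a")
        self._reopen = False

    def shouldRollover(self, record):
        return self._reopen

    def doRollover(self):
        # Close the old stream and open the same path again, e.g.
        # after an external tool has moved the old file away.
        if self.stream:
            self.stream.close()
        self.stream = self._open()
        self._reopen = False

    def request_reopen(self):
        # Only sets a flag, so it is cheap to call from a signal handler.
        self._reopen = True


def setup_logging(logfile, program):
    """Configure a logger for 'program'; return a reopen function."""
    handler = ReopenableFileHandler(logfile)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(message)s"))
    logger = logging.getLogger(program)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return handler.request_reopen


def install_sighup_handler(reopen_fn):
    # SIGHUP is not available on every platform (e.g. Windows).
    signal.signal(signal.SIGHUP, lambda signum, frame: reopen_fn())
```

A daemon would call `install_sighup_handler(setup_logging(path, name))` once at startup; the file is then re-opened on the first log record after the signal.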

9 years agoutils.SetupLogging: Make program a mandatory argument
Michael Hanselmann [Mon, 31 Jan 2011 16:03:18 +0000 (17:03 +0100)]
utils.SetupLogging: Make program a mandatory argument

It's passed in by most users (daemons, CLI scripts) and for the others
(burnin, watcher) it certainly doesn't hurt, especially when using
syslog.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoutils.log: Restrict I/O error handling coverage
Michael Hanselmann [Mon, 31 Jan 2011 15:46:21 +0000 (16:46 +0100)]
utils.log: Restrict I/O error handling coverage

The I/O error will occur while opening the file, not while adding
and configuring the handler.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoutils.log: Split formatter building into separate function
Michael Hanselmann [Mon, 31 Jan 2011 15:43:28 +0000 (16:43 +0100)]
utils.log: Split formatter building into separate function

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoburner: Trivial code cleanup
Michael Hanselmann [Mon, 31 Jan 2011 13:58:26 +0000 (14:58 +0100)]
burner: Trivial code cleanup

- Use constant for exit value
- Configure logging from main function, not from class' “__init__”

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoburnin: Reuse existing function for debug value
Michael Hanselmann [Mon, 31 Jan 2011 13:54:38 +0000 (14:54 +0100)]
burnin: Reuse existing function for debug value

Instead of using its own, burnin can use cli.SetGenericOpcodeOpts.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoMerge node groups from other cluster
Stephen Shirley [Tue, 1 Feb 2011 12:07:31 +0000 (13:07 +0100)]
Merge node groups from other cluster

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoEnforce that new node groups have unique names
Stephen Shirley [Mon, 31 Jan 2011 16:07:08 +0000 (17:07 +0100)]
Enforce that new node groups have unique names

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoAdd _UnlockedLookupNodeGroup()
Stephen Shirley [Mon, 31 Jan 2011 16:00:03 +0000 (17:00 +0100)]
Add _UnlockedLookupNodeGroup()

This allows _UnlockedLookupNodeGroup() to be called from within
AddNodeGroup().

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agocluster-merge should refuse to merge own cluster
Stephen Shirley [Mon, 31 Jan 2011 14:19:48 +0000 (15:19 +0100)]
cluster-merge should refuse to merge own cluster

Also fix type of Merger.cluster_name from list to string. This would
have triggered an error in sshRunner if cluster keys were in use.

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

9 years agoMinor grammar fix in QuitGanetiException docstring
Stephen Shirley [Mon, 31 Jan 2011 13:49:03 +0000 (14:49 +0100)]
Minor grammar fix in QuitGanetiException docstring

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoFix grammar of var naming
Stephen Shirley [Mon, 31 Jan 2011 13:18:48 +0000 (14:18 +0100)]
Fix grammar of var naming

flatten is the verb, flattened is the adjective.

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoIntroduce re-openable log record handler
Michael Hanselmann [Mon, 31 Jan 2011 12:52:39 +0000 (13:52 +0100)]
Introduce re-openable log record handler

This patch adds a new log handler class based on the standard library's
BaseRotatingHandler. This new class allows the log file to be re-opened,
e.g. upon receiving a SIGHUP signal. The latter will be implemented in
forthcoming patches. The patch does not change the behaviour regarding
writing to /dev/console.

Quite a bit of code had to be changed to unittest the log handlers.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
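
A handler of the kind described in the entry above could look roughly like the following sketch. It builds on the standard library's `logging.handlers.BaseRotatingHandler`, as the commit message states; the class name `ReopenableFileHandler` and the method name `request_reopen` are assumptions made for illustration and may differ from the names in the patch.

```python
import logging
import logging.handlers


class ReopenableFileHandler(logging.handlers.BaseRotatingHandler):
    """File handler whose log file can be re-opened on request.

    Illustrative sketch only; names are hypothetical.
    """

    def __init__(self, filename):
        super().__init__(filename, "a")
        self._reopen = False

    def shouldRollover(self, record):
        # BaseRotatingHandler.emit() calls this before writing each record.
        return self._reopen

    def doRollover(self):
        # Close the current stream and open the same path again.
        self._reopen = False
        if self.stream:
            self.stream.close()
        self.stream = self._open()

    def request_reopen(self):
        # Only set a flag; the re-open happens on the next emitted record.
        self._reopen = True
```

Because `request_reopen()` only sets a flag, it would be cheap to call from a signal handler, for example one registered for SIGHUP with `signal.signal()`. The file is then re-opened when the next record is written, which is what an external log rotation tool needs after it has moved the old file away.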

9 years agoRe-create instance disk symlinks on activate
Iustin Pop [Fri, 28 Jan 2011 15:53:40 +0000 (16:53 +0100)]
Re-create instance disk symlinks on activate

This patch implements recreation of instance disk symlinks when the
activate-disks operation is run. Until now, it was not possible to
re-create these symlinks without stopping and starting or migrating an
instance as the RPC call where this is done was in instance startup
and migration.

In order to do this, the blockdev_assemble rpc call needs the disk
index too, which is added to the protocol. This is a change from 2.3
and makes instance startup incompatible (FYI).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoAdd RAPI resource for instance console
Michael Hanselmann [Fri, 28 Jan 2011 14:21:04 +0000 (15:21 +0100)]
Add RAPI resource for instance console

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoExport console information as query field
Michael Hanselmann [Fri, 28 Jan 2011 13:10:09 +0000 (14:10 +0100)]
Export console information as query field

This makes it possible to get the console information via a LUXI query.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agomanpage: gnt-group remove cannot remove last group
Stephen Shirley [Fri, 28 Jan 2011 13:49:34 +0000 (14:49 +0100)]
manpage: gnt-group remove cannot remove last group

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoConfigWriter: add checks for be/nd/nic params
Iustin Pop [Fri, 28 Jan 2011 13:05:09 +0000 (14:05 +0100)]
ConfigWriter: add checks for be/nd/nic params

This adds checking (in the configuration) for invalid be, nd and nic
params. The code is a bit tricky as nd params are at cluster,
nodegroup and node level, nicparams are at cluster and nic level,
whereas beparams are at cluster and instance level.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoConfigWriter: simplify _UnlockedVerifyConfig
Iustin Pop [Fri, 28 Jan 2011 13:04:22 +0000 (14:04 +0100)]
ConfigWriter: simplify _UnlockedVerifyConfig

This just adds a 'cluster' local variable for reducing duplication.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoAdd e1000 nic support for HVM
Guido Trotter [Fri, 28 Jan 2011 12:32:30 +0000 (13:32 +0100)]
Add e1000 nic support for HVM

Closes issue: 130

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoPrevent removal of last node group
Stephen Shirley [Fri, 28 Jan 2011 12:24:30 +0000 (13:24 +0100)]
Prevent removal of last node group

- Add check in ConfigWriter to prevent last node group from being
  removed
- Tidy up error message a bit

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoFix instance list for instances running multiple times
René Nussbaumer [Fri, 28 Jan 2011 10:31:55 +0000 (11:31 +0100)]
Fix instance list for instances running multiple times

If for some reason (e.g. a failed migration) an instance is running
on multiple nodes, the output can become inconsistent. To detect that
error and make the output consistent between runs, we also make the
call on the secondary and check whether the instance is running there.
If so, we report the instance as ERROR_wrongnode.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
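
The decision described in the entry above can be sketched as a small function. Only the `ERROR_wrongnode` value comes from the commit message; the function name, its boolean arguments and the other two return values are placeholders chosen for illustration and do not reflect the actual query code.

```python
def displayed_status(running_on_primary, running_on_secondary):
    """Return a status string for one instance (hypothetical helper).

    The two flags stand for the results of asking the primary and the
    secondary node whether the instance is running there.
    """
    if running_on_secondary:
        # Running where it should not be, regardless of the primary's
        # answer, so the result no longer depends on which node replied.
        return "ERROR_wrongnode"
    # Placeholder labels, not Ganeti status names.
    return "running" if running_on_primary else "not running"
```

Checking the secondary first is what makes the report stable: an instance that runs on both nodes gets the same status on every run.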

9 years agoSmall QA fixes: groups via RAPI, cluster OOB
Michael Hanselmann [Thu, 27 Jan 2011 19:24:18 +0000 (20:24 +0100)]
Small QA fixes: groups via RAPI, cluster OOB

Add “cluster-oob” to sample configuration file. Don't run RAPI group
tests if disabled.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoMerge branch 'devel-2.3' into devel-2.4
Michael Hanselmann [Thu, 27 Jan 2011 17:07:09 +0000 (18:07 +0100)]
Merge branch 'devel-2.3' into devel-2.4

* devel-2.3:
  Wait for master to become available on initialization
  Start all daemons on cluster initialization
  Clarify job processing order in admin guide
  Improve option descriptions
  Remove two unused variables
  Fix LUOSDiagnose and non-vm_capable nodes
  Rephrasing two error messages for auto promotion
  storage: Check that mapper is either used or None
  Fix bug in “gnt-node list-storage”
  Improve import/export timeout settings
  Increase remote import/export timeout

Conflicts:
lib/constants.py: Trivial
lib/objects.py: Trivial
qa/qa_node.py: Trivial

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agocluster verify: add hvparams verification
Iustin Pop [Thu, 27 Jan 2011 15:44:17 +0000 (16:44 +0100)]
cluster verify: add hvparams verification

Currently, the validity of the hypervisor parameters is only checked
at init/modification time, and not in the cluster verify. This is bad,
as it can lead to inconsistent state that is only detected when the
next modification (which can be unrelated) is made, leading to
unexpected error messages.

This patch adds both syntax verification (in masterd) and validity
verification on remote nodes. The downside of the patch is that on
clusters with many instances which have custom parameters, it will be
slow. A possible improvement would be to detect duplicate, identical
sets of parameters and collapse these into a single verification, but
that is left as a TODO (in case it becomes problematic).

An additional change is in utils.ForceDict, where we said 'key',
whereas this function is always used with parameter dicts, so I
changed it to "Unknown parameter".

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
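
The improvement left as a TODO in the entry above could be approached along the lines of this sketch, which is not part of the patch. It assumes that each parameter set is a flat dict with hashable values; the function name `group_identical_params` and the `(source, params)` input shape are hypothetical.

```python
def group_identical_params(items):
    """Group (source, params) pairs that carry identical parameters.

    Returns a list of (params, sources) tuples, one per distinct
    parameter set, so each set needs to be verified only once.
    Assumes flat dicts with hashable values.
    """
    groups = {}
    for source, params in items:
        # A frozenset of the items is order-independent and hashable.
        key = frozenset(params.items())
        groups.setdefault(key, []).append(source)
    return [(dict(key), sources) for key, sources in groups.items()]
```

With this grouping, a cluster where most instances share the same custom parameters would need one verification per distinct set rather than one per instance, and a failure could still be reported against every source in the group.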

9 years agoRemove dumb-allocator
Guido Trotter [Thu, 27 Jan 2011 13:47:18 +0000 (14:47 +0100)]
Remove dumb-allocator

- Remove the actual code
- Remove mentions of it from iallocator.rst, and use hail instead
- Also remove mentions of "etch-image" and use "debootstrap+default"
- Mention htools as the reference implementation in iallocator.rst

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoOpen other clusters' config in foreign mode
Stephen Shirley [Thu, 27 Jan 2011 14:16:09 +0000 (15:16 +0100)]
Open other clusters' config in foreign mode

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoAdd (unused) arg to _OfflineClusterMerge
Stephen Shirley [Thu, 27 Jan 2011 15:18:48 +0000 (16:18 +0100)]
Add (unused) arg to _OfflineClusterMerge

cli._RunWhileClusterStoppedHelper.Call passes (self, *args) to functions
called via cli.RunWhileClusterStoppedHelper(). The code in cluster-merge
was broken by commit d8aab233.

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoFix unittest breakage on Python 2.4/2.5
Michael Hanselmann [Thu, 27 Jan 2011 13:04:41 +0000 (14:04 +0100)]
Fix unittest breakage on Python 2.4/2.5

Commit 70b0d2a29 broke unittests on Python 2.4 and 2.5. Turns out that
Python 2.6 and above allow classes to be passed as custom test runners,
whereas earlier versions don't.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoCheck for duplicate RAPI URIs and handlers
Michael Hanselmann [Wed, 26 Jan 2011 18:36:45 +0000 (19:36 +0100)]
Check for duplicate RAPI URIs and handlers

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoEnsure all resources are used by RAPI client
Michael Hanselmann [Wed, 26 Jan 2011 18:18:16 +0000 (19:18 +0100)]
Ensure all resources are used by RAPI client

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoRAPI client: De-/activating instance disks
Michael Hanselmann [Wed, 26 Jan 2011 18:17:18 +0000 (19:17 +0100)]
RAPI client: De-/activating instance disks

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoRAPI client: Wrap /2/redistribute-config resource
Michael Hanselmann [Wed, 26 Jan 2011 18:09:28 +0000 (19:09 +0100)]
RAPI client: Wrap /2/redistribute-config resource

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoAdd unittest for RAPI client's ModifyInstance
Michael Hanselmann [Wed, 26 Jan 2011 18:04:19 +0000 (19:04 +0100)]
Add unittest for RAPI client's ModifyInstance

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9 years agoWatcher: Fix endless repair tries for broken secondary
René Nussbaumer [Thu, 27 Jan 2011 09:22:50 +0000 (10:22 +0100)]
Watcher: Fix endless repair tries for broken secondary

In cases where the secondary was offline and not evacuated, the watcher
tried to run activate-disks endlessly. This is useless, as the
secondary is offline and therefore does not respond.

This patch skips activation of the disks if the secondary is bad but
the instance is up and running.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoVerify disks: increase parallelism and other fixes
Iustin Pop [Wed, 26 Jan 2011 17:54:36 +0000 (18:54 +0100)]
Verify disks: increase parallelism and other fixes

The recent work on multi-VG support has converted LUClusterVerifyDisks
into doing serialised calls to each node, as each node can have
different VGs. This is suboptimal, especially for big clusters, where
this LU is executed by the watcher very often.

This patch changes the logic based on the observation that querying a
node for its VGs and then requesting a LV list for those VGs is
equivalent to simply asking for all LVs, without specifying the VG
name(s). So backend.py needs changes to accept an empty VG list, and
the LU itself partially reverts to the previous version.

Additionally, we do two other fixes to this LU:

- small improvement in getting the instance list from the config
- MapLVsByNode works for all disk types, hence no need to restrict to
  the DRBD template, especially as today we can "recreate" disks for
  plain volumes too (the warning message in gnt-cluster is updated
  too)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
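
The observation in the entry above, that an empty VG list is equivalent to asking for all LVs, can be shown with a small sketch of how an `lvs` command line might be assembled. The function name and the chosen output columns are assumptions for illustration; this is not the code in backend.py.

```python
def build_lvs_command(vg_names, sep="|"):
    """Build an argument list for the LVM "lvs" tool (illustrative).

    With an empty vg_names, no volume group is named on the command
    line, so lvs reports the logical volumes of all volume groups.
    """
    cmd = [
        "lvs",
        "--noheadings",
        "--units=m",
        "--nosuffix",
        "--separator=%s" % sep,
        "-ovg_name,lv_name,lv_size,lv_attr",
    ]
    return cmd + list(vg_names)
```

Since `vg_name` is part of the output, the caller can still tell which VG each LV belongs to, which is why a single unrestricted call per node can replace a VG query followed by a per-VG listing.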

9 years agognt-cluster verify-disks: fix VG name
Iustin Pop [Wed, 26 Jan 2011 17:51:19 +0000 (18:51 +0100)]
gnt-cluster verify-disks: fix VG name

Recent multi-VG work already exports the missing LV names as vg/lv,
not simply lv. So the query and addition of the VG name in gnt-cluster
verify-disks is redundant, and even wrong for non-default-VG
instances.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoDeactivate disks: allow skipping hypervisor checks
Iustin Pop [Wed, 26 Jan 2011 16:45:10 +0000 (17:45 +0100)]
Deactivate disks: allow skipping hypervisor checks

In some cases (e.g. the hypervisor not running at all), we might want
to force disk deactivation, skipping the hypervisor checks. I believe
this is not a good thing to do all the time, so this patch adds the
force option to allow manual selection of this operation mode.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9 years agoWait for master to become available on initialization
Michael Hanselmann [Wed, 26 Jan 2011 15:46:56 +0000 (16:46 +0100)]
Wait for master to become available on initialization

This is analogous to the existing check for a responsive node daemon.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9 years agoStart all daemons on cluster initialization
Michael Hanselmann [Wed, 26 Jan 2011 15:45:11 +0000 (16:45 +0100)]
Start all daemons on cluster initialization

At least ganeti-confd was not started. It got started a few minutes
later by ganeti-watcher. Also move one pylint disable to the effective
line.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>