11:57 -!- rdd [~rddunlap@pool-71-245-102-105.ptldor.fios.verizon.net] has joined #storage
11:57 -!- Topic for #storage: Welcome to the Linux Storage Summit discussion (1-888-967-2253, id 907086) http://test.kernel.org/storage/ for slides
11:57 -!- Topic set by willy [willy@128.189.239.68] [Wed May 24 11:25:32 2006]
11:57 [Users #storage]
11:57 [@agk ] [@mkp ] [ alexisb ] [ jgarzik] [ MarkLord]
11:57 [@hch ] [@patman] [ andmike ] [ jhr ] [ praka ]
11:57 [@jejb ] [@tejun ] [ ewilliam] [ lcm ] [ rdd ]
11:57 [@mbligh] [@willy ] [ ggg ] [ markh_ ]
11:57 -!- Irssi: #storage: Total of 19 nicks [8 ops, 0 halfops, 0 voices, 11 normal]
11:57 -!- Channel #storage created Wed May 24 10:21:38 2006
11:57 -!- Irssi: Join to #storage was synced in 0 secs
11:57 -!- andmike [~andmike@128.189.232.54] has quit [Ping timeout: 480 seconds]
11:58 <@tejun> how about implementing simple scheduler which only deals with merging above dm (including plugging if necessary) and let the real io scheduling where they are now?
11:58 <@patman> tejun: guess that needs request based dm ?
12:00 <@tejun> right
12:06 <@agk> the hardware handler is a hack to work around deficiencies in the scsi layer
12:06 <@jejb> deficiencies in SCSI? surely not ...
12:07 < ewilliam> perhaps but is it also not necessary to deal with vendor specific implementations of multipath devices?
12:08 <@agk> for pg initialisation, yes
12:08 <@agk> but for error processing, I think all hw handlers would do the same thing
12:08 <@mkp> Ok, this phone stuff is a waste of time. I'd rather get some work done.
12:08 <@agk> if scsi was able to decode the errors
12:08 <@mkp> I'll leave the bridge line open for now
12:08 <@agk> mkp, yep - it's very hard to hear what ppl are saying
12:09 <@jejb> we're basically trying to establish whether dm-multipath would be better off request based
12:09 <@willy> phones suck.
12:10 <@jejb> but also, if you can't hear ... say so ... we can move the speaker closer to the 'phone
12:11 <@agk> jejb, please try - check if there's any volume control on the microphone too
12:11 < ewilliam> jejb, I can not fully make out what anyone says at the conference. Ed's comments are all I can clearly hear.
12:11 <@willy> agk: I think I turned it up to max
12:11 <@jejb> it's a polycom ... they don't seem to have mic volumes
12:11 <@willy> jejb: what's that switch underneath? it seemed to have an 'off' 'low' and 'high' setting
12:11 <@willy> i turned it to high
12:11 <@agk> also path testing prioritisation?
12:11 <@willy> it might just be the ringer
12:12 < ggg> willy: it sounds like the ring volume
12:13 <@agk> current design has userspace issuing the i/o to test if paths have come back - need a way for that i/o to jump to the front of the queue
12:13 <@jejb> if you had your own scheduler, you could do that, though
12:14 < ggg> if a path has failed, there should be no IO queued, right?
12:14 <@agk> false
12:14 < ggg> I was expecting MPIO or upper layer to redirect IO to an alternate path
12:15 < ewilliam> We do have to set the EMC configuration to tell the EMC hardware handler to "honor" reservations.
12:16 < ewilliam> aka "hardware_handler 3 emc 0 1"
12:20 < ewilliam> Doesn't the /dev/mapper/ provide a persistent naming consistency as well as the new "user friendly" device naming?
12:20 <@agk> 'limited ioctl support' - mbroz @RH has started working on patches to allow certain ioctls to be passed through
12:21 -!- andmike [~andmike@128.189.232.54] has joined #storage
12:24 <@agk> suse
12:25 <@willy> hwscan
12:25 <@willy> at least for the case that causes me problems
12:26 < jgarzik> oh yeah
12:27 < jgarzik> I plan on expanding blktool (ethtool counterpart for storage), so suggestions are welcome
12:28 < jgarzik> ethtool == single tool to control various hardware-specific attribs
12:29 < MarkLord> Ditto for hdparm.
12:30 <@jejb> agk, can you comment on the request based dm-multipath
12:33 < jgarzik> MarkLord: hdparm is specific to IDE
12:34 < jgarzik> MarkLord: also, not much vendor-specific stuff in hdparm. blktool is adding HBA register dumps (a la ethtool), and similar hardware-specific features
12:34 <@willy> as long as blktool.h doesn't use u32 types ...
12:38 < jgarzik> hehe
12:38 < jgarzik> there is no blktool.h
12:40 <@mkp> people breaking for lunch?
12:41 < praka> silence means -> no one left to answer???
12:42 -!- MarkLord [~root@rtr.ca] has quit [Remote host closed the connection]
12:42 <@patman> :)
12:42 <@mkp> any objections to me shutting down the bridge line?
12:43 < praka> fine -- will it be back online in an hour?
12:43 <@patman> i can't hear enough to make it worthwhile
12:43 < praka> i only heard ed.
12:43 < praka> but again, he called in...
12:43 <@mkp> I don't think it's worth it
12:43 <@patman> i don't know how ed heard all that, guess he didn't blast himself
12:44 < praka> oh well... I'll check back on IRC in an hour.
12:44 -!- praka [~praka@cpe-66-75-130-199.san.res.rr.com] has quit [Quit: User abortion with 5 coathooks]
12:48 -!- jejb [~jejb@128.189.238.8] has quit [Quit: Leaving]
12:48 -!- ewilliam [~ewilliam@stat16.steeleye.com] has left #storage [Leaving]
12:57 -!- jejb [~jejb@128.189.238.8] has joined #storage
13:18 -!- andmike [~andmike@128.189.232.54] has quit [Ping timeout: 480 seconds]
13:19 -!- willy [willy@128.189.239.68] has quit [Ping timeout: 480 seconds]
13:28 -!- willy [willy@128.189.239.68] has joined #storage
13:31 < jejb> mkp, do you want us to try to re-establish the voice link?
13:31 <@mkp> I can't hear a thing. Maybe if somebody else is interested
13:33 < jejb> OK ... we'll begin again here then
13:37 -!- ewilliam [~ewilliam@stat16.steeleye.com] has joined #storage
13:50 -!- praka [~praka@cpe-66-75-130-199.san.res.rr.com] has joined #storage
13:55 -!- ewilliam [~ewilliam@stat16.steeleye.com] has left #storage [Leaving]
14:13 -!- patman [~patman@216-99-218-173.dsl.aracnet.com] has left #storage [Leaving]
14:31 -!- praka [~praka@cpe-66-75-130-199.san.res.rr.com] has quit [Quit: [BX] Mr. Rogers uses BitchX. Won't you be my neighbor?]
14:49 -!- ggg [~grundler@c-24-6-48-165.hsd1.ca.comcast.net] has left #storage [looking for chocolate...]
15:02 -!- axboe [~axboe@brick.kernel.dk] has joined #storage
15:41 -!- markh_ [~markh@fw.osdl.org] has quit [Quit: Leaving]
15:50 < jgarzik> man, IRC traffic slowed to a trickle, post lunch :)
15:52 <@mbligh> we're all asleep
15:55 < jgarzik> that doesn't bode well for the summit :)
15:55 < jgarzik> Let's hope Luben's not speaking...
15:55 <@willy> Luben's not here ;-)
15:55 <@mkp> hah
15:55 <@willy> and the phone's off
16:25 * axboe is feeling the jet lag creep in
16:25 * mkp hands axboe a bucket of c0ffee
16:25 < axboe> just downed a cup, not helping much
16:26 <@hch> axboe: I wouldn't call that coffee ;-)
16:26 < axboe> hch: black nasty tasting stuff would be more accurate
16:28 < jejb> axboe, I was planning to throw hch into the sea after this is over .. would that help for you?
16:29 < axboe> jejb: it'd make for a good photo op, at least
16:32 < jejb> it would certainly make Luben's day ...
16:33 -!- lcm [~lcm@bi01p1.co.us.ibm.com] has quit [Ping timeout: 480 seconds]
16:34 < axboe> jejb: not usually a priority for you :)
16:50 -!- mnc [~mnc@128.189.238.155] has joined #storage
17:15 -!- mnc [~mnc@128.189.238.155] has quit [Remote host closed the connection]
17:16 -!- ggg [~grundler@156.153.255.242] has joined #storage
17:17 -!- alexisb [~ahbruem@128.189.237.121] has left #storage []
17:19 -!- jejb [~jejb@128.189.238.8] has quit [Quit: Leaving]
17:23 -!- tejun [~tj@128.189.235.161] has quit [Ping timeout: 480 seconds]
17:38 -!- willy [willy@128.189.239.68] has quit [Ping timeout: 480 seconds]
17:43 < ggg> ok...so I gather folks are heading out for dinner?
17:49 -!- willy [~willy@S0106001217ae4b5c.vc.shawcable.net] has joined #storage
18:06 -!- jgarzik [~jgarzik@cpe-065-190-194-075.nc.res.rr.com] has left #storage [Client exiting]
18:06 -!- jgarzik [~jgarzik@cpe-065-190-194-075.nc.res.rr.com] has joined #storage
18:06 < jgarzik> can someone op me, before everyone else drops out?
19:05 -!- ggg [~grundler@156.153.255.242] has quit [Quit: Bailing]
19:22 <@rdd> yo
23:23 <@willy> jgarzik: it's ok, i registered this with chanserv, so it'll auto-op me whenever i return
23:24 <@willy> (not that i object to basically everyone having ops; there has been occasional trouble with flooders on oftc, so it's important there's enough potential chanops around to kick them out, should they trouble us)
23:25 -!- willy changed the topic of #storage to: Welcome to the Linux Storage Summit discussion | http://test.kernel.org/storage/ for slides | Phone deemed useless; willy to summarise Thursday's discussion to IRC
Day changed to 25 May 2006
00:10 <@jgarzik> who will summarize Wednesday's discussion?
04:33 -!- willy [~willy@S0106001217ae4b5c.vc.shawcable.net] has quit [Read error: Connection reset by peer]
04:35 -!- willy [~willy@S0106001217ae4b5c.vc.shawcable.net] has joined #storage
05:55 -!- willy [~willy@S0106001217ae4b5c.vc.shawcable.net] has quit [Ping timeout: 480 seconds]
06:32 -!- willy [~willy@S0106001217ae4b5c.vc.shawcable.net] has joined #storage
06:36 <@axboe> morning
06:38 < willy> morning (stupid fatport)
06:38 <@axboe> willy: indeed
06:38 <@axboe> at least it's up now
06:38 < willy> yup
06:39 -!- ewilliam [~ewilliam@vsat-148-63-28-138.c001.g4.mrt.starband.net] has joined #storage
06:39 <@hch> morning
06:39 <@axboe> morning
06:39 < ewilliam> morning
06:42 <@axboe> -EXTRAVERSION =-rc4
06:42 <@axboe> -NAME=Sliding Snow Leopard
06:42 <@axboe> +EXTRAVERSION =-rc5
06:42 <@axboe> +NAME=Lordi Rules
06:42 <@axboe> haha
06:42 <@mkp> heh
06:42 < willy> so we're meeting in the lobby at 7am to go for a walk in Stanley Park, if you're interested
06:44 * mkp quickly summons Larry's jet
06:54 < willy> Anybody have a suggestion for a suitable wiki to use to put up notes from yesterday's discussion?
06:55 < willy> i'd like to put up a first draft, then let other people edit it
07:21 -!- markh [~markh@fw.osdl.org] has joined #storage
08:15 <@jgarzik> willy: I thought mbligh was posting stuff to the URL in /title?
08:31 <@rdd> only the dm slides so far, and they also went out by email
09:01 <@rdd> is there an updated agenda for today ?
09:02 <@axboe> there is, but they wiped the board so probably only in james' notebook
09:03 <@rdd> :)
09:03 <@rdd> markh hi
09:04 <@rdd> markh: when are you going to seattle?
09:05 < markh> rdd, Hi. I'm leaving Friday morning early
09:05 <@rdd> markh, yeah, me too
09:06 < markh> rdd, I'll probably be getting there just in time for rush hour.
09:06 <@rdd> ditto
09:42 -!- willy [~willy@S0106001217ae4b5c.vc.shawcable.net] has quit [Ping timeout: 480 seconds]
09:46 -!- ggg [~grundler@c-24-6-48-165.hsd1.ca.comcast.net] has joined #storage
10:04 -!- willy [willy@128.189.239.68] has joined #storage
10:12 -!- jejb [~jejb@128.189.238.8] has joined #storage
10:18 * mkp goes to make tea
10:20 <@willy> jejb's done an agenda
10:20 <@willy> i think this is a triumph of optimism over experience
10:20 <@willy> 10:00 coffee + pastries
10:21 <@willy> 10:30 I/O Scheduler / Multipathing
10:21 <@willy> 11:00 Error Handling
10:21 <@willy> 11:30 Barriers
10:21 <@willy> 12:00 Netlink / Req/Response / ioctl
10:21 <@willy> 12:30 Lunch
10:21 <@willy> 13:30 Migrating LibATA to Block
10:21 <@willy> 14:00 Queue Handling
10:22 <@willy> 14:30 Tools
10:22 <@willy> 15:00 SAS/SATA
10:22 <@willy> 15:30 coffee+tea
10:22 <@willy> 16:00 Bugs and Testing
10:22 <@willy> 16:30 SCSI Maintainer
10:22 <@willy> 17:00 head out and continue unofficial discussions
10:23 <@willy> btw, is someone logging this?
10:23 <@mbligh> No ;-)
10:23 <@willy> I wish they would ;-)
10:23 -!- alexisb [~ahbruem@128.189.237.121] has joined #storage
10:23 <@mkp> bummer, I should have flown out yesterday
10:23 <@willy> we're starting 10 minutes early on Multipathing
10:24 <@willy> Mike Anderson is talking about I/O progress under memory pressure
10:24 <@willy> thinks we're filling the page cache, and we're not making progress because there's no way for user processes to run
10:25 <@mkp> did you wiki the notes from yesterday?
10:25 <@willy> nobody suggested a wiki, so no
10:25 <@willy> mbligh says we need two watermarks rather than just one
10:26 <@willy> jejb says this is similar to the gfp emergency pools
10:26 <@willy> and we run out of those, so it's a bad idea
10:27 <@willy> jejb says scsi requires enough resources to get one command to the device at all times
10:27 <@willy> mbligh says it's not just scsi, it's also ext3
10:27 <@mkp> journaling makes things worse
10:27 <@willy> that needs to allocate stuff
10:28 <@mkp> often journaling ops require allocating to be able to free
10:28 <@willy> right, and multipath DM needs more than one command to be able to be sent
10:28 <@mkp> let's IRIXify the kernel and never fail an alloc :)
10:28 <@willy> mbligh thinks most of the VM is asynchronous
10:28 <@willy> kswapd et al
10:30 <@willy> discussed some vm techniques; mbligh still thinks it's solvable ;-)
10:31 <@willy> mike anderson thinks we can solve this by auditing all the paths
10:32 <@willy> mbligh thinks having an emergency pool isn't enough because you might get into a direct reclaim state if you try to claim from the regular pool first
10:34 <@mkp> I don't know of any Unix that you can't drown in I/O
10:34 <@willy> jejb keeps making the point that scsi has this problem solved at the bottom; the DM layer needs a solution that makes sure *it* can always make fwd progress
10:35 <@mkp> and MD... and the fs...
10:35 <@willy> we're only trying to solve multipath here
10:36 <@mkp> it's tricky when you stack things
10:36 <@willy> yeah, each layer needs its own solution
10:36 <@willy> and they're trying to put bits of DM in userspace
10:36 <@mkp> unfortunately you don't know in advance how many bios/whatever you need
10:36 <@willy> so, er ...
10:36 <@willy> max one per path, aiui
10:37 <@mbligh> willy, OK, I lied. I am logging this on dircproxy
10:37 <@willy> good, thanks
10:37 <@willy> if i misinterpret or miss stuff, please jump in
10:37 <@mkp> willy: it gets trickier to predict with raid when a request ends up straddling multiple devices
10:38 <@mbligh> so there's two problems - 1. making sure the scsi / md / dm stuff can do the failover 2. Not wedging the rest of the system on non-functional reclaim in the meantime
10:39 <@willy> ok, moving on to the next point: IO throughput
10:40 <@willy> IO schedulers are part of the problem; they think they need to make DM request-based
10:41 <@willy> we don't know how much IO an individual bio represents, so just alternating paths after N bios isn't good enough
10:42 <@willy> there's discussion of whether we need a multipath aware scheduler to use
10:42 <@willy> jens doesn't think so
10:43 <@willy> it seems too complicated a design, since it can't have enough information to do its job
10:44 < ggg> willy: is the problem the host OS can't differentiate between multi-path to the same storage vs mirror'd storage?
10:44 <@willy> that's one problem, yes
10:44 <@mbligh> willy, on the previous topic ... thinking about it, direct reclaim is designed to by synchronous - kswapd is async
10:44 <@mbligh> s/to by/to be/
10:45 <@willy> i'm not going to try to ascii-art the diagram that's being drawn now ;-)
10:45 < ggg> willy: wouldn't UUID's or somesuch be helpful in solving that problem?
10:46 <@willy> ggg: basically, you can construct an arbitrarily complex topology with different resource constraints, so a god-like scheduler at the top really isn't enough
10:46 < ggg> willy: and let me guess again: is the other problem the scheduling depends on FS layout and how it directs data?
10:46 < ggg> ok
10:46 <@willy> the discussion is tending towards being able to push requests back *up* from lower levels
10:47 < ggg> ie "I'm _really_ busy, try another path" ?
10:47 <@willy> yeah
10:47 <@willy> and the scheduler can either push it back onto the higher queue, or it can send it back down a different path
10:48 < ggg> I'd prefer low level drivers advertise the info needed to make that choice than have the IO "ownership" bounce back-n-forth between the layers.
10:49 <@willy> but sometimes the driver can't know -- for example, it gets a device busy if you're in a multi-initiator environment
10:49 < ggg> true. That's need to be config'd by the sys admin to avoid.
10:51 < ggg> I'm thinking of the case where say low level driver has near max IOs already pending down a particular path...the multipath driver could end up bouncing the IO down several paths before one accepts the IO.
10:51 < ggg> just a lot of busy work...
10:54 <@mkp> anybody looked into how to handle barriers across two write-enabled paths?
10:55 <@willy> i think that discussion is later
10:55 <@willy> jejb hates barriers and wants to use linked commands
10:55 <@mkp> or any out of order writes for that matter
10:56 <@willy> ggg: apparently we can already monitor the depth of the lower queues
10:57 < ggg> willy: ok...that's the better half of the problem...size of the IOs is the other factor.
10:57 < ggg> (and the rate that they can be serviced)
10:58 < ggg> willy: I guess this is really a QoS problem that's getting pushed up into the multi-path code.
10:58 < ggg> each transport has its own way of dealing with QoS.
10:58 < ggg> (or not...like PSI)
11:00 * ggg doesn't want to derail the meeting...but would gnomemeeting for for picking up audio from everyone's laptops? /o\
11:00 <@willy> heh
11:00 < ggg> damn geeks....send'm home ;)
11:03 <@rdd> ggg derails.
11:04 <@willy> moving on to the device vendor specific error state
11:04 <@willy> trying to map other device types into ASC/ASCQ
11:04 <@willy> basically impossible; they tried it with DASD
11:05 <@willy> jejb thinks we don't need to fix this with a request-based DM
11:05 <@mkp> I'd rather have somewhat generic errors like SECTOR_READ_FAILURE, DEVICE_UNREACHABLE, etc.
11:06 * ggg agrees with mkp...and try to distinguish "transport" from "device" errors...not easy either.
11:06 <@mkp> for some of the stuff I'm working on, it would be incredibly useful to be able to know the nature of the error
11:07 <@willy> whether it's transport or device?
11:07 < ggg> transport, device, or unknown (ie timeout)
11:08 <@mkp> As Grant said, it's sometimes hard to find out whether it's transport or device. Especially with FC
11:08 < ggg> I was just thinking "timeout" - we have NFC what the problem is
11:08 <@mkp> path between switch and HBA might be up, but the link between switch and array could be crapped out
11:09 -!- mnc [~mnc@128.189.238.155] has joined #storage
11:09 < ggg> mkp: willy wants to know _why_ it would be helpful to differentiate (I think)...I have some ideas but it's been ~10 years since I've seriously poked at this
11:09 <@willy> nono, just checking i understood
11:09 <@willy> i passed the comment on
11:10 * ggg nods
11:10 <@willy> mike christie talked about something he did ages ago with BLK_RETRYABLE
11:10 <@mkp> just pointing out that you might not always get a LIP when something goes away
11:10 <@mkp> same goes for iSCSI, I guess
11:11 < ggg> ditto for parallel SCSI
11:12 <@willy> agk: are you there?
11:15 <@mkp> git pull
11:15 <@mkp> duh
11:15 <@willy> they're discussing prep queue function stacking
11:20 <@willy> jejb just figured out that scsi messes with the request queue directly, and it'll have to stop doing that
11:22 <@axboe> willy: the settings, it's a minor thing
11:23 <@willy> scsi3 reservations: just go see steeleye ;-)
11:23 <@willy> jejb "there's a misperception based on how solaris works" ;-)
11:23 <@willy> axboe: yes, SMOP
11:29 <@willy> ok, half an hour late, we're moving on to hch doing error handling
11:30 <@willy> design issue: the way we try to send down eh commands
11:32 <@willy> the eh fields in the cmd are abused by driver writers
11:33 <@willy> hch has prototyped removing the eh fields from the scsi_cmnd
11:33 <@willy> removed ~300 lines from the kernel
11:33 <@mkp> interesting
11:33 <@willy> fixed probably a dozen bugs
11:33 <@willy> in rarely-tested error paths
11:35 <@willy> some discussion of the 'bad sector' problem. SCSI drives return 'this is the bad sector'
11:35 <@willy> IDE drives say 'something went wrong'
11:35 < ggg> willy: HPUX had the same issues...testing error paths deterministically requires a scsi target emulator.
11:35 <@mkp> we have an hba+disk emulator
11:35 < ggg> willy: rich testardi/mike bappe (later went to EMC) had written one using 53c720 chips under HPUX.
11:36 <@mkp> the hard part is simulating a disk hanging off of $RANDOM_HBA
11:36 <@willy> mkp: target mode, i guess ...
11:36 <@mkp> I guess we could write a target program
11:36 <@mkp> yeah
11:36 < ggg> willy: yes, put another host controller in target mode on the other end of the SCSI bus.
11:37 < ggg> doing that for 53c1010 is doable, but really hard if you don't know NCR/LSI scripts.
11:37 < ggg> I don't see any way for FC/IDE/u320SCSI to do the same thing.
11:37 <@willy> i need to get my head around that at some point anyway
11:39 < ggg> mkp: can you put tachlite chips in the equivalent to "target mode"?
11:39 <@willy> next topic: for EH we don't want to obey the queue restrictions
11:39 <@mkp> ggg: yes, definitely
11:39 <@willy> ->queuecommand can return host busy
11:39 <@willy> but right now the eh doesn't check that value
11:40 < ggg> mkp: then I'd think that would be a much easier place to implement target mode stuff - just the setup is a lot easier.
11:40 <@mkp> ggg: yeah. Shouldn't be that hard to "turn my driver around"
11:41 <@willy> another problem: external device resets from sg_io or /dev/sg
11:41 < ggg> de ja vu
11:41 <@willy> not synchronised in the midlayer with spinlocks any more
11:42 <@willy> jejb says there's confusion between device reset and lun reset
11:45 <@willy> discussion whether we want to have a lun reset phase before we do target reset
11:48 <@willy> EMC and IBM are going to investigate if there's any benefit -- if there are any conditions (other than scsi2 reservations) that a lun reset would cure
11:48 <@willy> discussion on that issue to continue on linux-scsi
11:49 <@willy> discussion segued to task management functions
11:49 <@willy> would be useful for error injection
11:53 <@willy> discussion about moving the timer up from scsi into the block layer
11:53 <@willy> issues around not having a scsi host, so we need some grouping of request queues for it
11:54 <@willy> axboe taking an action item to write all the code
11:54 <@axboe> willy: ahem, the timer code :)
11:54 <@willy> no, we decided you weren't paying attention, so you were going to do everything ;-)
11:54 <@mkp> :)
11:55 <@axboe> :)
11:55 <@willy> a request_group will be embedded in the scsi_host
11:59 <@willy> some fields get moved from the scsi_device to the request_queue
11:59 <@willy> use of this additional functionality remains optional for other block drivers
12:04 -!- kyle [~kyle@cabal.ca] has joined #storage
12:07 <@willy> discussion about keeping statistics about what %age of requests have to be retried
12:08 < ggg> willy: another detail for QoS?
12:08 <@willy> doug gilbert's found that he sees a small number of requests retried on SAS/SATA cables, and it'd be useful for diagnosing perf problems
12:09 < ggg> networking has set a pretty good precedent in this area
12:09 <@mbligh> willy, use blktrace ?
12:09 <@willy> mbligh: hch didn't like that, weren't you listening? ;-)
12:09 <@mbligh> nope.
12:10 <@mbligh> stats is easy though
12:14 <@willy> right, moving on to barriers
12:14 <@willy> Ric Wheeler is moderating for this one
12:17 <@willy> ric thinks that it's all about performance and we should notice the cases when barriers went wrong and notify the higher level that the integrity has been violated
12:18 <@willy> jejb thinks that's implausible
12:18 <@mkp> filesystems will crap all over that
12:18 <@willy> discussion about what filesystems do ...
12:19 <@mkp> it's unrealistic to view the requests from filesystems as a state machine
12:19 <@mkp> it's a linear stream of requests
12:20 <@mkp> just like the block device is a linear stream of sectors
12:20 * mkp proudly wears his SEP hat
12:22 <@willy> suggestions about flushing caches
12:23 <@willy> or configuring drive caches to be write-through instead of write-back
12:23 <@willy> interactions between tagging and barriers
12:25 < ggg> HP/IBM/Oracle require write-through AFAIK - ie no WCE is allowed
12:26 <@mkp> I specifically turn it off on my machines
12:27 <@mkp> correctness is more important to me than Windows benchmarketing
12:27 < ggg> if a device implements battery backed RAM and can guarantee the content in its cache _before_ the data hits media, then they can lie and claim WCE is off even if they implement write-back.
12:28 <@willy> one implementation is to wait for all the queues to drain, flush the cache, then you know the barrier's happened
12:29 < ggg> willy: yes, and the advantage is you don't have to trust the device's implementation of barriers/queue tags
12:29 < ggg> drawback is of course performance - it's going to suck if many barriers are used.
12:30 <@mkp> don't forget to bring up the write ordering issue with write-enabled multipath
12:30 < jejb> we've already mentioned that ... that's what the queue draining implementation is for
12:30 <@willy> oh yes, that got discussed
12:30 <@mkp> ok
12:36 <@willy> right, we're breaking for lunch now
12:36 <@willy> back in an hour, i guess
12:36 <@mkp> okie
12:44 -!- T-Bone [varenet@trust.slashdirt.org] has joined #storage
12:45 -!- alexisb [~ahbruem@128.189.237.121] has left #storage []
12:47 -!- jejb [~jejb@128.189.238.8] has quit [Quit: Leaving]
12:56 -!- ewilliam [~ewilliam@vsat-148-63-28-138.c001.g4.mrt.starband.net] has quit [Quit: Leaving]
12:56 -!- willy [willy@128.189.239.68] has quit [Ping timeout: 480 seconds]
12:58 -!- mnc [~mnc@128.189.238.155] has quit [Ping timeout: 480 seconds]
13:41 -!- willy [willy@128.189.239.68] has joined #storage
13:43 <@willy> ok, about half the people are back ... it's a bit too early to restart
13:44 <@mkp> well, in the meantime I've had fun looking at fault injection using a fibre channel target
13:44 <@rdd> should i copy-paste the entire channel window/messages ?
13:44 <@willy> to a wiki?
13:45 <@rdd> or for the mailing list?
13:45 -!- mnc [~mnc@128.189.238.155] has joined #storage
13:45 <@rdd> maybe there's a wiki page at osdl that we could use
13:45 <@rdd> markh: can you check?
13:45 <@willy> I'd like to see it go to a wiki, then give everyone a few days to edit it, then publish the result to LWN or something
13:45 <@rdd> yeah
13:46 -!- jejb [~jejb@128.189.238.8] has joined #storage
13:46 <@willy> the US is on holiday on Monday, so everyone should have plenty of time ;-)
13:46 <@mkp> talked about block guard yet?
13:47 < jejb> yes ... basically it's interesting, but we need to see an implementation
13:47 <@willy> it got brought up yesterday
13:47 <@jgarzik> notes posted somewhere yet?
13:47 <@willy> jgarzik: from yesterday? no. from today? what you saw in this channel earlier.
13:48 <@jgarzik> bummer
13:48 <@willy> i'll dump my notes from yesterday into a wiki as soon as someone provides one ;-)
13:48 <@jgarzik> you know
13:48 <@jgarzik> I've been thinking about an ATA wiki
13:49 <@jgarzik> maybe that should be a storage wiki instead
13:49 <@willy> bingo
13:49 * mkp volunteers jgarzik
13:49 <@willy> mkp: bah, rdd already volunteered markh to get one out of osdl ;-)
13:49 <@mkp> ok, supah
13:49 <@jgarzik> either or. I'm willing to do it too.
13:49 <@willy> good, good.
13:50 <@rdd> i don't mind someone else :)
13:50 <@willy> we were saying earlier it probably makes sense to keep using this irc channel after the summit is concluded
13:51 <@jgarzik> I would prefer #storage on linuxnet
13:51 < markh> Sorry, I was away for a while. we have a drivers wiki at http://developer.osdl.org/dev/opendrivers/wiki/index.php/Main_Page
13:52 <@rdd> markh: should we use that? oga is working on another one now...
13:53 <@willy> jgarzik: I don't mind that, though it's sometimes politically easier to use oftc. For example, I think #kernel would be ... unhappy to see Luben.
13:53 < markh> rdd, OK, I guess that leann is making a new one.
13:54 <@rdd> andre could /kick him :)
13:54 <@jgarzik> point
13:54 <@jgarzik> but I fscking hate the explosion of IRC servers
13:54 <@willy> Yes, that's an irritation, certainly.
13:54 <@mkp> yeah, I need a widescreen to fit all these tabs...
13:54 <@rdd> there should only be 1.
13:55 <@ggg> jgarzik: ta :)
13:56 <@ggg> willy: you saw the "Storage" page on the OSDL wiki?
13:56 <@willy> ggg: no
13:56 <@willy> url?
13:56 <@ggg> http://developer.osdl.org/dev/opendrivers/wiki/index.php/Storage
13:56 * jgarzik would have done storage.yyz.us ;-)
13:57 <@ggg> jgarzik: I would have preferred that personally :)
13:57 * ggg isn't inclined to complain too much about a free service though
13:57 <@willy> ok, the latecomers just waddled in
13:58 <@willy> Mike Anderson leading Netlink + Request/Response sysfs + ioctl interfaces
13:58 <@willy> he has a slide, but due to laptop display issues won't be displaying it
13:59 <@jgarzik> I could always do storage.yyz.us as a redirect
13:59 <@jgarzik> but overall I hate long URLs
13:59 <@willy> attribute interface
14:00 <@willy> it's inappropriate for transactional operations; asynchronous and read-only attributes are fine
14:01 <@willy> to be discussed with greg tomorrow by those going to freedomhec
14:01 <@willy> and then discussed further at KS
14:01 <@willy> some people already using netlink for scsi stuff
14:02 <@jgarzik> some discussion of the 'bad sector' problem. SCSI drives return 'this is the bad sector'
14:02 <@jgarzik> IDE drives say 'something went wrong'
14:02 <@jgarzik> s/drives/drivers/
14:03 <@jgarzik> In this specific case, ATA should give you the same info as SCSI
14:04 -!- alexisb [~ahbruem@128.189.237.121] has joined #storage
14:04 <@willy> there's talk of doing an sg version 4 for transactions with request/response rather than trying to do something with sysfs
14:04 <@jgarzik> ric thinks that it's all about performance and we should notice the cases when barriers went wrong
14:04 <@jgarzik> IMO if barriers go wrong, that's a kernel bug
14:04 <@willy> no, returning QUEUE FULL from the drive is the barrier going wrong
14:05 <@jgarzik> define "going wrong"
14:05 <@willy> it's no longer functioning as a barrier
14:06 <@willy> hch talking about removal of scsi_request
14:06 <@willy> this is part of the bidi command implementation
14:07 -!- tj [~tj@128.189.235.161] has joined #storage
14:07 <@jgarzik> this sounds like I'm being led into a verbal trap, but I'll bite...
14:07 <@jgarzik> when is a barrier no longer a barrier?
14:08 <@willy> there were three cases discussed; one was QUEUE FULL, one was DM multipath (in which case you use the drain-the-queues implementation above), and i forget the third
14:09 <@willy> might have been drives with write-through caches, but i'm not sure
14:10 <@jgarzik> how does queue-full affect things? or WT (rather than WB) caches? The kernel still controls the ordering of requests, and higher layers waiting on barriers continue to wait, as they should.
14:11 <@jejb> queue full coming from the device allows commands to overtake on the transport
14:11 <@willy> i'm not going to debate this with you now; i have to pay attention and report what's currently being said
14:11 <@jgarzik> fair enough
14:11 <@jgarzik> not trying to debate, just trying to understand
14:15 <@willy> jejb saying we want an 'ata' ULD when libata is moved to block
14:15 <@willy> we can still use sr for ATAPI
14:15 <@jgarzik> ULD == transport driver?
14:15 <@axboe> jgarzik: upper level driver
14:15 <@jgarzik> we definitely want to continue to use sr
14:16 <@willy> ULD being sd, sr, sg, asst, etc
14:16 <@willy> osst
14:16 <@willy> so ata being the replacement for sd
14:16 <@jgarzik> yes, agreed
14:16 <@jgarzik> I wasn't sure whether 'ata' meant transport class or disk device class
14:16 <@willy> understood
14:17 <@jgarzik> is tejun presenting right now? this is all in his presentation
14:17 <@willy> tejun handed out his paper yesterday and some of us have read it
14:17 -!- andmike [~andmike@128.189.232.54] has joined #storage
14:18 <@willy> discussion of how to bind ULDs to appropriate drivers. possibly using classes, or maybe using scsi_bus
14:18 <@willy> more fodder for a greg discussion
14:18 <@willy> because it depends how he's going to try to evolve the driver model
14:18 <@jgarzik> I had hoped transport classes
14:18 <@jgarzik> I advise ignoring the driver model as much as feasible
14:19 <@jgarzik> poor gregkh has become a "review it for proper formatting rather than content" kernel hacker lately
14:20 <@andmike> The transports rely on the driver model today.
14:20 <@willy> we've moved on to needing queueing control for port multipliers
14:20 <@jgarzik> attempting strict class inheritance and objection orientation in C is a path to pain
14:24 <@willy> discussion of having a SATA drive connected to a SAS card ... who does error handling?
14:25 <@willy> jejb is getting corrected from all sides because he thought it had to be done through an expander 14:25 <@jgarzik> there is device error handling, then there is transport/controller error handling 14:26 <@jgarzik> ATA is a bit drastic. Any thrown error often halts all operation, leaving the kernel to pick up the pieces and diagnose 14:26 <@jgarzik> either all operations on that port, or all operations on the controller, depending on the error 14:27 <@willy> the suggestion here was to have the SAS controller invoke the libata error handler 14:27 <@willy> for the drive, obviously. for transport, it'll know how to handle it together 14:27 <@jgarzik> controller must invoke either controller, transport, or device error handler 14:27 <@jgarzik> i.e. SATA link management comes heavily into play 14:27 <@jgarzik> libata error handler won't be able to "drive" as much as it does 14:29 <@willy> doug's making the point that some implementations (ie fusion) have the error handling done in their firmware in their SAT layer, so some drivers won't need it 14:29 <@mkp> it's fibre channel all over again 14:30 <@willy> only with more clueless vendors 14:30 <@mkp> except completely different - gotta keep people busy 14:30 <@axboe> willy: don't the hp ciss do the same? they drive sata and sas on the same cable too 14:30 <@willy> jejb's suggesting a libsatl 14:30 <@jgarzik> libsatl == libata-scsi with a few more lines of code 14:30 <@willy> axboe: probably ... 14:30 <@hch> axboe: cciss is rally nasty 14:30 <@jgarzik> smart firmwares will _inevitably_ do some error handling 14:30 <@hch> axboe: it's a very special case :) 14:30 * mkp doesn't like cciss 14:30 <@jgarzik> can't do anything about that 14:30 <@axboe> hch: it is 14:32 <@willy> hch is suggesting writing separate error handlers for SATA-on-SAS and SATA-on-SATA and then see how much commonality we can find 14:33 <@jgarzik> For aic94xx and Broadcom, there is TONS of commonality. 
You have to manage the SATA link exactly the same way. 14:35 <@jgarzik> No idea about ipr, but I've already poked at this for aic94xx, and have seen what is needed for my Broadcom SAS+SATA cards. Tejun's link management code should work, with tweaks. 14:36 -!- jsmart [~jsmart@128.189.236.52] has joined #storage 14:37 <@willy> hch's just got voted down 14:37 <@mkp> send him home on the next bus :) 14:39 <@willy> Brian King's pushing attaching SATA drives on SAS controllers as SATA devices 14:41 <@willy> and that seems to be the way we'll go for the moment, even though jejb isn't a fan of that architectural choice 14:41 <@jgarzik> if ipr exports the SATA link, you must treat it as a SATA device 14:41 <@mbligh> I think willy could get a job as an F1 commentator ;-) 14:42 <@jgarzik> if the firmware discombobulates the SAS link on ipr, SCSI may be a bette choice 14:42 <@willy> ipr has a thick firmware layer 14:42 <@willy> but for sata drives on other sas controllers, the ata ULD does end up attaching to the device 14:42 <@jgarzik> side note: we also have RAID and non-RAID controllers faking SCSI, but providing ATA passthrough because the underlying devices are all ATA 14:43 <@willy> relayed. 14:44 <@jgarzik> willy's F1 commentator services are MOST appreciated. 14:44 <@willy> yesterday sucked for everyone with the phone service ... I decided this was the best alternative 14:44 <@mkp> yeah, this is great! 14:45 <@mbligh> willy ... the reality <-> IRC bridge ;-) 14:45 <@axboe> irc expander 14:45 * willy smokes a phat one, yo 14:45 <@mbligh> heh 14:45 <@jgarzik> lol 14:45 <@T-Bone> ;) 14:45 <@jgarzik> willy: pass it, man... 14:46 <@willy> we *are* in vancouver, after all 14:46 <@mbligh> I thought that was mexico nowadays? 
14:46 <@mbligh> perhaps we should hold the next conference down there 14:46 <@willy> BC's biggest export is bud 14:46 * mkp read the employee handbook this morning and Oracle explicitly prohibits use of drugs in the workplace 14:46 <@jgarzik> we could all visit Christiania in Denmark :) 14:46 <@rdd> so .mx then :) 14:46 <@mbligh> so you can stand outside the front door and smoke? 14:46 <@ggg> mkp: vancouver is a workplace? 14:46 <@willy> ggg: he works from home 14:46 <@axboe> jgarzik: they almost shut that down, dude :( 14:47 <@jgarzik> axboe: :( 14:47 <@ggg> willy: I know :) I was there last year :) 14:47 <@axboe> they still sell, but no open market place anymore 14:47 * mkp expects HR to call him any minute now about the beer in the fridge 14:47 <@willy> Amsterdam is a pretty plausible place to hold a storage summit next year ... 14:47 <@mkp> axboe: yeah, it's a bummer. Vanessa and I were playing there during the big raid in October 14:47 <@jgarzik> MandrakeSoft used to have a free beer bust every Friday afternoon, in the Paris office. Those were the days... 14:47 <@ggg> willy: re cciss - it is SAS (P600) and exports only the block interface...dunno how mgnt stuff is handled. 
14:47 <@axboe> willy: agreed :) 14:48 <@jgarzik> anyway, this isn't the "Bob" summit 14:48 <@axboe> mkp: yeah it really is, all they effectively did was move the stuff to seedy hash clubs 14:48 <@jgarzik> Overall we must deal with cases where kernel handles link management and error recovery, and cases where firmware takes it out of our hands 14:49 <@jgarzik> Can't shove both pegs down the same hole 14:49 * ggg agrees 14:49 <@ggg> jgarzik: that's why I was suggesting seperating transport errors and devices errors in the API 14:51 <@jgarzik> often the separation actually depends on how the error is handled 14:51 <@jgarzik> the LLDD ultimately drives the EH, so it naturally segregates what is needed 14:52 <@jgarzik> a device error may imply we need to kick the transport 14:53 <@ggg> jgarzik: ok..."timeout" is my favorite example...mentioned earlier 14:54 <@willy> right, we've finished with LibATA, moving on to Queue Handling by James Smart 14:54 <@jgarzik> yes, timeout is essentially a "WTF???" error 14:54 <@willy> dealing with high end arrays 14:54 <@willy> queue full condition designed for single lun, single initiator 14:54 <@jgarzik> ... arrays from BC and Amsterdam? 14:54 <@willy> with lots of lun's sharing one target's resources, the algorithm falls over pretty quickly 14:55 <@jgarzik> yes libata/block has to deal with this too 14:55 <@willy> it never kicked in with one test case he was trying 14:55 <@jgarzik> sx8 has a host queue, so the bottleneck is there. sometimes the bottleneck is at the target. sometimes its at the LUN. 14:55 <@ggg> willy: or just lots of luns.....SG team has been testing MI on RH/SUSE kernels and couldn't get past 300 or so. 14:56 <@jgarzik> sometimes the queue-full bottleneck is at link (transport) too. 14:56 <@willy> ggg: i don't think that's relevant to this conversation 14:56 * ggg notes this was from late last year.... 
14:56 <@jgarzik> non-queued master/slave ATA 14:56 <@ggg> willy: ok 14:56 <@willy> it can stay too high and too low, depending what you're doing 14:56 <@willy> they were hacking around it in prep_queue 14:57 <@jgarzik> Ideally, I would love to see some sort of way to export _where_ the TCQ bottleneck is {host, link, target, or LUN} 14:57 <@jgarzik> then let infrastructure handle it from there 14:57 -!- patman [~patman@216-99-218-173.dsl.aracnet.com] has joined #storage 14:58 <@willy> under high load, the rampdown took a minute and a half to abate 14:59 <@willy> jejb is suggesting making queue full algorithms pluggable 14:59 <@willy> so devices can select the one they want, based on inquiry data 14:59 <@willy> or the user can override it 15:00 <@jgarzik> ultimately there is a disconnect between request_queue and "TCQ domain" 15:01 <@jgarzik> request_queue has bits for cross-queue tagging, but that's it 15:01 <@willy> Doug points out the limit could even be in the sas expander or FC switch 15:01 <@jgarzik> nod nod 15:01 <@jgarzik> or SATA port multiplier 15:02 <@willy> and Mike Anderson points out that a multi-initiator device could be being shared with a Windows or Solaris host that's sucking up all the resources temporarily 15:02 <@willy> so part of the earlier discussion was to introduce the request_group concept in the block layer 15:03 <@ggg> willy: as noted before, that's a sysadmin issue - they have to config hosts that are sharing to be "reasonable" 15:03 <@jgarzik> Another complex, real-world example: SATA controller sata_sil24 and ahci have $N "command slots." These slots are available to all devices -- including downstream expander devices -- on a single SATA port. 
15:03 <@willy> anyway, the pluggable queuefull algorithm idea met with general nodding of heads and scratching of beards, so we're moving on 15:03 <@jgarzik> so regardless of device NCQ limitations, you have a max of $N commands 15:03 <@willy> Doug Gilbert's talking about tools 15:03 <@willy> http://www.torque.net/sg/tools.html 15:03 <@willy> smartmontools 15:04 <@rdd> where is blktool ? 15:04 <@jgarzik> mention that jgarzik wants to expand and fix blktool! 15:04 <@willy> ok, when doug's finished talking about smartmontools 15:04 <@jgarzik> blktool == ethtool counterpart, designed to be a single tool that is knowledgeable about specific hardware bits 15:04 <@willy> supports 9 OSes 15:04 <@willy> supports SATA and SAS passthroughs 15:05 <@willy> and things like 3ware 15:05 <@mkp> what about SAF-TE and SES? 15:05 <@mkp> vs. udev 15:05 <@jgarzik> blktool is intended to help eliminate $N^2 hardware-specific utils 15:05 <@willy> not by this tool 15:06 <@willy> SES in sg3_utils, SAF-TE in safte-monitor 15:06 <@mkp> yeah, but needs lovin' 15:06 <@jgarzik> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/blktool.git 15:06 <@mkp> also unclear where it fits in given that it monitors fans and enclosures and not just hotswap 15:07 <@willy> doug's commenting that smartmontools doesn't currently work on fusion SATA because it has its own SATL 15:07 <@willy> but a new rev will fix that 15:08 <@jgarzik> big whoop. 
fusion is just another example of what I mentioned previously -- smart firmware that exports ATA passthrough functionality 15:08 <@willy> now he's talking about porting problems, and he doesn't see why we have to care about having multiple sg interfaces, since supporting more than one is analagous to porting to a new OS 15:09 <@jgarzik> easier solution: make sure the LLDD supports ATA_12 and ATA_16 standard SCSI commands 15:09 <@jgarzik> converting in-driver to vendor-specific ATA passthrough command if necessary 15:10 <@jgarzik> then SG_IO and ATA_{12,16} Just Works(tm) 15:10 <@jgarzik> as does smartmontools with "-d ata" 15:10 <@jgarzik> er, strike that last. smartmontools still uses HDIO_xxx. 15:12 <@ggg> willy: "scsiadd" scripts could be added to the list as well...trivial compared to most of the rest though 15:12 <@willy> ok, scsiadd and blktool to be mentioned. doug's going through the list, talking about each one 15:12 <@willy> i'm not realying any of that ... 15:13 < patman> is there a current irc log online? 15:13 <@willy> i think rdd has one 15:13 <@rdd> yes but not online 15:13 <@rdd> yet 15:13 <@rdd> will be 15:14 <@mkp> eventually... 15:14 <@rdd> ur hired. 15:14 < patman> :) 15:14 <@jgarzik> log so far: http://gtf.org/garzik/misc/irc.txt 15:14 <@jgarzik> just today's stuff, nothing from yesterday :( 15:14 <@mbligh> I have the whole log 15:18 * jgarzik carefully excises some certain non-storage talk from the IRC log on the web :) 15:19 <@mbligh> http://test.kernel.org/storage/irc 15:19 <@mbligh> but it's not live 15:19 <@mbligh> and willy's bridge function was not yet fully fledged yesterday 15:19 <@rdd> it was too, fully fledged fsckup 15:20 <@rdd> (not his fault) 15:20 <@jgarzik> but it's fun to blame him 15:21 -!- alexisb [~ahbruem@128.189.237.121] has left #storage [] 15:22 -!- alexisb [~ahbruem@128.189.237.121] has joined #storage 15:23 <@willy> ggg: has source to scu been released? 
15:23 <@mkp> last I heard OSRB had shot it down 15:24 <@ggg> willy: not that I'm aware of...let me look again. 15:29 <@jgarzik> dougg still talking about tools? 15:29 <@jejb> we're just wrapping it up ... axboe is asking a question about scsi_debug 15:30 <@willy> ok, moving onto SAS/SATA by Brian King 15:31 <@jgarzik> ipr's hooks into libata will likely look vastly different from the aic94xx/Broadcom SAS+SATA support 15:32 <@willy> he's looking at an api like port_alloc/port_destroy 15:32 <@willy> jejb's asking if he wants to do something transport-classy at this point 15:32 <@willy> brian's not done anything about it yet 15:32 <@jgarzik> that is probably more complex than its worth 15:32 <@jgarzik> libata exports a very high level "issue, complete ATA command" API that ipr uses in Brian's patches 15:33 <@rdd> is it tea time yet? i need java 15:34 <@willy> probably 10 minutes till tea 15:34 <@willy> depends how much arguing happens 15:34 <@rdd> k 15:34 <@ggg> willy: OSRB data base doesn't show an scu project being submitted 15:34 <@willy> ggg: hmm, perhaps I should. 15:35 <@willy> I have another one to do for *mumble* project 15:35 * ggg nods 15:35 <@jgarzik> heh 15:35 <@willy> excellent, teatime 15:35 <@jgarzik> at one time, Red Hat had a "project mumble" 15:39 <@ggg> willy: searching for "SCSI" in OSRB database gets some interesting hits...I have to wonder where people are thinking...I'm in fact worried that one of the projects was approved... 15:39 <@ggg> s/where/what 15:39 <@willy> ggg: I guess bdale would be your best point of contact for initial worries 15:40 <@ggg> yeah, I should talk with him 15:47 <@patman> jsmart: did you cover naming/namespace/udev stuff? 15:47 -!- markh [~markh@fw.osdl.org] has quit [Quit: Leaving] 15:48 <@willy> patman: we agreed that naming needs to be unified between the distros, and want OSDL to set up a workgroup for it 15:48 <@patman> willy: i like suse's setup with udev (and glibc!) in initrd. 
15:48 <@willy> i suggested sending a delegate from each distro, put them in a room with a half-brick in a sock, and whoever's left ... 15:49 <@patman> :) 15:49 <@patman> having glibc also lets us use the iscsi stuff as is, where for other distros, they had to create/use the iscsistart program ... ask mikec 15:49 <@patman> i don't know how you can a unified name space without same /dev code in initrd 15:50 <@jgarzik> /dev naming? 15:50 <@jgarzik> ultimately that's policy, like net device naming 15:50 <@jgarzik> and One True Policy does not fit all 15:51 <@patman> yes, and people want to be able to use the same policy across distros 15:51 <@patman> "/dev/disk" naming 15:51 <@patman> [i guess] 15:51 <@patman> but if you don't have udev in initrd, your naming policy can't be the same there as it is during "normal"usage 15:52 <@jgarzik> A standard is fine, as long as there is no code that _requires_ the standard be followed 15:52 <@patman> yes, and i think we are there with udev + distro rules. 15:53 <@jgarzik> yes 15:53 <@patman> i thought jsmart post to summit list implied he had issues with early names (in initrd) across distros 15:53 <@axboe> jgarzik: group photo done 15:53 <@jgarzik> Just want to make sure that's not broken by hardcoding. Garzik's Nutty Embedded Distro should be allowed to name block devices /dev/ego0, /dev/ego1, ... 15:54 <@jgarzik> axboe: rock :) 15:54 <@rdd> /dev/rock ? 15:54 <@patman> if users can add their own rules on top of those, then fine 15:54 <@patman> /dev/vancouver-rocks 15:54 <@jgarzik> s/on top of/can replace/ 15:55 <@patman> i mean you can put whatever in /dev/mine/ and not care about changing /dev names, but probably does not matter 15:56 <@patman> i mean: replace gives more functionallity than "on top of" 15:56 <@jgarzik> nod 15:56 <@willy> it's something we hear customer complaints about 15:56 <@mkp> just stop using device names, dammit 15:56 <@willy> it makes it hard for customers to migrate between RH and SuSE ... 
or to used a mixed infrastructure 15:56 <@rdd> mkp ;) 15:56 * willy whips out his sonic screwdriver and flips a few bits on the drives 15:57 <@mkp> it's so retarded in this day and age 15:57 <@jgarzik> some naming (or "labelling") is inevitable 15:57 <@willy> mkp: you mean "stop using sda et al", then? 15:57 <@mkp> some wise guy once said: address data by content, address by path for discovery and recovery 15:57 <@patman> labeling sure is nice, until you clone a drive 15:58 <@mkp> patman: and mirror, and multipathing 15:58 <@patman> mkp: yes yes 15:58 <@mkp> you need both content + path information to assemble your view of the world 15:58 <@mkp> relying exclusively on one or the other won't work 15:58 <@willy> i just need better drugs to assemble my view of the world 15:59 <@jgarzik> by labelling, I don't mean LABEL= in /etc/fstab. jejb and Luben have both talked about associating multiple labels with each SCSI device, i.e. "$h:$c:$i:$l" and "$sas_address" might both point to the same device 15:59 <@patman> willy: heh ... 15:59 <@jgarzik> i.e. there's been talk of converting h/c/i/l integers into strings 15:59 <@jgarzik> I did some work on that 15:59 <@willy> jgarzik: I was talking to jejb about that the other evening 15:59 <@jgarzik> (killing HCIL) 15:59 <@patman> i would like h/c/i/l to just become some random number :) 16:00 <@patman> or just use *scsi_device 16:00 <@willy> but he can't see the advantage of changing the hcil numbers 16:00 <@jgarzik> patman: internally, pointers are used 16:00 <@willy> it's replacing one more-or-less arbitrary number with another 16:00 <@jgarzik> but HCIL is exported to userspace 16:00 <@jgarzik> and is valid for older hardware 16:00 <@jgarzik> back compat 16:00 <@willy> and is faked for newer hardware 16:00 <@jgarzik> nod 16:00 <@patman> jgarzik: what happened to your 8 byte lun work ... talking about h:c:i:l becoming unreadable 16:00 <@jgarzik> thus a HCIL label need not be required... 
but capable 16:01 <@jgarzik> patman: 8-byte lun work still sitting in my git repo somewhere 16:01 <@willy> if the I is just an arbitrary small integer which doesn't change until next boot, is that any worse than making I an arbitrary string? 16:01 <@patman> jgarzik: :-( 16:01 <@jgarzik> I should do another bombing run 16:01 <@jgarzik> patman: will take several bombing runs to fully merge... 16:01 <@mkp> http://marc.theaimsgroup.com/?l=linux-scsi&m=101840990116069&w=2 16:02 <@jgarzik> willy: no, but I argue that all of h/c/i/l should be a single string 16:02 <@patman> i think LUN should become an attribute, then at least kill that part of it (after transition period to allow user hcnages) 16:02 <@jgarzik> not just 'i' 16:02 <@willy> it is ... it's the scsi bus_id 16:02 <@willy> %d:%d:%d:%d 16:03 <@patman> willy: it is parsed as h:c:i:l, at *least* to get the LUN, as that is the only place to find it 16:03 * mkp steps down from his soap box 16:03 <@patman> see ... AFAIR SuSE mkinitrd 16:04 <@willy> ok, we're starting the bug discussion 16:04 <@patman> ok i will shut up :) 16:04 <@jgarzik> mkp: 4. by media serial number 16:04 <@mkp> jgarzik: yeah, aeb added that in a followup to my mail 16:06 <@willy> discussion about sending bugzilla mail to linux-scsi 16:06 <@willy> hch being diplomatic 16:06 <@willy> (bugzilla vs debian bts) 16:06 <@willy> "it's a complete piece of shit" 16:06 <@willy> I think he may hate bugzilla more than I hate sysfs ... 16:06 <@rdd> hehe 16:06 <@mkp> linux-scsi-bugs, maybe? 16:06 <@jejb> that is linux-scsi 16:07 <@willy> the consensus was to send the new bug to linux-scsi, but no subsequent mails 16:07 <@mkp> Well, traffic isn't overwhelming right now. In general I find it annoying to mix request tracking and general discussion 16:08 <@jgarzik> mkp: the subject lines are clearly prefixed... 
16:08 <@willy> mkp: the status updates won't get sent, only the very first mail 16:08 <@mkp> fine 16:08 <@jgarzik> and I doubt many developers would subscribe to linux-scsi-bugs, honestly 16:08 <@rdd> right 16:11 <@willy> discussion of whether it would be useful for OSDL to host a pile of hardware 16:11 <@willy> consensus: no, there's just too many possibilities to test 16:11 <@jgarzik> NDAs are a problem, with OSDL 16:11 <@jgarzik> but maybe that's only for developers, not hardware 16:12 <@jgarzik> also, it seems like OSUOSL is picking up some hosting duties 16:12 <@willy> Ric is suggesting ways of sharing interoperability testing results 16:12 <@jgarzik> funding for attending connect-a-thons would be nice 16:12 <@willy> jgarzik: it's still totally infeasible to host enough combinations of hardware to be useful 16:12 <@willy> i'll raise that, i was thinking that too 16:12 <@jgarzik> I wish I could attend the SATA connect-a-thons 16:13 <@rdd> definitely 16:13 <@jgarzik> willy: I've often thought that a distributed set of volunteers would be useful. Give non-hackers a chance to be helpful on first-run code. 16:13 <@willy> doug gilbert was invitesd to a SAS plugfest 16:13 <@jgarzik> Legion of Testing Justice 16:13 <@willy> For Great Power! 16:13 <@rdd> i did a usb plugfest in 1999 :) 16:15 <@willy> error injection being discussed ... target implementations in the drivers 16:15 <@mkp> willy: I looked at scst earlier today and will try messing with it later 16:15 <@willy> excellent 16:16 <@mkp> willy: looks pretty easy to implement a fault injection harness 16:16 <@willy> w00t 16:16 <@mkp> for both transport and data 16:16 <@jejb> mkp, there's a scsi-target-2.6 git tree on www.kernel.org/git 16:16 <@mkp> jejb: scst or a different implementation? 
16:16 <@jejb> it's based on scst, but unified with mike christie's implmentation 16:16 <@mkp> excellent 16:16 <@jejb> it's much more user level based 16:17 <@jejb> as in you can write the target largely in user space 16:17 <@mkp> awesome 16:17 * jejb heads of fto his meeting 16:17 -!- jejb [~jejb@128.189.238.8] has quit [Quit: Leaving] 16:18 <@mnc> mkp: the userspace part is here http://svn.berlios.de/svnroot/repos/stgt/branches/use-scsi-ml/ 16:18 <@mkp> mnc: gracias! 16:18 <@mnc> mkp: I do not have access to some of my links so you may want to email tomo fujita.tomonori@lab.ntt.co.jp for the updated userpsace tree 16:19 * mkp takes notes 16:19 <@mkp> which HBAs have target support? 16:20 <@mnc> none, right now. There is a vscsi target (IBM power srp target), and a software iscsi one, and I have been working on a qla2xxx one 16:20 <@mnc> scst has a lot more hw supported 16:20 <@mkp> okie 16:20 <@mkp> at least one working SPI target would be nice 16:21 <@mnc> I think they have a aic or mpt one. 16:21 <@mnc> they == scst 16:21 <@mkp> *nod* 16:21 <@mkp> ok 16:23 <@jgarzik> I've talked to a couple drive vendors, and they were interested in supporting fault injection in their firmware (though perhaps a custom firmware for developers only) 16:23 <@jgarzik> would be interesting for individual developers to explore, given that the vendors are open to it 16:23 <@rdd> does anyone have areca hardware? 16:27 <@willy> they sent some to hch and jejb, iirc 16:27 <@jgarzik> I bet the vendor would be willing to ship out a piece or two 16:28 <@jgarzik> and it seems like areca is pretty close to merge-ready 16:29 <@rdd> i expect it needs lots of real testing 16:29 <@rdd> although eric has done some 16:29 <@jgarzik> I see occasional "it works, when is it going upstream?" 
posts from users testing -mm 16:30 <@axboe> the whole max sectors situation wasn't a confidence builder in areca, though 16:30 <@rdd> yep 16:31 <@jgarzik> tj: BTW, we could probably start looking at pushing your sata_sil better-m15w patch into #ALL, and eventually #upstream 16:31 <@tj> jgarzik: sure about that? that's pretty nasty stuff 16:32 <@tj> jgarzik: but then again, it's useful & not too difficult to maintain 16:32 <@jgarzik> tj: I don't recall the level of nastiness, but it seems like libata API was better about multiple commands now 16:32 <@jgarzik> tj: is it driven via libata-scsi? 16:32 <@tj> jgarzik: when I get back to home, i'll put it under more tests and submit it for inclusion 16:33 <@tj> jgarzik: no it's completely contained inside sata_sil 16:33 <@tj> jgarzik: no contamination outside of sata_sil 16:33 <@jgarzik> hmmmm 16:34 <@jgarzik> well, I suppose that's best for when we make libata-scsi optional 16:34 <@tj> yeap, also if all LLDs move over new EH, we can actually kill the command recycle path all together 16:34 <@jgarzik> at present it would be cleaner, but less modular, to implement inside libata-scsi by breaking up the SCSI command into multiple ATA commands 16:34 <@jgarzik> but then we would have to move the code back once libata-scsi becomes optional 16:35 <@tj> hmmm... as the only multiple command user is ATAPI request sense which is gonna be removed, I don't wanna add another user of that 16:35 <@jgarzik> tj: hopefully all LLDDs get moved over to new EH sooner rather than later :) 16:35 <@tj> heh heh, i hope so too. but not so sure if that's gonna happen :) 16:36 <@jgarzik> tj: for the class of controllers similar to sata_{uli,sis,vsc,via,etc.} it should be trivial 16:36 <@jgarzik> tj: [tangent] another project is to make it easy for SATA controllers to register PATA ports (read: ports with vastly different attributes and port_operations) 16:40 -!- alexisb [~ahbruem@128.189.237.121] has left #storage [] 16:44 <@rdd> anything happening? 
16:48 <@patman> no :) 16:48 <@patman> they must be about done 16:49 <@andmike> Willy is talking about maintainer attributes. 16:50 <@mkp> "Must be wearing a bowtie..." 16:50 <@patman> and jejb is not here 16:50 <@jgarzik> are these attributes integrated into the sysfs object model? 16:50 <@rdd> was it explained why that is on the agenda? 16:50 <@andmike> Already on the list. 16:50 <@andmike> Just future preparation. 16:50 <@patman> prep for what? 16:51 <@ggg> how to tie a proper bow :) 16:51 <@jgarzik> speaking in generalities, its good to have a plan if the maintainer gets hit by a bus (so says Linus) 16:51 <@rdd> surely 16:52 <@patman> okay, i thought there were maybe other implications (besides a bus) 16:52 <@rdd> so is my pre-tied bowtie a problem? 16:52 <@andmike> No, not that was mentioned. 16:52 <@patman> ? 16:53 <@rdd> i think that was your answer, not mine 16:53 <@patman> rdd: yeh :) i just have to read it once more "that was not mentioned" 16:54 <@jgarzik> patman: jejb was poked about that, when it first appeared on the schedule. He was evasive :) 16:54 <@jgarzik> (on #kernel) 16:54 <@rdd> and on the mailing list iirc 16:54 <@andmike> patman: I was just indicating that James did not explicitly say that this was for a fixed future date of changing maintainer. 16:54 <@ggg> jejb likes to provide a bit of suspense :) 16:54 <@rdd> so it's just a filler while he is on a conf. calll.... :) 16:55 <@patman> yeh ... i'm sure he had a big grin when he talked about it 16:55 <@rdd> does anyone want to be the main storage wiki admin ?? 16:56 <@mbligh> I can easily set up a wiki. Being editor-in-cheif is not so attractive ;-) 16:56 <@rdd> oh well, i just got one setup at osdl 16:57 <@mbligh> OK, is fine. Not sure it matters where it is 16:57 <@rdd> yes 16:57 <@patman> mbligh: that is what is so nice about wiki's ... community editing (well in theory) 16:57 <@mbligh> yes. but they still need an editor to create structure, etc. 
16:57 <@rdd> yes, but keeping trashers out 16:57 <@jgarzik> rdd: a useful hostname like storage.dev.osdl.org would be nice 16:58 <@mbligh> OK, flood warning .... this was the bugs /testing notes: 16:58 <@mbligh> Bugs 16:58 <@mbligh> Nobody is looking at bugs unless akpm forwards 16:58 <@mbligh> Forward bugs notification (bugme-new) to appropriate lists: 16:58 <@mbligh> linux-scsi 16:58 <@mbligh> linux-ide 16:58 <@mbligh> linux-raid 16:58 <@mbligh> ----------------------------------------- 16:58 <@mbligh> Testing 16:58 <@mbligh> connectathon / University of new hampshire / SAS plugfest 16:58 <@mbligh> error injection 16:58 <@mbligh> mid layer vs. drivers 16:58 <@mbligh> As much in software as possible 16:58 <@mbligh> use target-mode SCSI drivers with multiple hosts per bus 16:58 <@mbligh> (to provide fake objects) 16:58 <@mbligh> inject by talking to f/c switches 16:58 <@mbligh> module load / unload 16:58 <@mbligh> should do more upstream testing instead of vendor kernels. 16:58 <@rdd> @jgarzik we have linux-storage.osdl.org 16:58 <@mbligh> Tests 16:58 <@mbligh> disktest, database 16:58 <@mbligh> sg-based unit tests (need writing) 16:58 <@mbligh> fill up filesystems 16:58 <@mbligh> > 16 simul sg IOs to 1 fd for stress testing 16:59 <@jgarzik> rdd: that works 16:59 <@patman> rdd: yeh it is hard to keep out trashers and yet encourage participation 16:59 <@patman> [in some cases ...] 
16:59 <@jgarzik> linux-net wiki seems to have survived 17:00 <@jgarzik> without trashers 17:00 <@jgarzik> needs admin approval to link a new page, but that's it 17:01 <@mbligh> I've found structure much more of a problem than trashers, personally 17:02 <@mbligh> formatting the shit into some sensible hierarchy is not trivial 17:02 <@jgarzik> That's one thing I absolutely hate about wiki -- flat namespace 17:02 <@jgarzik> even if the navigation manages to grow sanely 17:02 <@jgarzik> websites and URLs are naturally tree-based, and wiki subverts that 17:05 <@mbligh> you can do trees in wiki 17:06 <@mbligh> http://mbligh.org/linuxdocs/ 17:06 <@mbligh> that's totally tree-style 17:06 <@mbligh> some wikis are broken I spose. that is back by moinmoin 17:06 <@jgarzik> mbligh: nice 17:07 <@jgarzik> yeah, mediawiki is very popular, and has a flat namespace 17:07 <@mbligh> rdd, what backend is the one at OSDL using? 17:07 <@rdd> oops, mediawiki 17:07 <@mbligh> If people want, I can put another moinmoin instance on test.kernel.org easily 17:09 <@patman> i know someone mentioned keeping this irc site, but maybe use linux-storage instead (for kernel + user + any storage discussion) 17:09 <@patman> #linux-storage 17:11 <@willy> did somebody write down the list I wrote on the board? 17:12 <@rdd> testing... tests.... 17:12 <@rdd> like that? 17:12 <@rdd> mbligh did that 17:12 <@willy> hch is talking about bidi implementation issuers 17:17 <@patman> tj: did you have some slides? 17:17 <@patman> presentation? 17:18 <@tj> patman: I have some pdf 17:18 <@tj> patman: http://htj.dyndns.org/ftp/incoming/sata-blk.pdf 17:18 <@willy> we're being kicked out 17:19 <@mbligh> willy / rdd - nope, only the testing stuff. not the maintainer bits 17:19 <@patman> hey thanks everyone for posting to irc etc. 17:19 <@patman> too bad phone thing did not work out 17:19 <@patman> wish i could have been there too :( 17:19 <@willy> i agree ... exhausting! 
17:19 -!- mnc [~mnc@128.189.238.155] has quit [Remote host closed the connection] 17:20 -!- tj [~tj@128.189.235.161] has quit [Quit: bye] 17:20 <@willy> thanks for being interested 17:20 <@andmike> Flood warning on scsi maintainer discussion. 17:21 <@andmike> Interactions with related subsystems. 17:21 <@andmike> Wiki 17:21 <@andmike> Design issue discussion w/o patch 17:21 <@andmike> Quick to merge (when right). 17:21 <@andmike> Think Skin 17:21 <@andmike> Vision 17:21 <@andmike> Plays well with others. 17:21 <@andmike> Community credentials 17:21 <@andmike> Moral Caliber 17:21 <@andmike> Confidence of Linus 17:21 <@andmike> Conference abilities 17:21 <@andmike> Knowledge of standards (T10, T11, T*) 17:21 <@andmike> Bow Tie :-). 17:21 <@andmike> sd / sr maintainers (volunteers would be needed). 17:21 -!- andmike [~andmike@128.189.232.54] has left #storage [] 17:23 -!- patman [~patman@216-99-218-173.dsl.aracnet.com] has quit [Quit: Leaving] 17:30 -!- willy [willy@128.189.239.68] has quit [Ping timeout: 480 seconds] 17:32 -!- jsmart [~jsmart@128.189.236.52] has quit [Ping timeout: 480 seconds] 17:33 <@jgarzik> there is #storage on linuxnet